OpenAI-compatible chat completions on dedicated GPU instances. Pick a model, view its pricing, and call it with your own API key.
OpenAI-compatible `/v1/chat/completions`:

```shell
curl https://api.tokenlabs.run/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKENLABS_API_KEY" \
  -d '{
    "model": "tokenlabsdotrun/Llama-3.1-8B-ModelOpt-NVFP4-QAT",
    "messages": [
      {"role": "user", "content": "Say hello from TokenLabs."}
    ]
  }'
```
| Token type | Unit | Price (USD) |
|---|---|---|
| Input tokens (prompt) | per 1M | $0.0000 |
| Cached input tokens | per 1M | $0.0000 |
| Output tokens (completion) | per 1M | $0.0000 |
Prices shown here are per 1M tokens and are derived from CI benchmarks (e.g., vLLM throughput measured on DGX Spark at $0.26/hr of instance time).
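Per-1M-token rates convert to a per-request cost by scaling each token count by its rate and dividing by one million. A small sketch of that arithmetic, using illustrative placeholder rates rather than the live prices in the table above:

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_1m: float, output_price_per_1m: float) -> float:
    """USD cost of one request, given per-1M-token rates."""
    return (input_tokens * input_price_per_1m
            + output_tokens * output_price_per_1m) / 1_000_000


# Hypothetical rates for illustration only; the table above governs actual billing.
cost = estimate_cost(1_200, 300,
                     input_price_per_1m=0.05,
                     output_price_per_1m=0.20)
# 1,200 input tokens at $0.05/1M plus 300 output tokens at $0.20/1M
```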
📊 View detailed performance benchmarks →