OpenAI-compatible chat completions on dedicated GPU instances. Pick a model, view its pricing, and call it with your own API key.
OpenAI-compatible `/v1/chat/completions`:

```shell
# Export your API key first, e.g.: export TOKENLABS_API_KEY=sk-...
curl https://api.tokenlabs.run/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKENLABS_API_KEY" \
  -d '{
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [
      {"role": "user", "content": "Say hello from TokenLabs."}
    ]
  }'
```
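The same request can be built from any language; a minimal Python sketch using only the standard library, mirroring the curl example above (the `TOKENLABS_API_KEY` environment variable name is an assumption, not part of the API):

```python
import json
import os
import urllib.request

API_URL = "https://api.tokenlabs.run/v1/chat/completions"

def build_chat_request(model: str, messages: list) -> urllib.request.Request:
    """Build the same POST request as the curl example above."""
    payload = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={
            "Content-Type": "application/json",
            # Assumed env var name; read your key from wherever you keep it.
            "Authorization": f"Bearer {os.environ.get('TOKENLABS_API_KEY', '')}",
        },
    )

req = build_chat_request(
    "meta-llama/Llama-3.1-8B-Instruct",
    [{"role": "user", "content": "Say hello from TokenLabs."}],
)
# To actually send it (requires a valid key):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the endpoint is OpenAI-compatible, official OpenAI client SDKs pointed at this base URL should also work.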
| Token type | Unit | Price (USD) |
|---|---|---|
| Input tokens (prompt) | per 1M | $0.0000 |
| Output tokens (completion) | per 1M | $0.0000 |
Prices are quoted per 1M tokens and are recomputed from CI benchmarks (e.g., measured vLLM throughput on an RTX A4500 at $0.26/hr).
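Since billing is per 1M tokens for input and output separately, the cost of a single request follows from simple arithmetic. A sketch (the prices in the call are hypothetical placeholders; the real ones come from the table above):

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost of one request given per-1M-token prices, as in the pricing table."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# Hypothetical example: 1200 prompt tokens at $0.05/1M, 300 completion
# tokens at $0.20/1M -- illustrative numbers only.
cost = request_cost_usd(1200, 300, input_price_per_m=0.05, output_price_per_m=0.20)
```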