TokenLabs · Inference API

Model Console

OpenAI-compatible chat completions on dedicated GPU instances. Pick a model, view its pricing, and call it with your own API key.

Model & instance
Context: ≈ 8K tokens
GPU: RTX A4500 · 20 GB
Instance: Primary

API endpoint

POST to the OpenAI-compatible /v1/chat/completions endpoint, passing your API key as a Bearer token:

curl https://api.tokenlabs.run/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [
      {"role": "user", "content": "Say hello from TokenLabs."}
    ]
  }'
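The same call can be made from Python. A minimal sketch using only the standard library (the `API_KEY` placeholder is an assumption; any OpenAI-compatible client library would also work):

```python
import json
import urllib.request

API_KEY = "YOUR_API_KEY"  # placeholder: substitute your own TokenLabs key
BASE_URL = "https://api.tokenlabs.run/v1"

# Build the same request body as the curl example above.
payload = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [
        {"role": "user", "content": "Say hello from TokenLabs."}
    ],
}
req = urllib.request.Request(
    BASE_URL + "/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer " + API_KEY,
    },
)
# Uncomment to send the request:
# with urllib.request.urlopen(req) as resp:
#     reply = json.load(resp)
#     print(reply["choices"][0]["message"]["content"])
```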
Pricing · per model
Token type                 | Unit   | Price (USD)
Input tokens (prompt)      | per 1M | $0.0000
Output tokens (completion) | per 1M | $0.0000

Prices shown here are per 1M tokens and are derived from CI benchmarks (e.g., measured vLLM throughput on an RTX A4500 at $0.26/hr).
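Estimating what a single request costs from these per-1M rates is a simple proportion. A small sketch (the rates in the example are illustrative placeholders, not TokenLabs prices):

```python
def request_cost_usd(prompt_tokens: int, completion_tokens: int,
                     input_per_million: float, output_per_million: float) -> float:
    """Cost of one request given per-1M-token input and output prices."""
    return (prompt_tokens / 1_000_000 * input_per_million
            + completion_tokens / 1_000_000 * output_per_million)

# Illustrative rates of $0.05 (input) and $0.10 (output) per 1M tokens:
cost = request_cost_usd(1_200, 350, 0.05, 0.10)
```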

Chat with the selected model
Model: llama-8b-instruct
Message input: press Enter to send · Shift+Enter for a newline