Production-ready inference endpoints powered by NVIDIA DGX Spark. OpenAI-compatible API, pay-per-token pricing.
Select a model to view pricing, try the playground, or grab API code.
Prices derived from CI benchmarks (vLLM on DGX Spark). Free tier available.
Compatible with the `openai`, `litellm`, and `langchain` client libraries.
OpenAI-compatible chat completions endpoint. Enable streaming by setting `stream: true`.
| Parameter | Type | Description |
|---|---|---|
| `model` | string | Full model identifier |
| `messages` | array | Array of `{role, content}` objects |
| `max_tokens` | integer | Max tokens to generate (default 512) |
| `temperature` | float | Sampling temperature, 0–2 (default 0.7) |
| `top_p` | float | Nucleus sampling (default 1.0) |
| `stream` | boolean | Stream via SSE (default false) |
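Putting the parameters together, a minimal request body looks like the sketch below. The host and model name are illustrative placeholders, and `post_chat` is a hypothetical helper built on the Python standard library:

```python
import json
from urllib import request

# Request body for the chat completions endpoint, using the
# parameters from the table above. Model name is a placeholder.
body = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Explain nucleus sampling in one sentence."},
    ],
    "max_tokens": 256,   # default 512
    "temperature": 0.7,  # 0-2, default 0.7
    "top_p": 1.0,        # default 1.0
    "stream": False,     # set True for SSE streaming
}

def post_chat(base_url: str, api_key: str) -> dict:
    # POST the body with a bearer token; base_url is a placeholder
    # for the service's real endpoint.
    req = request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with request.urlopen(req) as resp:
        return json.load(resp)
```

With `stream: true` the response arrives as server-sent events (`data:` lines), each carrying a JSON chunk, rather than a single JSON object.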