
📊 Benchmark Results

vLLM Performance on NVIDIA DGX Spark

Auto-updated via CI/CD


📐 Methodology

Hardware

Platform: NVIDIA DGX Spark
GPU: NVIDIA GB10 Grace Blackwell Superchip (~120GB of unified memory usable by the GPU)
Architecture: ARM64

Benchmark Configuration

Prefill Test: 100 prompts, 3072 input tokens, 1024 output tokens (3:1 ratio)
Cache Test: 100 prompts with prefix repetition, caching enabled (LMCache with CPU offload)
Decode Test: 100 prompts, 1024 input tokens, 3072 output tokens (1:3 ratio)
Request Rate: 10 requests/second
Tool: vllm bench serve
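
The prefill-heavy and decode-heavy workloads above could be launched roughly as sketched here. This is an illustration, not the pipeline's actual script: the helper `bench_command` and the `MODEL` placeholder are hypothetical, and the flags are assumed to be the random-dataset options of `vllm bench serve`, so they should be checked against `vllm bench serve --help` for the installed version.

```python
import shlex

# Hypothetical placeholder: the page does not name the model under test.
MODEL = "your-org/your-model"


def bench_command(input_len: int, output_len: int,
                  num_prompts: int = 100, request_rate: int = 10) -> list[str]:
    """Build an argv list for one synthetic benchmark run (nothing is executed)."""
    return [
        "vllm", "bench", "serve",
        "--model", MODEL,
        "--dataset-name", "random",          # assumed: synthetic random prompts
        "--random-input-len", str(input_len),
        "--random-output-len", str(output_len),
        "--num-prompts", str(num_prompts),
        "--request-rate", str(request_rate),
    ]


# Prefill-heavy: 3072 input tokens, 1024 output tokens (3:1).
prefill_cmd = bench_command(3072, 1024)
# Decode-heavy: 1024 input tokens, 3072 output tokens (1:3).
decode_cmd = bench_command(1024, 3072)

print(shlex.join(prefill_cmd))
print(shlex.join(decode_cmd))
```

The commands are only built and printed, so the sketch can be inspected without a running server. Passing a list to `subprocess.run` would execute one against a live vLLM endpoint.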

Metrics Extraction

Prefill Throughput: Total token throughput from prefill-heavy test
Cached Throughput: Calculated from server metrics (prompt throughput × cache hit rate)
Decode Throughput: Output token throughput from decode-heavy test
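
The cached-throughput derivation above (prompt throughput × cache hit rate) can be written as a small helper. This is a minimal sketch: the function name `cached_throughput` is hypothetical, the hit rate is assumed to be a fraction between 0 and 1, and the input numbers are made up for illustration rather than taken from any measured run.

```python
def cached_throughput(prompt_tokens_per_sec: float, cache_hit_rate: float) -> float:
    """Return prompt throughput scaled by the fraction of tokens served from cache."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be a fraction between 0 and 1")
    return prompt_tokens_per_sec * cache_hit_rate


# Illustrative inputs only, not benchmark results.
example = cached_throughput(2000.0, 0.5)
print(example)  # 1000.0
```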

Cost Calculation

Cost per million tokens = (DGX hourly cost × 1,000,000) ÷ (tokens/sec × 3600)
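
The formula above translates directly into code. The sketch below uses the $0.07/hour figure stated on this page. The throughput of 1,000 tokens/sec is a hypothetical round number chosen only to make the arithmetic easy to follow, and the function name is likewise an assumption.

```python
SECONDS_PER_HOUR = 3600


def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_sec: float) -> float:
    """Cost per million tokens = (hourly cost x 1,000,000) / (tokens/sec x 3600)."""
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    return hourly_cost_usd * 1_000_000 / (tokens_per_sec * SECONDS_PER_HOUR)


# $0.07/hour is the page's stated hourly cost; 1000 tokens/sec is illustrative.
cost = cost_per_million_tokens(0.07, 1000.0)
print(f"${cost:.4f} per million tokens")  # $0.0194 per million tokens
```

Because throughput sits in the denominator, doubling tokens/sec halves the cost per million tokens at a fixed hourly cost.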

DGX Spark economics: $4,000 hardware ÷ 26,280 hours (3 years @ 30% utilization) + electricity ≈ $0.07/hour
