Hardware
- Platform: NVIDIA DGX Spark
- GPU: GB10 Grace Blackwell (~120GB GPU memory)
- Architecture: ARM64
Benchmark Configuration
- Prefill Test: 100 prompts, 3072 input tokens, 1024 output tokens (3:1 ratio)
- Cache Test: Enabled (LMCache with CPU offload) - 100 prompts with prefix repetition
- Decode Test: 100 prompts, 1024 input tokens, 3072 output tokens (1:3 ratio)
- Request Rate: 10 requests/second
- Tool: vllm bench serve
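The prefill-heavy and decode-heavy runs above could be driven by a small script such as the following. This is a minimal sketch, not the exact invocation used for these results: the flag names (`--dataset-name random`, `--random-input-len`, `--random-output-len`, `--num-prompts`, `--request-rate`) are assumed from the vLLM benchmark CLI and should be checked against the installed version, and the model name `my-model` and the helper `bench_command` are hypothetical placeholders.

```python
import shlex

def bench_command(input_len, output_len, num_prompts=100,
                  request_rate=10, model="my-model"):
    """Build an argv list for one `vllm bench serve` run.

    Flag names are assumptions based on the vLLM benchmark CLI;
    `model` is a hypothetical placeholder.
    """
    return [
        "vllm", "bench", "serve",
        "--model", model,
        "--dataset-name", "random",
        "--random-input-len", str(input_len),
        "--random-output-len", str(output_len),
        "--num-prompts", str(num_prompts),
        "--request-rate", str(request_rate),
    ]

# Prefill-heavy test: 3072 input tokens, 1024 output tokens (3:1).
prefill_cmd = bench_command(3072, 1024)
# Decode-heavy test: 1024 input tokens, 3072 output tokens (1:3).
decode_cmd = bench_command(1024, 3072)

# Print shell-ready command lines; pass the lists to subprocess.run to execute.
print(shlex.join(prefill_cmd))
print(shlex.join(decode_cmd))
```

Building the argument list in one place keeps the two runs identical except for the input and output lengths, which is the only variable the tests are meant to change.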
Metrics Extraction
- Prefill Throughput: Total token throughput from prefill-heavy test
- Cached Throughput: Calculated from server metrics (prompt throughput × cache hit rate)
- Decode Throughput: Output token throughput from decode-heavy test
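The cached throughput figure is a derived value rather than a directly measured one. A minimal sketch of the calculation described above, assuming the cache hit rate is expressed as a fraction between 0 and 1 (the function name and the sample numbers are illustrative, not measured results):

```python
def cached_throughput(prompt_tokens_per_sec, cache_hit_rate):
    """Cached throughput = prompt throughput x cache hit rate.

    `cache_hit_rate` is assumed to be a fraction in [0, 1].
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    return prompt_tokens_per_sec * cache_hit_rate

# Hypothetical inputs: 2000 prompt tokens/sec at a 50% hit rate.
example = cached_throughput(2000.0, 0.5)
print(example)  # 1000.0
```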
Cost Calculation
Cost per million tokens = (DGX hourly cost × 1,000,000) ÷ (tokens/sec × 3600)
DGX Spark economics: $4000 hardware ÷ 26,280 hours (3yr @ 30% utilization) + electricity ≈ $0.07/hour
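The cost formula can be applied directly to any measured throughput. The sketch below uses the $0.07/hour figure stated above; the throughput of 1000 tokens/sec is a hypothetical input chosen for illustration, not a benchmark result, and the function name is an assumption.

```python
def cost_per_million_tokens(hourly_cost_usd, tokens_per_sec):
    """Cost per million tokens = (hourly cost x 1,000,000) / (tokens/sec x 3600)."""
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    return hourly_cost_usd * 1_000_000 / (tokens_per_sec * 3600)

DGX_SPARK_HOURLY_COST = 0.07  # USD/hour, as stated in the text above

# Hypothetical throughput of 1000 tokens/sec.
cost = cost_per_million_tokens(DGX_SPARK_HOURLY_COST, 1000.0)
print(f"${cost:.4f} per million tokens")
```

At these inputs the result is roughly $0.019 per million tokens; because throughput sits in the denominator, doubling tokens/sec halves the cost.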