โ† Back to Token Labs

📊 Benchmark Results

vLLM Performance on NVIDIA DGX Spark

Auto-updated via CI/CD

๐Ÿ“ Methodology

Hardware

  • Platform: NVIDIA DGX Spark
  • GPU: NVIDIA GB10 Grace Blackwell Superchip (128 GB unified CPU/GPU memory, ~120 GB available to the GPU)
  • Architecture: ARM64

Benchmark Configuration

  • Prefill Test: 100 prompts, each with 3072 input tokens and 1024 output tokens (3:1 input:output ratio)
  • Cache Test: 100 prompts with repeated prefixes, run with LMCache enabled (CPU offload)
  • Decode Test: 100 prompts, each with 1024 input tokens and 3072 output tokens (1:3 input:output ratio)
  • Request Rate: 10 requests/second
  • Tool: vllm bench serve
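A minimal sketch of how the prefill-heavy and decode-heavy runs could be expressed as vllm bench serve invocations. It assumes the tool's random dataset with its --random-input-len and --random-output-len options, plus --num-prompts and --request-rate; flag names can change between vLLM releases, so confirm them with vllm bench serve --help. The model name is a hypothetical placeholder (the page does not say which model was tested), a vLLM server is assumed to be already serving that model, and the cache test is left out because its exact dataset options are not stated here.

```python
import shlex

# Hypothetical placeholder: the page does not name the benchmarked model.
MODEL = "your-model-name"


def bench_serve_args(input_len: int, output_len: int,
                     num_prompts: int = 100, request_rate: float = 10,
                     model: str = MODEL) -> list[str]:
    """Build a `vllm bench serve` argument list for a random-token workload.

    Assumption: the `random` dataset and these flag names match your vLLM version.
    """
    return [
        "vllm", "bench", "serve",
        "--model", model,
        "--dataset-name", "random",
        "--random-input-len", str(input_len),
        "--random-output-len", str(output_len),
        "--num-prompts", str(num_prompts),
        "--request-rate", str(request_rate),
    ]


# Prefill-heavy (3:1) and decode-heavy (1:3) workloads from the configuration above.
PREFILL = bench_serve_args(input_len=3072, output_len=1024)
DECODE = bench_serve_args(input_len=1024, output_len=3072)

if __name__ == "__main__":
    # Print the commands; pass the lists to subprocess.run(...) to execute them.
    for name, argv in (("prefill", PREFILL), ("decode", DECODE)):
        print(f"{name}: {shlex.join(argv)}")
```

Building an argument list rather than a single shell string keeps it safe to hand straight to subprocess.run; shlex.join is used only to print a readable command.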

Metrics Extraction

  • Prefill Throughput: Total token throughput from the prefill-heavy test
  • Cached Throughput: Calculated from server metrics as prompt throughput × cache hit rate
  • Decode Throughput: Output token throughput from the decode-heavy test
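The three metrics above can be sketched as small helper functions. This assumes each run was saved as JSON (for example via vllm bench serve --save-result) and that the saved result exposes total_token_throughput and output_throughput fields; check these key names against a file from your vLLM version. The prompt throughput and cache hit rate for the cache test are passed in as plain numbers, since the page does not say how they are read from the server metrics.

```python
import json

# Assumption: key names in the JSON written by `vllm bench serve`;
# verify against a result file produced by your vLLM version.
PREFILL_KEY = "total_token_throughput"
DECODE_KEY = "output_throughput"


def load_result(path: str) -> dict:
    """Read one saved benchmark result file."""
    with open(path) as f:
        return json.load(f)


def prefill_throughput(result: dict) -> float:
    """Total (input + output) tokens/sec from the prefill-heavy run."""
    return float(result[PREFILL_KEY])


def decode_throughput(result: dict) -> float:
    """Output tokens/sec from the decode-heavy run."""
    return float(result[DECODE_KEY])


def cached_throughput(prompt_tokens_per_sec: float, cache_hit_rate: float) -> float:
    """Prompt throughput x cache hit rate, with the hit rate as a fraction in [0, 1]."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be a fraction between 0 and 1")
    return prompt_tokens_per_sec * cache_hit_rate
```

If the server reports the hit rate as a percentage, divide it by 100 before calling cached_throughput.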

Cost Calculation

Cost per million tokens = (DGX hourly cost × 1,000,000) ÷ (tokens/sec × 3600)

DGX Spark economics: $4,000 hardware ÷ 26,280 hours (3 years @ 30% utilization) + electricity ≈ $0.07/hour
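The cost formula translates directly into code. This sketch takes the hourly rate as an input and uses the page's ≈ $0.07/hour figure as the default constant; the throughput in the usage line is a hypothetical value for illustration, not a benchmark result.

```python
SECONDS_PER_HOUR = 3600
TOKENS_PER_MILLION = 1_000_000

# The page's stated all-in DGX Spark rate (hardware amortization + electricity).
DGX_SPARK_HOURLY_USD = 0.07


def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_sec: float) -> float:
    """Cost per million tokens = (hourly cost x 1,000,000) / (tokens/sec x 3600)."""
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    return hourly_cost_usd * TOKENS_PER_MILLION / (tokens_per_sec * SECONDS_PER_HOUR)


if __name__ == "__main__":
    # Hypothetical throughput of 1,000 tokens/sec, for illustration only.
    cost = cost_per_million_tokens(DGX_SPARK_HOURLY_USD, 1000.0)
    print(f"${cost:.4f} per 1M tokens")
```

Because cost scales inversely with throughput, each of the three throughput metrics yields its own cost per million tokens from the same hourly rate.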
