🎯 Accuracy Results

IFEval Performance on vLLM Models

Auto-updated via CI/CD

📐 Methodology

What is IFEval?

IFEval (Instruction Following Evaluation) is a benchmark that measures how well language models follow specific instructions in prompts. It tests models on a variety of instruction types, including formatting requirements, length constraints, keyword usage, and structural patterns. Each instruction is programmatically verifiable, so scoring is rule-based and requires no human or LLM judge.
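To make "programmatically verifiable" concrete, here is a minimal sketch of what such rule-based checks look like. These are hypothetical illustrations, not the benchmark's actual implementation; the function names and signatures are assumptions.

```python
import re

def check_min_words(response: str, n: int) -> bool:
    """Length constraint: response must contain at least n words."""
    return len(response.split()) >= n

def check_all_lowercase(response: str) -> bool:
    """Formatting requirement: no uppercase letters allowed."""
    return response == response.lower()

def check_keyword(response: str, keyword: str) -> bool:
    """Keyword usage: the keyword must appear as a whole word."""
    return re.search(rf"\b{re.escape(keyword)}\b", response) is not None

response = "the quick brown fox jumps over the lazy dog"
print(check_min_words(response, 5))    # True
print(check_all_lowercase(response))   # True
print(check_keyword(response, "fox"))  # True
```

Because every check is a deterministic function of the response text, results are fully reproducible across runs.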

Evaluation Metrics

  • Prompt-level Accuracy: Percentage of prompts where all instructions were followed correctly
  • Instruction-level Accuracy: Percentage of individual instructions followed correctly across all prompts
  • Number of Samples: Total number of test prompts evaluated (50 for quick tests, 541 for full evaluation)
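The relationship between the two accuracy metrics can be sketched as follows. The `results` data is a made-up example; in the real benchmark each prompt may carry one or more instructions, and each is checked independently.

```python
# Hypothetical per-prompt results: each inner list holds pass/fail
# outcomes for every instruction attached to that prompt.
results = [
    [True, True],          # prompt 1: both instructions followed
    [True, False, True],   # prompt 2: one instruction missed
    [True],                # prompt 3: single instruction followed
]

# Prompt-level accuracy: fraction of prompts where ALL instructions passed.
prompt_level = sum(all(r) for r in results) / len(results)

# Instruction-level accuracy: fraction of individual instructions passed
# across all prompts, regardless of which prompt they belong to.
flat = [ok for r in results for ok in r]
instruction_level = sum(flat) / len(flat)

print(f"prompt-level:      {prompt_level:.2%}")       # 66.67%
print(f"instruction-level: {instruction_level:.2%}")  # 83.33%
```

Instruction-level accuracy is always at least as high as prompt-level accuracy, since a single missed instruction fails the whole prompt but counts as only one miss among many instructions.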

Test Configuration

  • Framework: IFEval benchmark suite
  • Model Server: vLLM on NVIDIA DGX Spark
  • Quick Test: 50 samples (~5 minutes)
  • Full Test: 541 samples (~30 minutes)

📄 View Raw IFEval Data
📊 View Complete Benchmark Data