| Task | Model | Score | Status | Completed | File Name |
|-----------|--------------------|-------|--------|--------------------------|----------------------------------------------------------------|
| mmlu | openai/o3-mini | 0.82 | ✓ | Sat Aug 16 2025 10:39 PM | 2025-08-16T22-39-13-04-00_mmlu_g5QsKYFFAR7zNSuMMs9a85.eval |
| humaneval | anthropic/claude-3 | 0.74 | ✓ | Sat Aug 16 2025 03:22 PM | 2025-08-16T15-22-41-08-00_humaneval_k2mNpR8vLx3wQfE7Hs4B2.eval |
| gpqa | groq/llama-3.3-70b | 0.43 | ✓ | Mon Aug 04 2025 11:45 AM | 2025-08-04T11-45-09-12-00_gpqa_diamond_v9XzTpL5Kj8rY3mQ7.eval |
| math | openai/gpt-4o | 0.67 | ✓ | Sun Aug 03 2025 08:15 AM | 2025-08-03T08-15-32-07-00_math_u4JhWq2NvL6xKc9PzM8sA1.eval |
| simpleqa | openai/gpt-4o-mini | 0.58 | ⚠ | Mon Jul 07 2025 05:30 PM | 2025-07-07T17-30-18-05-00_simpleqa_b7FgRp3XvK2nY9jQ6L.eval |
| ... | ... | ... | ... | ... | ... |
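
The File Name column appears to encode the completion timestamp (with a UTC offset), the task name, and a run identifier. The sketch below shows one way to split a name back into those parts. It assumes the pattern `<date>T<HH-MM-SS><sign><HH-MM>_<task>_<id>.eval` inferred from the rows above; `parse_log_name` and `LogName` are hypothetical names for illustration, not part of any tool's API.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import NamedTuple

# Assumed pattern, inferred from the table rows:
#   <date>T<HH-MM-SS><sign><HH-MM offset>_<task>_<run id>.eval
LOG_NAME = re.compile(
    r"^(?P<stamp>\d{4}-\d{2}-\d{2}T\d{2}-\d{2}-\d{2})"
    r"(?P<sign>[+-])(?P<oh>\d{2})-(?P<om>\d{2})"
    r"_(?P<task>.+)_(?P<run_id>[^_]+)\.eval$"
)


class LogName(NamedTuple):
    completed: datetime  # timezone-aware completion time
    task: str            # may itself contain underscores, e.g. "gpqa_diamond"
    run_id: str          # trailing identifier before ".eval"


def parse_log_name(name: str) -> LogName:
    """Split a log file name into timestamp, task, and run id."""
    m = LOG_NAME.match(name)
    if m is None:
        raise ValueError(f"unrecognized log file name: {name!r}")
    # Rebuild the UTC offset from its sign, hours, and minutes.
    offset = timedelta(hours=int(m["oh"]), minutes=int(m["om"]))
    if m["sign"] == "-":
        offset = -offset
    completed = datetime.strptime(m["stamp"], "%Y-%m-%dT%H-%M-%S")
    completed = completed.replace(tzinfo=timezone(offset))
    return LogName(completed, m["task"], m["run_id"])


parsed = parse_log_name(
    "2025-08-16T22-39-13-04-00_mmlu_g5QsKYFFAR7zNSuMMs9a85.eval"
)
print(parsed.task, parsed.run_id)      # mmlu g5QsKYFFAR7zNSuMMs9a85
print(parsed.completed.isoformat())    # 2025-08-16T22:39:13-04:00
```

Because the task name is matched greedily up to the last underscore, a name such as `gpqa_diamond` is kept whole. The parsed `completed` value can also be used to recompute the Completed column, including the day of the week.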