Overview

The bench view command displays results from previous evaluations, allowing you to analyze performance, compare models, and track progress over time.

Usage

bench view [options]

Options

| Option | Description |
|--------|-------------|
| `--log-dir` | Log directory to view (defaults to `./logs`) |
| `--recursive` / `--no-recursive` | Include all logs in the log directory recursively |
| `--host` | TCP/IP host for the server |
| `--port` | TCP/IP port for the server |
| `--log-level` | Set the log level |
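The options above can be combined in a single invocation. A hypothetical example, pointing the viewer at a custom log directory and port (the directory and port values here are illustrative, not defaults):

```shell
# View all logs under ./experiments/logs recursively, serving on port 7575.
# Both the path and the port are example values chosen for illustration.
bench view --log-dir ./experiments/logs --recursive --port 7575
```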

Examples

View Latest Result

bench view
Example evaluation logs summary (served on `localhost`):
| Task      |       Model        | Score | Status |         Completed        |                            File Name                           |
|-----------|--------------------|-------|--------|--------------------------|----------------------------------------------------------------|
| mmlu      | openai/o3-mini     | 0.82  |    ✓   | Sat Aug 16 2025 10:39 PM | 2025-08-16T22-39-13-04-00_mmlu_g5QsKYFFAR7zNSuMMs9a85.eval     |
| humaneval | anthropic/claude-3 | 0.74  |    ✓   | Sat Aug 16 2025 03:22 PM | 2025-08-16T15-22-41-08-00_humaneval_k2mNpR8vLx3wQfE7Hs4B2.eval |
| gpqa      | groq/llama-3.3-70b | 0.43  |    ✓   | Mon Aug 04 2025 11:45 AM | 2025-08-04T11-45-09-12-00_gpqa_diamond_v9XzTpL5Kj8rY3mQ7.eval  |
| math      | openai/gpt-4o      | 0.67  |    ✓   | Sun Aug 03 2025 08:15 AM | 2025-08-03T08-15-32-07-00_math_u4JhWq2NvL6xKc9PzM8sA1.eval     |
| simpleqa  | openai/gpt-4o-mini | 0.58  |    ⚠   | Mon Jul 07 2025 05:30 PM | 2025-07-07T17-30-18-05-00_simpleqa_b7FgRp3XvK2nY9jQ6L.eval     |
| ...       | ...                | ...   |  ...   | ...                      | ...                                                            |
Each entry in the evaluation logs summary can be expanded to show a detailed evaluation breakdown.

(Screenshot: Detailed Evaluation Breakdown)