Overview
The bench eval command is the core of openbench: it evaluates any supported model on any available benchmark.
Usage
Arguments
| Argument | Description | Required |
|---|---|---|
| benchmark | Name of the benchmark to run | Yes |
Basic Configuration Options
Commonly used configuration options for model selection, evaluation control, and performance optimization:

| Option | Description |
|---|---|
| --model | Model to evaluate |
| --limit | Number of questions to evaluate |
| --epochs | Number of evaluation rounds |
| --temperature | Sampling temperature |
| --top-p | Nucleus sampling |
| --seed | Random seed for reproducibility |
| --message-limit | Max messages per sample |
| --max-tokens | Maximum response tokens |
| --max-connections | Concurrent API calls |
| --max-subprocesses | Parallel subprocesses |
| --max-tasks | Maximum number of tasks to run concurrently |
| --fail-on-error | Failure threshold for sample errors |
| --timeout | Request timeout (seconds) |
| --sandbox | Container for running generated code |
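
To show how the benchmark argument and the options above combine into one invocation, here is a minimal Python sketch that assembles a bench eval command line and prints it without running it. The helper build_eval_command is hypothetical and not part of openbench. The benchmark name "mmlu", the model identifier "provider/model-name", and the option values are placeholders. The sketch also assumes each option takes a single value written as `--flag value`, which this page does not confirm for every option.

```python
import shlex


def build_eval_command(benchmark, **options):
    """Compose a `bench eval` argument list (hypothetical helper, not an openbench API).

    Keyword names map to flags by replacing underscores with hyphens,
    e.g. max_tokens -> --max-tokens. Options set to None are skipped.
    Assumption: every option is passed as `--flag value`.
    """
    argv = ["bench", "eval", benchmark]
    for name, value in options.items():
        if value is None:
            continue  # leave unset options to the CLI's own defaults
        argv += ["--" + name.replace("_", "-"), str(value)]
    return argv


# Placeholder benchmark, model identifier, and values chosen for illustration only.
argv = build_eval_command(
    "mmlu",
    model="provider/model-name",
    limit=10,
    temperature=0.0,
    seed=42,
    max_connections=8,
    timeout=None,  # skipped because it is None
)

# Print a shell-safe rendering of the command instead of executing it.
print(shlex.join(argv))
```

Building the command as a list rather than a single string keeps each value a separate argument, so the list can later be handed to subprocess.run without shell quoting concerns.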