Overview
The `bench eval` command is the core of openbench: it evaluates any supported model on any available benchmark.
Usage
Arguments
| Argument | Description | Required |
|---|---|---|
| `benchmark` | Name of the benchmark to run | Yes |
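
As a rough illustration of the invocation shape, the sketch below assembles the argument list for a `bench eval` call with only the required `benchmark` argument. It assumes the benchmark name is passed positionally right after `bench eval`; the helper `basic_eval_command` is hypothetical, and `mmlu` is used purely as an example benchmark name. The command is printed, not executed.

```python
import shlex


def basic_eval_command(benchmark: str) -> list[str]:
    """Build argv for `bench eval <benchmark>` (hypothetical helper).

    Assumes the benchmark name is a positional argument; it is the only
    argument the table above marks as required.
    """
    if not benchmark:
        raise ValueError("benchmark is required")
    return ["bench", "eval", benchmark]


# "mmlu" is an example benchmark name, not taken from this page.
print(shlex.join(basic_eval_command("mmlu")))  # prints: bench eval mmlu
```

Building the command as a list keeps each argument separate, which avoids quoting problems if it is later passed to `subprocess.run`.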
Basic Configuration Options
Commonly used configuration options for model selection, evaluation control, and performance optimization:

| Option | Description |
|---|---|
| `--model` | Model to evaluate |
| `--limit` | Number of questions to evaluate |
| `--epochs` | Number of evaluation rounds |
| `--temperature` | Sampling temperature |
| `--top-p` | Nucleus sampling |
| `--seed` | Random seed for reproducibility |
| `--message-limit` | Max messages per sample |
| `--max-tokens` | Maximum response tokens |
| `--max-connections` | Concurrent API calls |
| `--max-subprocesses` | Parallel subprocesses |
| `--max-tasks` | Maximum number of tasks to run concurrently |
| `--fail-on-error` | Failure threshold for sample errors |
| `--timeout` | Request timeout (seconds) |
| `--sandbox` | Container for running generated code |
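
To show how several of these options might be combined, here is a minimal sketch that turns keyword arguments into `--flag value` pairs and rejects any flag not listed in the table. It assumes every option takes a single value in the form `--flag value`, which this page does not confirm for each flag. The helper `build_eval_command`, the benchmark name `mmlu`, and the model string `provider/model-name` are placeholders for illustration. The command is only printed.

```python
import shlex

# Option names copied from the table above (without the leading dashes).
KNOWN_OPTIONS = {
    "model", "limit", "epochs", "temperature", "top-p", "seed",
    "message-limit", "max-tokens", "max-connections", "max-subprocesses",
    "max-tasks", "fail-on-error", "timeout", "sandbox",
}


def build_eval_command(benchmark: str, **options: object) -> list[str]:
    """Build argv for `bench eval` with options (hypothetical helper).

    Keyword names use underscores (max_connections) and are mapped to
    dashed flags (--max-connections). Assumes `--flag value` syntax.
    """
    argv = ["bench", "eval", benchmark]
    for key, value in options.items():
        name = key.replace("_", "-")
        if name not in KNOWN_OPTIONS:
            raise ValueError(f"unknown option: --{name}")
        argv += [f"--{name}", str(value)]
    return argv


# Placeholder benchmark and model names; substitute real ones.
cmd = build_eval_command(
    "mmlu",
    model="provider/model-name",
    limit=10,
    seed=42,
    max_connections=8,
)
print(shlex.join(cmd))
```

Limiting the question count and fixing the seed, as in this call, is a common way to get a quick, repeatable smoke test before a full run.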