--max-connections | Concurrent connections |
--max-subprocesses | Parallel subprocesses |
--max-retries | Maximum retry attempts |
--timeout | Request timeout (seconds) |
--score --no-score | Grade the benchmark, or leave unscored |
--fail-on-error --no-fail-on-error | Stop on first error |
--retry-on-error | Retry samples that encounter errors (by default, no retries occur). Specify --retry-on-error to retry once, or e.g. --retry-on-error=3 to retry multiple times. |
--sandbox-cleanup --no-sandbox-cleanup | Clean up sandbox environments after the task completes |
--trace | Trace message interactions with evaluated model to terminal |
--log-dir | Directory for log files |
--log-samples --no-log-samples | Log detailed samples and scores |
--log-images --no-log-images | Log base64 encoded images |
--log-buffer | Number of samples to buffer before writing to log |
--debug-errors | Enable debug mode for errors |
--debug | Enable debug mode with full stack traces |
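
The optional-value behaviour described for --retry-on-error (absent means no retries, bare means one retry, `=N` means N retries) and the paired on/off flags such as --score / --no-score can be modelled with Python's standard `argparse`. This is only an illustrative sketch, not the tool's actual implementation: the program name `example-eval`, the helper names `build_parser` and `parse`, and the assumption that scoring is enabled by default are all hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # "example-eval" is a hypothetical program name used only for this sketch.
    parser = argparse.ArgumentParser(prog="example-eval")

    # Flag with an optional integer value:
    #   absent             -> default (0, no retries)
    #   --retry-on-error   -> const (1, retry a single time)
    #   --retry-on-error=3 -> 3 retries
    parser.add_argument(
        "--retry-on-error",
        nargs="?",
        type=int,
        const=1,
        default=0,
    )

    # Paired on/off flag: generates both --score and --no-score.
    # Defaulting to True is an assumption made for this sketch.
    # BooleanOptionalAction requires Python 3.9 or later.
    parser.add_argument(
        "--score",
        action=argparse.BooleanOptionalAction,
        default=True,
    )
    return parser


def parse(argv: list[str]) -> argparse.Namespace:
    """Parse an explicit argument list (hypothetical helper)."""
    return build_parser().parse_args(argv)


if __name__ == "__main__":
    for argv in ([], ["--retry-on-error"], ["--retry-on-error=3", "--no-score"]):
        args = parse(argv)
        print(argv, "->", args.retry_on_error, args.score)
```

Using `nargs="?"` together with `const` is what lets one flag act as both a switch and a valued option; the `=` form avoids any ambiguity about whether the next token is the flag's value.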