Overview

The bench eval-retry command allows you to resume or retry evaluations that failed, timed out, or were interrupted. It automatically identifies incomplete samples and reruns only those.

Usage

bench eval-retry <log_file> [options]
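For example, to resume an interrupted run from its log (the log path below is illustrative):

bench eval-retry logs/inspect_eval/failed_eval.json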

Arguments

| Argument | Description | Required |
|---|---|---|
| log_file | Path to the evaluation log file | Yes |

Options

| Option | Description |
|---|---|
| --max-connections | Maximum number of concurrent connections |
| --max-subprocesses | Maximum number of parallel subprocesses |
| --max-retries | Maximum number of retry attempts |
| --timeout | Request timeout (seconds) |
| --score / --no-score | Grade the benchmark, or leave it unscored |
| --fail-on-error / --no-fail-on-error | Stop on the first error, or continue past errors |
| --retry-on-error | Retry samples that encounter errors (by default, no retries occur). Pass --retry-on-error to retry once, or e.g. --retry-on-error=3 to retry up to three times. |
| --sandbox-cleanup / --no-sandbox-cleanup | Clean up sandbox environments after the task completes |
| --trace | Trace message interactions with the evaluated model to the terminal |
| --log-dir | Directory for log files |
| --log-samples / --no-log-samples | Log detailed samples and scores |
| --log-images / --no-log-images | Log base64-encoded images |
| --log-buffer | Number of samples to buffer before writing to the log |
| --debug-errors | Enable debug mode for errors |
| --debug | Enable debug mode with full stack traces |
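
As an illustration of the error-handling options above (the log path is again illustrative), the following continues past individual sample errors and retries errored samples up to three times:

# Continue on errors, retrying errored samples up to 3 times
bench eval-retry logs/inspect_eval/failed_eval.json \
  --no-fail-on-error \
  --retry-on-error=3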

Example Retry Configuration

# Retry with adjusted parameters: more retry attempts,
# a longer timeout, and fewer concurrent requests
bench eval-retry logs/inspect_eval/failed_eval.json \
  --max-retries 10 \
  --timeout 180 \
  --max-connections 2
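
If a retry keeps failing, the same options can instead be combined for debugging; the following is one possible combination, not the only one:

# Keep sandboxes around and show full detail while debugging
bench eval-retry logs/inspect_eval/failed_eval.json \
  --no-sandbox-cleanup \
  --trace \
  --debug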