Overview

The bench list command displays all available benchmarks in openbench, organized by category with brief descriptions.
Cybersecurity benchmarks live in the separate openbench-cyber plugin. Because the plugin contains live exploit payloads that trigger anti-malware tooling, you must install it manually after reviewing the risks:

uv pip install "openbench-cyber @ git+https://github.com/groq/openbench-cyber.git@d93522ba70392cdceddb83f762c78a68923e70da"

Once installed, the cybersecurity category appears automatically.

Usage

bench list [options]

Options

Option      Short  Description
--category  -c     Filter by category: core, math, etc.
--search    -s     Search evaluations by name, description, or tags
--tags      -t     Show tags for each benchmark
--alpha            Include experimental/alpha benchmarks
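
The flags that take no category can be tried the same way as the category filter. The sketch below is illustrative only: run_or_show is a hypothetical helper (not part of openbench) that runs the command when bench is on PATH and otherwise prints what would run, and the search term "reasoning" is an arbitrary example.

```shell
# Hypothetical helper, not part of openbench: run the command if the
# bench CLI is installed, otherwise print the command line instead.
run_or_show() {
  if command -v bench >/dev/null 2>&1; then
    "$@"
  else
    printf 'would run: %s\n' "$*"
  fi
}

# --search / -s: match against benchmark names, descriptions, or tags.
# "reasoning" is an arbitrary example query.
run_or_show bench list --search reasoning

# --tags / -t: show tags for each benchmark.
run_or_show bench list --tags

# --alpha: include experimental/alpha benchmarks in the listing.
run_or_show bench list --alpha
```

Each invocation uses a single flag from the table; whether flags can be combined in one call is not stated here, so the sketch does not assume it.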

Example Usage

Basic Usage

bench list
bench list example output
Core Benchmarks (58)                                                                                 
────────────────────────────────────────────────────────────
 boolq               BoolQ                 BoolQ: A Question Answering Dataset for Boolean Reasoning    
 browsecomp          BrowseComp            A Simple Yet Challenging Benchmark for Browsing Agents ... 
 drop                DROP                  Reading comprehension benchmark requiring discrete reason... 
 ...                                                                                              
Cybersecurity Benchmarks (4)
────────────────────────────────────────────────────────────
 cti_bench_ate       CTI-Bench ATE         Extracting MITRE ATT&CK techniques from malware and threa... 
 cti_bench_mcq       CTI-Bench MCQ         Multiple-choice questions evaluating understanding of CTI... 
 ...
Math Benchmarks (12)
────────────────────────────────────────────────────────────
 aime_2023_I         AIME 2023 I           American Invitational Mathematics Examination 2023 (First)   
 aime_2023_II        AIME 2023 II          American Invitational Mathematics Examination 2023 (Second)  
 aime_2024           AIME 2024             American Invitational Mathematics Examination 2024 (Combi... 
 ...
────────────────────────────────────────────────────────────
Total: 75 benchmarks (use --alpha to see experimental benchmarks)

Filter by Category

# Use full flag name
bench list --category cybersecurity

# Or abbreviated
bench list -c cybersecurity
bench list -c example output
Cybersecurity Benchmarks (4)
────────────────────────────────────────────────────────────
 cti_bench_ate       CTI-Bench ATE         Extracting MITRE ATT&CK techniques from malware and threa... 
 cti_bench_mcq       CTI-Bench MCQ         Multiple-choice questions evaluating understanding of CTI... 
 cti_bench_rcm       CTI-Bench RCM         Mapping CVE descriptions to CWE categories to evaluate vu... 
 cti_bench_vsp       CTI-Bench VSP         Calculating CVSS scores from vulnerability descriptions t... 

────────────────────────────────────────────────────────────
Total: 4 benchmarks (use --alpha to see experimental benchmarks)
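
If you want only the benchmark identifiers, for example to loop over them in a script, the listing can be filtered with standard tools. This is a minimal sketch that assumes the column layout shown in the example output (a leading space, the identifier, then at least two spaces of padding); that layout is not a documented contract and may change. The sample text is copied from the example above, and extract_ids is a hypothetical helper name.

```shell
# Sample rows copied from the example output above. In practice you
# would pipe `bench list -c cybersecurity` into extract_ids instead.
sample_output=$(cat <<'EOF'
Cybersecurity Benchmarks (4)
────────────────────────────────────────────────────────────
 cti_bench_ate       CTI-Bench ATE         Extracting MITRE ATT&CK techniques from malware and threa...
 cti_bench_mcq       CTI-Bench MCQ         Multiple-choice questions evaluating understanding of CTI...
 cti_bench_rcm       CTI-Bench RCM         Mapping CVE descriptions to CWE categories to evaluate vu...
 cti_bench_vsp       CTI-Bench VSP         Calculating CVSS scores from vulnerability descriptions t...

────────────────────────────────────────────────────────────
EOF
)

# Hypothetical helper: keep only benchmark rows and print their first column.
# A row is assumed to start with one space, an identifier, and 2+ spaces;
# category headers, rule lines, and blank lines do not match.
extract_ids() {
  awk '/^ [A-Za-z0-9_]+  +/ { print $1 }'
}

printf '%s\n' "$sample_output" | extract_ids
```

For the sample above this prints the four cti_bench_* identifiers, one per line.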