bench list

Overview

The bench list command displays all available benchmarks in openbench, organized by category with brief descriptions.

Cybersecurity benchmarks live in the separate openbench-cyber plugin. Because it contains live exploit payloads that trigger anti-malware tooling, you must install it manually after reviewing the risks:

uv pip install "openbench-cyber @ git+https://github.com/groq/openbench-cyber.git@d93522ba70392cdceddb83f762c78a68923e70da"

Once installed, the cybersecurity category appears automatically.

Usage

bench list [options]

Options

Option	Short	Description
`--category`	`-c`	Filter by category, e.g. core, math, or cybersecurity
`--search`	`-s`	Search evaluations by name, description, or tags
`--tags`	`-t`	Show tags for each benchmark
`--alpha`		Include experimental/alpha benchmarks
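
The options above can probably be combined in a single call. The following is a minimal sketch, not part of openbench: `find_benchmarks` is a hypothetical wrapper name, and it assumes that `--search` takes a query string and that `--search`, `--tags`, and `--alpha` may be passed together, which this page does not state explicitly.

```shell
# Hypothetical helper (not shipped with openbench).
# Assumption: --search accepts a query string and can be combined
# with --tags and --alpha in one invocation.
find_benchmarks() {
  # $1 is the search term, matched against name, description, or tags
  bench list --search "$1" --tags --alpha
}
```

Calling `find_benchmarks aime` would then run `bench list --search aime --tags --alpha`. If your version rejects the combination, run the flags one at a time.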

Example Usage

Basic Usage

bench list

bench list example output

Core Benchmarks (58)                                                                                 
────────────────────────────────────────────────────────────
 boolq               BoolQ                 BoolQ: A Question Answering Dataset for Boolean Reasoning    
 browsecomp          BrowseComp            A Simple Yet Challenging Benchmark for Browsing Agents ... 
 drop                DROP                  Reading comprehension benchmark requiring discrete reason... 
 ...                                                                                              
Cybersecurity Benchmarks (4)
────────────────────────────────────────────────────────────
 cti_bench_ate       CTI-Bench ATE         Extracting MITRE ATT&CK techniques from malware and threa... 
 cti_bench_mcq       CTI-Bench MCQ         Multiple-choice questions evaluating understanding of CTI... 
 ...
Math Benchmarks (12)
────────────────────────────────────────────────────────────
 aime_2023_I         AIME 2023 I           American Invitational Mathematics Examination 2023 (First)   
 aime_2023_II        AIME 2023 II          American Invitational Mathematics Examination 2023 (Second)  
 aime_2024           AIME 2024             American Invitational Mathematics Examination 2024 (Combi... 
 ...
────────────────────────────────────────────────────────────
Total: 75 benchmarks (use --alpha to see experimental benchmarks)

Filter by Category

# Use full flag name
bench list --category cybersecurity

# Or abbreviated
bench list -c cybersecurity

bench list -c example output

Cybersecurity Benchmarks (4)
────────────────────────────────────────────────────────────
 cti_bench_ate       CTI-Bench ATE         Extracting MITRE ATT&CK techniques from malware and threa... 
 cti_bench_mcq       CTI-Bench MCQ         Multiple-choice questions evaluating understanding of CTI... 
 cti_bench_rcm       CTI-Bench RCM         Mapping CVE descriptions to CWE categories to evaluate vu... 
 cti_bench_vsp       CTI-Bench VSP         Calculating CVSS scores from vulnerability descriptions t... 

────────────────────────────────────────────────────────────
Total: 4 benchmarks (use --alpha to see experimental benchmarks)
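
If you want just the benchmark identifiers from a listing, for example to loop over them in a script, the first column can be pulled out with awk. This is a rough sketch under assumptions: `benchmark_ids` is a hypothetical helper name, and it assumes the piped output keeps the same layout as the example above (category header ending in a count, rule lines, one row per benchmark, and a final `Total:` line). The real output may differ when it is not written to a terminal.

```shell
# Hypothetical helper: read a `bench list` listing on stdin and
# print the identifier in the first column of each benchmark row.
benchmark_ids() {
  awk '
    $NF ~ /^\([0-9]+\)$/ { next }   # skip category headers such as "Math Benchmarks (12)"
    $1 == "Total:"        { next }   # skip the summary line
    # keep rows whose first field looks like an identifier and that
    # have at least one more column; rule lines and "..." fall through
    NF >= 2 && $1 ~ /^[A-Za-z0-9_]+$/ { print $1 }
  '
}
```

A typical use would be `bench list -c cybersecurity | benchmark_ids`, which under the assumptions above prints one identifier per line.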
