Cybersecurity

⚠️ Manual install required: the openbench-cyber plugin packages real exploit code and forensic artifacts that many anti-malware tools flag. Review the risks, then install it intentionally:

uv pip install "openbench-cyber @ git+https://github.com/groq/openbench-cyber.git@d93522ba70392cdceddb83f762c78a68923e70da"

The cybersecurity benchmarks live in the openbench-cyber plugin. Because of its sensitive payloads, we do not ship it as a transitive dependency or optional extra; installing the plugin is an explicit opt-in. Once it is installed, its tasks appear automatically in bench list.

The plugin currently ships the CTI-Bench suite, which evaluates model performance on cyber threat intelligence tasks. CTI-Bench provides four benchmarks covering threat extraction, vulnerability classification, security knowledge, and risk assessment.
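Because installation is an explicit opt-in, a script may want to confirm that the plugin is present before it calls any of these tasks. The following is a minimal sketch using the standard library; the distribution name `openbench-cyber` is taken from the install command and is an assumption about how the package registers itself, and `installed_version` is a hypothetical helper, not part of openbench.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "openbench-cyber") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    The default name is an assumption based on the install command.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("openbench-cyber is not installed; the cyber tasks will not be listed.")
    else:
        print(f"openbench-cyber {version} is installed.")
```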

Available Benchmarks

ATE: ATT&CK Technique Extraction

Extracts MITRE ATT&CK techniques from malware and threat descriptions.

bench eval cti_bench_ate
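To make the task concrete, here is a rough sketch of pulling ATT&CK technique IDs out of free text and comparing them with an expected set. It is illustrative only: the helper names are hypothetical, and the use of F1 as the comparison is an assumption, not a description of the plugin's scorer.

```python
import re
from typing import Set

# ATT&CK technique IDs look like T1059, optionally followed by a
# three-digit sub-technique suffix such as T1059.001.
TECHNIQUE_ID = re.compile(r"\bT\d{4}(?:\.\d{3})?\b")


def extract_technique_ids(text: str) -> Set[str]:
    """Collect the distinct technique IDs mentioned in a piece of text."""
    return set(TECHNIQUE_ID.findall(text))


def f1(predicted: Set[str], expected: Set[str]) -> float:
    """F1 overlap between two ID sets (assumed metric, for illustration)."""
    hits = len(predicted & expected)
    if hits == 0:
        return 0.0
    precision = hits / len(predicted)
    recall = hits / len(expected)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    answer = "The sample runs PowerShell (T1059.001) and arrives via phishing, T1566."
    print(sorted(extract_technique_ids(answer)))
```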

MCQ: Security Knowledge

Multiple-choice questions on CTI standards, threats, detection strategies, and best practices from NIST and MITRE.

bench eval cti_bench_mcq

RCM: Vulnerability Classification

Maps CVE descriptions to CWE (Common Weakness Enumeration) categories.

bench eval cti_bench_rcm
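Model answers for this kind of task are often free text that mentions more than one CWE. As a hedged illustration, the sketch below takes the last CWE ID in an answer as the final choice and normalizes it; both the helper `final_cwe` and the last-mention rule are assumptions, not the plugin's documented behavior.

```python
import re
from typing import Optional

# Matches identifiers such as CWE-79, case-insensitively.
CWE_ID = re.compile(r"\bCWE-(\d+)\b", re.IGNORECASE)


def final_cwe(answer: str) -> Optional[str]:
    """Return the last CWE ID in the answer as 'CWE-<n>', or None if absent."""
    matches = CWE_ID.findall(answer)
    if not matches:
        return None
    # int() drops any leading zeros so 'cwe-079' becomes 'CWE-79'.
    return f"CWE-{int(matches[-1])}"


if __name__ == "__main__":
    print(final_cwe("This could be CWE-20, but the root cause is cwe-079."))
```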

VSP: CVSS Score Prediction

Predicts CVSS scores from vulnerability descriptions.

bench eval cti_bench_vsp
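For readers unfamiliar with how a CVSS score is derived, the sketch below computes a base score from a vector string using the CVSS v3.1 base-metric formulas. It assumes v3.1 and a well-formed vector with a version prefix; which CVSS version the benchmark targets is not stated here, and `base_score` is a hypothetical helper, not part of the plugin.

```python
# Metric weights from the CVSS v3.1 specification.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(value: float) -> float:
    """Round up to one decimal place using the integer method from the v3.1 spec."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (scaled // 10000 + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score for a vector such as 'CVSS:3.1/AV:N/AC:L/...'."""
    # Skip the 'CVSS:3.1' prefix and read the remaining KEY:VALUE pairs.
    metrics = dict(part.split(":") for part in vector.split("/")[1:])
    changed = metrics["S"] == "C"
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[metrics["PR"]]

    iss = 1 - (1 - CIA[metrics["C"]]) * (1 - CIA[metrics["I"]]) * (1 - CIA[metrics["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    exploitability = 8.22 * AV[metrics["AV"]] * AC[metrics["AC"]] * pr * UI[metrics["UI"]]

    if impact <= 0:
        return 0.0
    total = impact + exploitability
    return roundup(min(1.08 * total if changed else total, 10))


if __name__ == "__main__":
    print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"))
```

A model's prediction could then be compared with the reference by taking the absolute difference between the two scores, though that comparison is an assumption about scoring rather than something this page specifies.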

CyBench (CTF Challenges)

cybench delivers end-to-end agentic CTF scenarios powered by inspect-cyber. Because the default sandbox grants elevated network access, you must acknowledge the risks before running:

export CYBENCH_ACKNOWLEDGE_RISKS=1
bench eval cybench

Pass --sandbox k8s to use a Kubernetes sandbox when the optional inspect_k8s_sandbox dependency is available.
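When launching CyBench from a script, one option is to set the acknowledgement variable only for the child process instead of exporting it in the shell. The sketch below builds the command and environment from the pieces shown above; `cybench_invocation` is a hypothetical helper, and it deliberately does not start the run itself.

```python
import os
from typing import Dict, List, Optional, Tuple


def cybench_invocation(sandbox: Optional[str] = None) -> Tuple[List[str], Dict[str, str]]:
    """Build the command and environment for a CyBench run without executing it."""
    env = dict(os.environ)
    # Explicit risk acknowledgement, scoped to this one invocation.
    env["CYBENCH_ACKNOWLEDGE_RISKS"] = "1"

    cmd = ["bench", "eval", "cybench"]
    if sandbox is not None:
        # For example "k8s", when the optional inspect_k8s_sandbox dependency is available.
        cmd += ["--sandbox", sandbox]
    return cmd, env


if __name__ == "__main__":
    cmd, _env = cybench_invocation("k8s")
    print(" ".join(cmd))
```

The returned pair can be passed to `subprocess.run(cmd, env=env, check=True)` once the risks have been reviewed. Keeping the variable out of the parent shell means a later, unrelated `bench eval cybench` still requires its own acknowledgement.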
