⚠️ Manual install required: the openbench-cyber plugin packages real exploit code and forensic artifacts that many anti-malware tools flag. Review the risks, then install intentionally with uv pip install "openbench-cyber @ git+https://github.com/groq/openbench-cyber.git@d93522ba70392cdceddb83f762c78a68923e70da".
The cybersecurity benchmarks live in the openbench-cyber plugin. Because of the sensitive payloads, we do not ship it as a transitive dependency or optional extra; installing the plugin is an explicit opt-in. Once installed, its tasks automatically appear in bench list.
The plugin currently ships the CTI-Bench suite for evaluating model performance on cyber threat intelligence tasks. CTI-Bench provides four benchmarks covering threat extraction, vulnerability classification, security knowledge, and risk assessment.
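To confirm that the plugin's tasks were registered after installation, something like the following sketch may help. It reads a task listing on stdin and prints any expected task name that is absent. The function name missing_cyber_tasks is hypothetical, and the sketch assumes that bench list prints task names verbatim somewhere in its output; the exact output format is not documented here.

```shell
# Hypothetical helper: read a task listing on stdin and print any of the
# expected cyber task names that do not appear in it.
# Assumption: `bench list` prints task names verbatim in its output.
missing_cyber_tasks() {
  local listing
  local task
  local missing=0
  listing="$(cat)"
  for task in cti_bench_ate cti_bench_mcq cti_bench_rcm cti_bench_vsp cybench; do
    # -w matches whole words only, so partial names do not count as a hit.
    if ! printf '%s\n' "$listing" | grep -qw -- "$task"; then
      echo "$task"
      missing=1
    fi
  done
  # Non-zero exit status when at least one task name was not found.
  return "$missing"
}
```

A possible invocation is bench list | missing_cyber_tasks, which should print nothing and exit with status 0 if every task is listed.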

Available Benchmarks

ATE: ATT&CK Technique Extraction

Extracts MITRE ATT&CK techniques from malware and threat descriptions.
bench eval cti_bench_ate 

MCQ: Security Knowledge

Multiple-choice questions on CTI standards, threats, detection strategies, and best practices from NIST and MITRE.
bench eval cti_bench_mcq 

RCM: Vulnerability Classification

Maps CVE descriptions to CWE (Common Weakness Enumeration) categories.
bench eval cti_bench_rcm 

VSP: CVSS Score Prediction

Predicts CVSS scores from vulnerability descriptions.
bench eval cti_bench_vsp
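
The four CTI-Bench commands above can also be run back to back. The sketch below is one way to do that; run_cti_bench and the BENCH_CMD variable are hypothetical names introduced for illustration and are not part of openbench. BENCH_CMD defaults to the bench CLI and can be overridden to preview the commands without running them.

```shell
# Hypothetical helper: run the four CTI-Bench tasks one after another.
# BENCH_CMD is an assumption of this sketch, not an openbench setting.
run_cti_bench() {
  local bench_cmd="${BENCH_CMD:-bench}"
  local task
  local failed=0
  for task in cti_bench_ate cti_bench_mcq cti_bench_rcm cti_bench_vsp; do
    # $bench_cmd is left unquoted on purpose so it can hold a command
    # plus arguments (for example "echo bench" for a preview).
    if ! $bench_cmd eval "$task"; then
      echo "task failed: $task" >&2
      failed=1
    fi
  done
  # Report failure if any task failed, but only after trying all four.
  return "$failed"
}
```

Calling run_cti_bench runs each task in turn; BENCH_CMD="echo bench" run_cti_bench only prints the four commands. Continuing after a failed task is a design choice of this sketch, made so that one failing benchmark does not hide results from the others.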

CyBench (CTF Challenges)

cybench delivers end-to-end agentic CTF scenarios powered by inspect-cyber. Because the default sandbox grants elevated network access, you must acknowledge the risks before running:
export CYBENCH_ACKNOWLEDGE_RISKS=1
bench eval cybench
Pass --sandbox k8s to use a Kubernetes sandbox when the optional inspect_k8s_sandbox dependency is available.
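
In scripts, the acknowledgement step can be made explicit with a small guard so that cybench never starts by accident. The sketch below is illustrative only: run_cybench and BENCH_CMD are hypothetical names, and it checks for the exact value 1 shown above, although the plugin itself may accept other values.

```shell
# Hypothetical wrapper: refuse to start cybench unless the caller has set
# the documented acknowledgement variable.
# Assumption: only the exact value "1" is treated as an acknowledgement here.
run_cybench() {
  local bench_cmd="${BENCH_CMD:-bench}"
  if [ "${CYBENCH_ACKNOWLEDGE_RISKS:-}" != "1" ]; then
    echo "Set CYBENCH_ACKNOWLEDGE_RISKS=1 after reviewing the risks." >&2
    return 1
  fi
  # Extra arguments (for example: --sandbox k8s) are passed through unchanged.
  # $bench_cmd is left unquoted so it can hold a command plus arguments.
  $bench_cmd eval cybench "$@"
}
```

After export CYBENCH_ACKNOWLEDGE_RISKS=1, calling run_cybench --sandbox k8s forwards the sandbox option to bench eval cybench. The wrapper deliberately does not set the variable itself, so the acknowledgement remains a decision made by the person running it.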