⚠️ Manual install required: the openbench-cyber plugin packages real exploit code and forensic artifacts that many anti-malware tools flag. Review the risks, then install intentionally with uv pip install "openbench-cyber @ git+https://github.com/groq/openbench-cyber.git@d93522ba70392cdceddb83f762c78a68923e70da".
The cybersecurity benchmarks live in the openbench-cyber plugin. Because of the sensitive payloads, we do not ship it as a transitive dependency or optional extra; installing the plugin is an explicit opt-in. Once installed, these tasks automatically appear in bench list.
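Because the plugin is an explicit opt-in, a script that depends on it may want to check for it before launching an evaluation. The following is a minimal sketch using only the Python standard library; the helper name installed_version is hypothetical, and the distribution name "openbench-cyber" is assumed from the install command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "openbench-cyber") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    The default name is an assumption taken from the install command;
    adjust it if the plugin is published under a different name.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("openbench-cyber is not installed; cyber benchmarks will not be listed.")
    else:
        print(f"openbench-cyber {found} is installed.")
```

Returning None instead of raising keeps the check easy to use in a guard clause before an evaluation run.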
The plugin currently ships the CTI-Bench suite for evaluating model performance on cyber threat intelligence tasks. CTI-Bench provides four benchmarks covering threat extraction, vulnerability classification, security knowledge, and risk assessment.
Available Benchmarks
ATE: ATT&CK Technique Extraction
Extracts MITRE ATT&CK techniques from malware and threat descriptions.
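To make the extraction task concrete, here is a hedged sketch of how technique IDs could be pulled from a model response and compared with a reference set. It relies on the public ATT&CK ID format (T followed by four digits, with an optional three-digit sub-technique suffix). The function names and the set-based F1 comparison are illustrative assumptions, not the benchmark's actual scorer.

```python
import re
from typing import Set

# ATT&CK technique IDs look like T1059; sub-techniques look like T1059.001.
TECHNIQUE_ID = re.compile(r"\bT\d{4}(?:\.\d{3})?\b")


def extract_technique_ids(text: str) -> Set[str]:
    """Collect every distinct ATT&CK technique ID mentioned in the text."""
    return set(TECHNIQUE_ID.findall(text))


def set_f1(predicted: Set[str], expected: Set[str]) -> float:
    """F1 between predicted and expected ID sets (hypothetical scoring choice)."""
    true_positives = len(predicted & expected)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(expected)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    response = "The sample uses T1059.001 for execution and T1566 for delivery."
    print(sorted(extract_technique_ids(response)))
```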
MCQ: Security Knowledge
Multiple-choice questions on CTI standards, threats, detection strategies, and best practices from NIST and MITRE.
RCM: Vulnerability Classification
Maps CVE descriptions to CWE (Common Weakness Enumeration) categories.
VSP: CVSS Score Prediction
Calculates CVSS scores from vulnerability descriptions.
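For readers unfamiliar with how a CVSS score is derived, the sketch below computes a base score from a vector string using the published CVSS v3.1 base-metric equations. It assumes v3.1 and a complete, well-formed vector; the function name cvss31_base_score is hypothetical, and the benchmark's own scoring pipeline may use a different version or implementation.

```python
import math

# Metric weights from the CVSS v3.1 specification.
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required depends on whether Scope is changed.
PR_WEIGHTS = {
    False: {"N": 0.85, "L": 0.62, "H": 0.27},
    True: {"N": 0.85, "L": 0.68, "H": 0.5},
}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def cvss31_base_score(vector: str) -> float:
    """Compute the base score for a vector such as 'CVSS:3.1/AV:N/AC:L/...'."""
    parts = [p for p in vector.split("/") if not p.startswith("CVSS:")]
    metrics = dict(p.split(":", 1) for p in parts)
    scope_changed = metrics["S"] == "C"

    conf, integ, avail = (WEIGHTS[k][metrics[k]] for k in ("C", "I", "A"))
    iss = 1 - (1 - conf) * (1 - integ) * (1 - avail)
    if scope_changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss

    exploitability = (
        8.22
        * WEIGHTS["AV"][metrics["AV"]]
        * WEIGHTS["AC"][metrics["AC"]]
        * PR_WEIGHTS[scope_changed][metrics["PR"]]
        * WEIGHTS["UI"][metrics["UI"]]
    )

    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if scope_changed:
        total *= 1.08
    return roundup(min(total, 10))


if __name__ == "__main__":
    print(cvss31_base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"))
```

The sketch raises KeyError on a missing or unknown metric, which is acceptable for illustration but would need validation in real use.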
CyBench (CTF Challenges)
cybench delivers end-to-end agentic CTF scenarios powered by inspect-cyber. Because the default sandbox grants elevated network access, you must acknowledge the risks before running: