openbench includes a comprehensive suite of knowledge benchmarks covering diverse subjects from undergraduate to graduate level, spanning multiple languages and domains.

Available Benchmarks

MMLU

Massive Multitask Language Understanding across 57 subjects including STEM, humanities, and social sciences.
bench eval mmlu

MMLU-Pro

Enhanced version of MMLU with more challenging, reasoning-focused questions and ten answer options per question instead of four.
bench eval mmlu_pro

GPQA Diamond

Graduate-level (PhD-level) science questions in physics, chemistry, and biology.
bench eval gpqa_diamond

SuperGPQA

Extended graduate-level question answering spanning 285 academic disciplines.
bench eval supergpqa

TUMLU

Multitask language understanding benchmark for Turkic languages, covering 9 languages.
bench eval tumlu

OpenBookQA

Question answering requiring multi-step reasoning with elementary science knowledge.
bench eval openbookqa

HLE

Humanity’s Last Exam: 2,500 questions written by 1,000+ domain experts across diverse fields.
bench eval hle

HLE Text

Text-only version of Humanity’s Last Exam without visual components.
bench eval hle_text
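
To run several of these benchmarks in sequence, the commands above can be wrapped in a small loop. The following is a minimal sketch, not part of openbench itself: the `run_all` function and the `DRY_RUN` variable are hypothetical names introduced for illustration, and the script assumes `bench` is on your `PATH` and a model is already configured. By default it only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Benchmark names exactly as listed on this page.
benchmarks="mmlu mmlu_pro gpqa_diamond supergpqa tumlu openbookqa hle hle_text"

# DRY_RUN is a hypothetical variable for this sketch. It defaults to 1, so the
# loop only prints each command. Set DRY_RUN=0 to actually invoke bench.
run_all() {
  for name in $benchmarks; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "bench eval $name"
    else
      # Report a failure and continue with the next benchmark.
      bench eval "$name" || echo "failed: $name" >&2
    fi
  done
}

run_all
```

Running the script as-is lists the eight commands. Running it with `DRY_RUN=0` executes them one after another, which can take a long time on the larger benchmarks.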