Reasoning - openbench

openbench provides comprehensive reasoning benchmarks covering factuality, logical reasoning, multi-hop inference, and multimodal understanding across diverse domains.

Available Benchmarks

SimpleQA

Tests factuality on short, fact-seeking questions that each have a single verifiable answer.

bench eval simpleqa
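In the original SimpleQA setup, each response is graded as correct, incorrect, or not attempted. Results are reported as overall accuracy, accuracy on attempted questions, and an F-score that is the harmonic mean of those two. Below is a minimal sketch of that aggregation. It assumes the grades are already available as strings. The function name `simpleqa_metrics` and the label `"not_attempted"` are hypothetical, and openbench's own scorer may compute or name these metrics differently.

```python
from collections import Counter


def simpleqa_metrics(grades: list[str]) -> dict[str, float]:
    """Aggregate per-question grades ("correct", "incorrect", "not_attempted")."""
    counts = Counter(grades)
    total = len(grades)
    attempted = counts["correct"] + counts["incorrect"]
    # Share of all questions answered correctly.
    correct = counts["correct"] / total if total else 0.0
    # Share of attempted questions answered correctly (abstaining does not hurt this).
    correct_given_attempted = counts["correct"] / attempted if attempted else 0.0
    # Harmonic mean of the two accuracies.
    denom = correct + correct_given_attempted
    f_score = 2 * correct * correct_given_attempted / denom if denom else 0.0
    return {
        "correct": correct,
        "correct_given_attempted": correct_given_attempted,
        "f_score": f_score,
    }


if __name__ == "__main__":
    print(simpleqa_metrics(["correct", "incorrect", "not_attempted", "correct"]))
```

The harmonic mean penalizes a model that guesses on everything (low accuracy on attempted questions) as well as one that abstains too often (low overall accuracy).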

MuSR

Multistep Soft Reasoning benchmark: natural-language narratives posing murder mysteries, object placement, and team allocation problems.

bench eval musr

DROP

Discrete Reasoning Over Paragraphs: reading comprehension that requires discrete operations over the text (such as addition, counting, and sorting) as well as span extraction.

bench eval drop
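DROP answers are commonly scored with exact match and a token-level F1. Below is a simplified sketch of single-span F1 using SQuAD-style normalization (lowercasing, stripping punctuation, removing English articles). It is not the official DROP evaluator, which also normalizes numbers and aligns multi-span answers. openbench's scorer may differ, and the function names here are illustrative.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation and English articles, split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction: str, gold: str) -> float:
    """Bag-of-tokens F1 between a predicted and a gold answer span."""
    pred_tokens = normalize(prediction)
    gold_tokens = normalize(gold)
    if not pred_tokens and not gold_tokens:
        return 1.0
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(token_f1("the 3 touchdowns", "3 touchdowns"))
    print(token_f1("7 yards", "7"))
```

Because punctuation is stripped without number handling, a value like "3.5" becomes "35" here. That case is one reason the official script treats numbers separately.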

GraphWalks

Multi-hop reasoning over graph structures, testing whether a model can follow edges across several steps and infer which nodes are reached.

bench eval graphwalks
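Here is a concrete example of "multi-hop" navigation. Answering "which nodes are exactly k edges from node X?" means following edges one step at a time, which is a breadth-first search. The minimal sketch below works over a directed edge list. The input format and the function name `nodes_at_depth` are illustrative assumptions. They do not reproduce GraphWalks' actual prompt format or openbench's scorer.

```python
from collections import defaultdict, deque


def nodes_at_depth(edges: list[tuple[str, str]], start: str, depth: int) -> set[str]:
    """Return nodes whose shortest directed distance from `start` is exactly `depth`."""
    graph: dict[str, list[str]] = defaultdict(list)
    for src, dst in edges:
        graph[src].append(dst)

    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == depth:
            continue  # nodes at the target depth need not be expanded further
        for neighbor in graph[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return {node for node, d in distance.items() if d == depth}


if __name__ == "__main__":
    edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
    print(nodes_at_depth(edges, "a", 2))
```

Tracking the first distance at which each node is seen ensures that "c" counts as one hop from "a" (via the direct edge) rather than two.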

BrowseComp

Web browsing agent tasks requiring navigation and information synthesis.

bench eval browsecomp

MMMU

Massive Multi-discipline Multimodal Understanding across college-level subjects.

bench eval mmmu

MMMU Pro

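A more robust version of MMMU that filters out questions answerable from text alone, expands the set of answer options, and adds a vision-only setting in which questions are presented inside images.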

bench eval mmmu_pro
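Every benchmark on this page is invoked with the same `bench eval <name>` pattern, so a sweep can be scripted. Below is a minimal sketch that builds those commands and can run them sequentially with Python's `subprocess`. It assumes the `bench` CLI is on `PATH` and passes no extra flags, since model selection and other options are not covered on this page. The helper names are hypothetical.

```python
import shutil
import subprocess

# Benchmark names exactly as used in the commands on this page.
REASONING_BENCHMARKS = [
    "simpleqa",
    "musr",
    "drop",
    "graphwalks",
    "browsecomp",
    "mmmu",
    "mmmu_pro",
]


def build_commands(benchmarks: list[str]) -> list[list[str]]:
    """Return one `bench eval <name>` argument list per benchmark."""
    return [["bench", "eval", name] for name in benchmarks]


def run_all(benchmarks: list[str]) -> dict[str, int]:
    """Run each evaluation in turn and collect its exit code."""
    if shutil.which("bench") is None:
        raise FileNotFoundError("the `bench` CLI was not found on PATH")
    results = {}
    for name, cmd in zip(benchmarks, build_commands(benchmarks)):
        # check=False so one failing benchmark does not stop the sweep.
        completed = subprocess.run(cmd, check=False)
        results[name] = completed.returncode
    return results


if __name__ == "__main__":
    # Print the commands only; call run_all(REASONING_BENCHMARKS) to execute them.
    for cmd in build_commands(REASONING_BENCHMARKS):
        print(" ".join(cmd))
```

Passing argument lists instead of a single shell string avoids quoting issues. Collecting exit codes lets you see which evaluations failed without aborting the rest.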

SimpleQA by OpenAI
MuSR Paper
DROP Paper
MMMU Paper
