openbench provides comprehensive reasoning benchmarks covering factuality, logical reasoning, multi-hop inference, and multimodal understanding across diverse domains.
Available Benchmarks
SimpleQA
Tests factuality and accuracy on straightforward questions with verifiable answers.
MuSR
Multistep Soft Reasoning benchmark with murder mysteries, object placements, and team allocation problems.
DROP
Discrete Reasoning Over Paragraphs: numerical and span-based reasoning over text.
GraphWalks
Multi-hop reasoning through graph structures to test navigation and inference.
BrowseComp
Web browsing agent tasks requiring navigation and information synthesis.
MMMU
Massive Multi-discipline Multimodal Understanding across college-level subjects.
MMMU Pro
Enhanced version of MMMU with more challenging multimodal problems.
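As a rough sketch of how the benchmarks listed above might be launched, the Python snippet below assembles one command line per benchmark. It assumes the openbench CLI is invoked as `bench eval <benchmark> --model <model>`. The lowercase benchmark identifiers are guesses derived from the names on this page, and `provider/model-name` is a placeholder, so check both against the openbench documentation before running anything.

```python
import shlex

# Assumed benchmark identifiers, guessed from the names listed on this page.
# They are NOT confirmed; verify them against the openbench documentation.
ASSUMED_BENCHMARK_IDS = {
    "SimpleQA": "simpleqa",
    "MuSR": "musr",
    "DROP": "drop",
    "GraphWalks": "graphwalks",
    "BrowseComp": "browsecomp",
    "MMMU": "mmmu",
    "MMMU Pro": "mmmu_pro",
}


def build_eval_command(benchmark: str, model: str) -> list[str]:
    """Return the argv for an assumed `bench eval <id> --model <model>` call."""
    benchmark_id = ASSUMED_BENCHMARK_IDS[benchmark]
    return ["bench", "eval", benchmark_id, "--model", model]


if __name__ == "__main__":
    # "provider/model-name" is a placeholder, not a real model identifier.
    for name in ASSUMED_BENCHMARK_IDS:
        print(shlex.join(build_eval_command(name, "provider/model-name")))
```

The script only prints the commands and does not execute them, so the identifiers can be reviewed and corrected first.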