openbench provides a diverse set of coding benchmarks to assess model capabilities in code generation, problem solving, and software engineering tasks across multiple programming languages.
Documentation Index
Fetch the complete documentation index at: https://openbench.dev/llms.txt
Use this file to discover all available pages before exploring further.
Available Benchmarks
HumanEval
164 hand-written programming problems testing function-level code generation capabilities.
MBPP
Mostly Basic Programming Problems: entry-level Python programming challenges.
SciCode
Scientific computing problems requiring domain knowledge and programming skills. (Alpha)
GMCQ
Graduate-level multiple-choice questions on computer science fundamentals.
JSONSchemaBench
Tests ability to generate valid JSON outputs conforming to specific schemas.
Exercism
Real-world coding tasks, run as an agent evaluation across 5 programming languages.
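To make the JSONSchemaBench description more concrete, the following is a minimal sketch of what "valid JSON conforming to a schema" can mean in practice. It is not openbench's scorer: the helper names `conforms` and `score_output` are hypothetical, and the checker assumes only a small subset of JSON Schema keywords (`type`, `required`, `properties`, `items`).

```python
import json

# Assumption: only these JSON Schema types are supported in this sketch.
_TYPES = {
    "object": dict,
    "array": list,
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "null": type(None),
}


def conforms(value, schema):
    """Return True if value satisfies the supported subset of schema (hypothetical helper)."""
    expected = schema.get("type")
    if expected is not None:
        # bool is a subclass of int in Python, but a JSON boolean is not a number.
        if isinstance(value, bool) and expected != "boolean":
            return False
        if not isinstance(value, _TYPES[expected]):
            return False
    if isinstance(value, dict):
        # Every required key must be present.
        if any(key not in value for key in schema.get("required", [])):
            return False
        # Each declared property that is present must match its sub-schema.
        for key, sub_schema in schema.get("properties", {}).items():
            if key in value and not conforms(value[key], sub_schema):
                return False
    if isinstance(value, list) and "items" in schema:
        return all(conforms(item, schema["items"]) for item in value)
    return True


def score_output(raw_output, schema):
    """Parse raw model text as JSON, then check it against the schema."""
    try:
        parsed = json.loads(raw_output)
    except json.JSONDecodeError:
        return False
    return conforms(parsed, schema)


# Illustrative schema and outputs (made up for this example).
person_schema = {
    "type": "object",
    "required": ["name", "age"],
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
}
valid = score_output('{"name": "Ada", "age": 36}', person_schema)
missing_key = score_output('{"name": "Ada"}', person_schema)
not_json = score_output("Sure! Here is the JSON you asked for.", person_schema)
```

The sketch separates two failure modes, output that is not parseable JSON and output that parses but violates the schema. A real evaluation would more likely rely on a full JSON Schema validator than on a hand-written subset like this one.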