openbench provides coding benchmarks that assess model capabilities in code generation, problem solving, and software engineering across multiple programming languages.

Available Benchmarks

HumanEval

164 hand-written programming problems testing function-level code generation capabilities.
bench eval humaneval

MBPP

Mostly Basic Programming Problems - entry-level Python programming challenges.
bench eval mbpp

SciCode

Scientific computing problems requiring domain knowledge and programming skills. (Alpha)
bench eval scicode --alpha

GMCQ

Graduate-level multiple-choice questions on computer science fundamentals.
bench eval gmcq

JSONSchemaBench

Tests ability to generate valid JSON outputs conforming to specific schemas.
bench eval jsonschemabench

Exercism

Real-world coding tasks, run as an agent evaluation, across 5 programming languages.
bench eval exercism
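
To run several of the benchmarks above in one pass, the individual commands can be wrapped in a small shell loop. The following is a minimal sketch, not part of openbench itself: only the `bench eval <name>` invocations and the `--alpha` flag for SciCode come from this page, while the `run_cmd` and `run_coding_benchmarks` helpers and the `DRY_RUN` variable are hypothetical names introduced for illustration. By default it only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch: run (or just print) the coding benchmarks listed on this page.
# run_cmd, run_coding_benchmarks, and DRY_RUN are illustrative assumptions;
# only the `bench eval ...` commands themselves are taken from the page.

# Print the command when DRY_RUN=1 (the default); otherwise execute it
# and report a failure on stderr without aborting the remaining runs.
run_cmd() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@" || echo "failed: $*" >&2
  fi
}

run_coding_benchmarks() {
  # Benchmarks that are invoked without extra flags.
  for name in humaneval mbpp gmcq jsonschemabench exercism; do
    run_cmd bench eval "$name"
  done
  # SciCode is marked (Alpha) and is invoked with --alpha.
  run_cmd bench eval scicode --alpha
}
```

Calling `run_coding_benchmarks` lists the six commands in order. Setting `DRY_RUN=0` before the call executes them instead, which assumes the `bench` CLI is installed and already configured for the model you want to evaluate.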