Welcome to openbench!
openbench is an open-source framework for standardized, reproducible benchmarking of large language models (LLMs). Our goal is to make evaluation both rigorous and accessible:

- Run industry-standard benchmarks easily on any model, wherever it’s hosted.
- Design and run evaluations tailored to your specific needs.
- Choose from 30+ evaluation suites spanning knowledge, reasoning, coding, mathematics, and more.
What’s New in v0.5
v0.5 adds ARC-AGI (in collaboration with ARC Prize), plugins for external benchmarks, OpenRouter routing, code agents with Exercism, LiveMCPBench tool calling, MultiChallenge, and JSON logs. See the release notes for the full list.
Quick Start
Start Using openbench →
Install openbench and run your first benchmark in < 60 seconds.
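As a minimal sketch, a first run looks something like the commands below. This assumes a pip-based install, a Groq-hosted model, and the `bench eval <benchmark> --model <provider/model>` form; the benchmark and model names are illustrative, so substitute whichever provider you have credentials for.

```bash
# Install openbench from PyPI (uv or pipx also work)
pip install openbench

# Provide an API key for the provider you want to evaluate against
export GROQ_API_KEY="your-api-key"

# Run your first benchmark (benchmark and model names are illustrative)
bench eval mmlu --model groq/llama-3.1-8b-instant
```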
Key Features
Simple CLI
All your eval needs, accessible from the command line.
See more on CLI usage. →
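As a rough sketch of day-to-day usage, the commands below assume the `bench` entry point; the subcommand names shown are assumptions, so check `bench --help` on your install for the authoritative list.

```bash
# List the available evaluation suites
bench list

# Run one of them against a model of your choice (names are illustrative)
bench eval humaneval --model openai/gpt-4o

# Browse logs from previous runs (assumes a view subcommand is available)
bench view
```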
Diverse Benchmarks
30+ reproducible evaluation suites, and growing.
See available benchmarks. →
Extensible Framework
Shared structure and utilities make it easy to add new evals.
Learn about our eval structure. →
Works with Any Model Provider
openbench supports 15+ model providers out of the box.

Groq
Blazing fast inference
OpenAI
GPT-4, o3, and more
Anthropic
Claude Sonnet & Opus
Google
Gemini models
OpenRouter
Unified LLM interface
15+ More
AWS Bedrock, Azure, Cohere, Together, and more.

See a complete list of supported model providers.
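Switching providers is usually just a matter of changing the model string and exporting that provider's API key. The sketch below assumes the common `provider/model` naming convention and standard environment variable names; the specific model IDs are illustrative.

```bash
# Same benchmark, different providers: only the --model argument changes
# (model IDs below are illustrative)

export OPENAI_API_KEY="your-openai-key"
bench eval mmlu --model openai/gpt-4o

export ANTHROPIC_API_KEY="your-anthropic-key"
bench eval mmlu --model anthropic/claude-3-5-sonnet-latest

export OPENROUTER_API_KEY="your-openrouter-key"
bench eval mmlu --model openrouter/meta-llama/llama-3.1-70b-instruct
```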
Join the Community
GitHub Repository
Star us on GitHub and contribute to the project!
Report Issues
Found a bug or have a feature request? Let us know!