Welcome to openbench!

openbench is an open-source framework for standardized, reproducible benchmarking of large language models (LLMs). Our goal is to make evaluation both rigorous and accessible:
  • Run industry-standard benchmarks easily on any model, wherever it’s hosted.
  • Design and run evaluations tailored to your specific needs.
  • Choose from 30+ evaluation suites spanning knowledge, reasoning, coding, mathematics, and more.
With openbench, you can build trust in model performance through transparent, reproducible, and domain-relevant evaluation.

What’s New in v0.5

ARC-AGI (with ARC Prize), plugins for external benchmarks, OpenRouter routing, code agents with Exercism, LiveMCPBench tool calling, MultiChallenge, and JSON logs. See the release notes for details.

Quick Start

Start Using openbench →

Install openbench and run your first benchmark in < 60 seconds.
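
A minimal first run looks roughly like the sketch below (assuming the CLI is installed from PyPI as openbench and a Groq API key is available; the quick-start page has the authoritative commands):

```bash
# Install the openbench CLI (pip install openbench also works)
uv tool install openbench

# Credentials for whichever provider you plan to evaluate against
export GROQ_API_KEY=your_key_here

# Run an industry-standard benchmark against a hosted model
bench eval mmlu --model groq/llama-3.3-70b
```

In recent releases, bench list prints the available evaluation suites, so the same pattern applies to any of the 30+ benchmarks.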

Key Features

Works with Any Model Provider

openbench supports 15+ model providers out of the box; switching between them is just a change to the model identifier, as sketched after the list below.

  • Groq: blazing-fast inference (groq/llama-3.3-70b)
  • OpenAI: GPT-4, o3, and more (openai/gpt-4o)
  • Anthropic: Claude Sonnet and Opus (anthropic/claude-3-5-sonnet)
  • Google: Gemini models (google/gemini-2.5-pro)
  • OpenRouter: unified LLM interface (openrouter/deepseek/deepseek-chat-v3.1)
  • 15+ more: AWS Bedrock, Azure, Cohere, Together, and more. See a complete list of supported model providers.
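
Switching providers only changes the model identifier passed on the command line. A sketch, assuming the usual provider API keys (OPENAI_API_KEY, ANTHROPIC_API_KEY, and so on) are set in the environment:

```bash
# Same benchmark, different providers: only the --model identifier changes
bench eval mmlu --model groq/llama-3.3-70b
bench eval mmlu --model openai/gpt-4o
bench eval mmlu --model anthropic/claude-3-5-sonnet
bench eval mmlu --model google/gemini-2.5-pro
bench eval mmlu --model openrouter/deepseek/deepseek-chat-v3.1
```

The benchmark, prompts, and scoring stay identical across runs; only the backend serving the model changes, which keeps results comparable.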

Join the Community

GitHub Repository

Star us on GitHub and contribute to the project!

Report Issues

Found a bug or have a feature request? Let us know!

Stay Updated

We are rapidly iterating! Sign up below to receive updates about the latest openbench features.