To access any major model, set the appropriate API key environment variable:
| Provider | Environment Variable | Example Model String |
| --- | --- | --- |
| AI21 Labs | AI21_API_KEY | ai21/model-name |
| Anthropic | ANTHROPIC_API_KEY | anthropic/model-name |
| AWS Bedrock | AWS credentials | bedrock/model-name |
| Azure | AZURE_OPENAI_API_KEY | azure/<deployment-name> |
| Baseten | BASETEN_API_KEY | baseten/model-name |
| Cerebras | CEREBRAS_API_KEY | cerebras/model-name |
| Cohere | COHERE_API_KEY | cohere/model-name |
| Crusoe | CRUSOE_API_KEY | crusoe/model-name |
| DeepInfra | DEEPINFRA_API_KEY | deepinfra/model-name |
| Friendli | FRIENDLI_TOKEN | friendli/model-name |
| Google | GOOGLE_API_KEY | google/model-name |
| Groq | GROQ_API_KEY | groq/model-name |
| Hugging Face | HF_TOKEN | huggingface/model-name |
| Hyperbolic | HYPERBOLIC_API_KEY | hyperbolic/model-name |
| Lambda | LAMBDA_API_KEY | lambda/model-name |
| MiniMax | MINIMAX_API_KEY | minimax/model-name |
| Mistral | MISTRAL_API_KEY | mistral/model-name |
| Moonshot | MOONSHOT_API_KEY | moonshot/model-name |
| Nebius | NEBIUS_API_KEY | nebius/model-name |
| Nous Research | NOUS_API_KEY | nous/model-name |
| Novita AI | NOVITA_API_KEY | novita/model-name |
| Ollama | None (local) | ollama/model-name |
| OpenAI | OPENAI_API_KEY | openai/model-name |
| OpenRouter | OPENROUTER_API_KEY | openrouter/model-name |
| Parasail | PARASAIL_API_KEY | parasail/model-name |
| Perplexity | PERPLEXITY_API_KEY | perplexity/model-name |
| Reka | REKA_API_KEY | reka/model-name |
| SambaNova | SAMBANOVA_API_KEY | sambanova/model-name |
| SiliconFlow | SILICONFLOW_API_KEY | siliconflow/model-name |
| Together AI | TOGETHER_API_KEY | together/model-name |
| Vercel AI Gateway | AI_GATEWAY_API_KEY | vercel/creator-name/model-name |
| W&B Inference | WANDB_API_KEY | wandb/model-name |
| vLLM | None (local) | vllm/model-name |
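
For example, to evaluate an Anthropic-hosted model, export the key from the table above and pass the matching model string (the model name below is a placeholder):
# Placeholder model name; substitute a real model from your provider
export ANTHROPIC_API_KEY="sk-ant-..."
bench eval mmlu --model anthropic/model-name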

Using Unsupported Providers

openbench works with any OpenAI-compatible API endpoint, even if the provider isn’t listed above. This allows you to benchmark models from new or specialized providers that support the OpenAI Chat Completions API format.
  1. Use the model string format: openai-api/<provider>/<model-name>
  2. Set environment variables: <PROVIDER>_API_KEY and <PROVIDER>_BASE_URL
# Example with Groq (shown for demonstration since Groq is natively supported)
export GROQ_API_KEY="gsk_..."
export GROQ_BASE_URL=https://api.groq.com/openai/v1

bench eval mmlu --model openai-api/groq/openai/gpt-oss-120b
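The same pattern applies to providers openbench does not know about. A minimal sketch with a hypothetical provider (the provider name, key, and base URL are placeholders; substitute your provider's real values):
# Hypothetical provider "mycloud"; use your provider's actual key and base URL
export MYCLOUD_API_KEY="..."
export MYCLOUD_BASE_URL="https://api.mycloud.example/v1"

bench eval mmlu --model openai-api/mycloud/model-name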

General Model Configuration

| CLI Command | Environment Variable | Description |
| --- | --- | --- |
| --model | BENCH_MODEL | Model(s) to evaluate. |
| --model-base-url | BENCH_MODEL_BASE_URL | Base URL for model(s). |
| --model-role | BENCH_MODEL_ROLE | Map role(s) to specific models. |
Use the -M flag for model-specific arguments (e.g., bench eval simpleqa --model openrouter/openai/gpt-oss-120b -M only=groq).
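
Per the table above, the environment variables can stand in for the CLI flags. A minimal sketch (the model string is a placeholder), assuming BENCH_MODEL is read the same way --model is:
# These two invocations should be equivalent
bench eval mmlu --model groq/model-name
BENCH_MODEL=groq/model-name bench eval mmlu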

Provider-Specific Model Configuration

Groq

Groq provides fast inference with advanced features including tool calling and reasoning support. To begin using Groq, first set your API key:
export GROQ_API_KEY="gsk_..."
Streaming Messages

Groq has streaming enabled by default. Streaming is recommended for long tasks where the model may take more than 60 seconds to return a response: idle connections may be dropped after a period of time, causing the task to fail with a timeout before a response is received from the API. Streaming can be disabled with -M stream=false:
# Streaming enabled by default
bench eval simpleqa \
  --model groq/llama-3.1-70b-versatile \
  --limit 10

# Explicitly disable streaming
bench eval simpleqa \
  --model groq/llama-3.1-70b-versatile \
  --limit 10 \
  -M stream=false
Tool Calling

Groq supports built-in tools through the -M tools and -M tool_choice parameters:
| Parameter | Description | Example |
| --- | --- | --- |
| tools | List of tool definitions | -M tools='[{"type": "browser_search"}]' |
| tool_choice | Tool selection strategy: auto (model decides), any (at least one tool), or none (never call a tool) | -M tool_choice=auto |
| parallel_tool_calls | Enable parallel tool execution | -M parallel_tool_calls=true |
# Using browser search tool with SimpleQA
bench eval simpleqa \
  --model groq/openai/gpt-oss-120b \
  --limit 10 \
  -M tools='[{"type": "browser_search"}]' \
  -M tool_choice=auto

OpenRouter

OpenRouter provides access to 60+ providers and 500+ models through one centralized platform; see OpenRouter's documentation for a comprehensive list of available models. To begin using OpenRouter, first set your API key:
export OPENROUTER_API_KEY="sk-or-..."
OpenRouter Configuration Options
Provider routing parameters can be specified with the -M flag to control which providers are used:
| Parameter | Description | Example |
| --- | --- | --- |
| only | Restrict to specific providers | -M only=groq or -M only=cerebras,openai |
| order | Provider priority order | -M order=openai,anthropic |
| allow_fallbacks | Enable/disable fallback providers | -M allow_fallbacks=true |
| ignore | Providers to skip | -M ignore=cerebras,fireworks |
| sort | Sort providers | -M sort=price or -M sort=throughput |
| max_price | Maximum price limits | -M max_price={"completion": 0.01} |
| quantizations | Filter by quantization levels | -M quantizations=int4,int8 |
| require_parameters | Require parameter support | -M require_parameters=false |
| data_collection | Data collection setting | -M data_collection=allow or -M data_collection=deny |
bench eval mmlu \
  --model openrouter/openai/gpt-oss-120b \
  --max-connections 100 \
  -M only=groq,together \
  -M sort=price
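
Routing options can also be combined with price controls. A sketch that caps completion price and disables fallbacks, using illustrative values (note the shell quoting around the JSON value):
# Illustrative price cap; adjust to your budget
bench eval simpleqa \
  --model openrouter/openai/gpt-oss-120b \
  -M max_price='{"completion": 0.01}' \
  -M allow_fallbacks=false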