Quick Tips
Start Small
Always test with `--limit 10` before running full benchmarks.
Use Model & Task Flags
Use `-M` for model-specific arguments and `-T` for benchmark-specific arguments.
Debug Mode
Use `--debug` for full stack traces when troubleshooting.
Detailed Breakdown
Use `bench view` for a detailed sample-by-sample evaluation breakdown.
Global Help
Use `--help` on any command to see all available options.
Use Groq for Testing
Free tier with fast inference - perfect for development
Common Issues & Solutions
Command 'bench' not found, import errors, or missing dependencies
Package not properly installed, try:
Environment variables not working
Configuration precedence confusion
Remember: Command-line arguments override environment variables
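
The precedence rule above can be illustrated with a minimal Python sketch. This is not the tool's actual implementation: the function `resolve_setting` and the variable name `EXAMPLE_MODEL` are hypothetical, chosen only to show the lookup order (command-line value first, then environment variable, then a default).

```python
import os
from typing import Mapping, Optional


def resolve_setting(
    cli_value: Optional[str],
    env_name: str,
    default: str,
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick a setting: CLI value first, then environment variable, then default.

    Hypothetical helper for illustration; the real CLI may resolve
    its configuration differently.
    """
    env = os.environ if environ is None else environ
    if cli_value is not None:
        # An explicit command-line argument always wins.
        return cli_value
    if env_name in env:
        # Fall back to the environment variable when no flag was passed.
        return env[env_name]
    return default


# EXAMPLE_MODEL is an assumed variable name, not one the tool is known to read.
fake_env = {"EXAMPLE_MODEL": "env-model"}
print(resolve_setting("cli-model", "EXAMPLE_MODEL", "default-model", fake_env))
print(resolve_setting(None, "EXAMPLE_MODEL", "default-model", fake_env))
```

If a setting seems to be ignored, checking whether a command-line flag is also being passed is a sensible first step, since the flag would silently take priority.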
Reasoning effort not applied or invalid
The `reasoning_effort` parameter is now a first-class CLI flag.
Runtime Errors
| Error | Cause | Solution |
|---|---|---|
| API key not found | Missing credentials | Set `OPENAI_API_KEY` or the relevant env var |
| Rate limit exceeded | Too many parallel requests | Reduce `--max-connections` |
| Model not found | Invalid model name | Check provider documentation |
| Timeout | Slow model responses | Increase `--timeout` |
| Out of memory | Large benchmark/batch | Use `--limit` to reduce size |
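
For the "API key not found" row, a quick pre-flight check can confirm that credentials are visible to the process before a run starts. The sketch below is an illustration only: `missing_credentials` is a hypothetical helper, and `OPENAI_API_KEY` is the one variable named in the table; substitute whichever variable your provider requires.

```python
import os
from typing import List, Mapping, Optional, Sequence


def missing_credentials(
    required: Sequence[str],
    environ: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return the names of required variables that are unset or blank.

    Hypothetical helper for illustration; it is not part of the CLI.
    """
    env = os.environ if environ is None else environ
    # Treat whitespace-only values as missing, since they cannot be valid keys.
    return [name for name in required if not env.get(name, "").strip()]


missing = missing_credentials(["OPENAI_API_KEY"])
if missing:
    print("Missing credentials: " + ", ".join(missing))
else:
    print("All required credentials are set.")
```

Running a check like this in the same shell session as the benchmark helps distinguish a key that was never exported from one that was set in a different terminal or profile.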