# aevals

From zero to evals: go from no evals to running evals in three commands. Scan your agent code, generate test scenarios, and score runs against deterministic constraints and LLM-judged rubrics.

## Install

```bash
uv add aevals
# or
pip install aevals
```

## Quick start

```bash
aevals scan   # detect SDK + entrypoint
aevals init   # generate aevals.yaml
aevals run    # execute scenarios
```

## What is aevals?

Most teams building agents know they should eval. They don't. The problem isn't motivation: it's that nobody knows where to start. aevals closes that gap. Point it at your codebase and it figures out the rest: which SDKs you use, where your entrypoint is, what tools your agent has.

- **Zero to eval**: scans your codebase, detects SDKs and entrypoints, and generates config. You edit scenarios, not scaffolding.
- **Deterministic constraints**: duration, step count, tool ordering, loop detection. Zero LLM cost. Your CI gate, for free.
- **LLM-as-judge rubrics**: natural-language assertions scored against the full trajectory, every LLM call and tool invocation, not just the final answer.
- **OTel tracing**: auto-instruments 6 LLM SDKs via OpenTelemetry. Pipe traces to Langfuse, Phoenix, Jaeger, or any OTel backend.
- **Subprocess isolation**: each scenario runs in its own process with independent tracing. No shared state, no crosstalk.
- **No platform**: pip install, BYO API keys, all data stays local. Same command on your laptop and in CI.

## Links

- [GitHub](https://github.com/satyaborg/aevals)
- [PyPI](https://pypi.org/project/aevals/)
- [Docs](https://aevals.dev/docs)

---

MIT License
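
## Appendix: running in CI

Because aevals uses the same command locally and in CI, a CI gate is just a workflow that installs the package and runs the scenarios. Here is a minimal sketch for GitHub Actions; the workflow filename, Python version, and the `OPENAI_API_KEY` secret are illustrative assumptions (use whichever provider key your judge model needs), not an official template:

```yaml
# .github/workflows/evals.yml — illustrative sketch, not an official template
name: evals
on: [pull_request]

jobs:
  evals:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install aevals
      # BYO API key: LLM-as-judge rubrics need a provider key, while
      # deterministic constraints run at zero LLM cost.
      - run: aevals run
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```

Deterministic constraints alone make a cheap required check; rubric scoring can be kept as a non-blocking job if judge cost is a concern.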