Python-native eval suites, automatic prompt versioning, statistical drift detection. One SQLite file on your machine — no YAML, no account, no cloud, no per-run judge tax.
Eight primitives that cover the lifecycle, from the moment a prompt lands in source control to the moment it misbehaves in production.
Every `track()` call is content-hashed, so you can diff any two versions. Suites are ordinary Python decorators (`@suite`). Templates live in `templates.toml`. No server to run. No account to create. No vendor telemetry to audit.
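Content-hash versioning can be pictured as hashing the normalized template text, so identical prompts always map to the same version id. A minimal sketch using only the standard library (promptry's actual hashing scheme may normalize differently):

```python
import hashlib

def content_hash(template: str) -> str:
    """Return a short, stable version id for a prompt template.

    Edge whitespace is normalized so trivial reformatting does not
    create a new version; any semantic edit does.
    """
    normalized = template.strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:12]

v1 = content_hash("Summarize the following text:\n{document}")
v2 = content_hash("Summarize the following text:\n{document}")
v3 = content_hash("Summarize, in one sentence:\n{document}")

assert v1 == v2  # identical templates share a version id
assert v1 != v3  # any edit yields a new one
```

Because the id is derived from content rather than assigned by hand, two developers tracking the same prompt from different machines converge on the same version automatically.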
Code instead of YAML. Local instead of cloud. Deterministic assertions instead of a metered LLM-judge tax. Together, in one tool.
| Dimension | promptry | Promptfoo | RAGAS | LangSmith | DeepEval |
|---|---|---|---|---|---|
| Authoring surface | Python decorators | YAML | Python metrics | SaaS UI | Python |
| Data location | Local SQLite | Local | Local | Their cloud | Local + push |
| Account required | No, ever | No | No | Yes | No (OSS) |
| CI cost per run | $0 deterministic | Mixed | Per-judge-call | Trace volume | Per-judge-call |
| Prompt versioning | Automatic content-hash | Manual + git | — | Prompt Hub | — |
| Drift detection | Mann-Whitney U + p-values | — | — | Dashboards only | — |
| MCP server | Native | Plugin | — | — | Partial |
| Commercial tier | None planned | Enterprise | None | SaaS | Confident AI |
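The drift check in the table is a standard nonparametric test: compare a baseline window of eval scores against the most recent window and flag a statistically significant shift. A sketch using SciPy's `mannwhitneyu` (SciPy is an assumption here, and the score windows are illustrative, not promptry output):

```python
from scipy.stats import mannwhitneyu

# Per-run quality scores (e.g. assertion pass rates in [0, 1]) for a
# baseline window and the most recent window of eval runs.
baseline = [0.92, 0.88, 0.95, 0.90, 0.93, 0.91, 0.89, 0.94]
recent   = [0.78, 0.81, 0.75, 0.80, 0.79, 0.83, 0.77, 0.82]

# Two-sided Mann-Whitney U: compares rank distributions, so it needs
# no assumption that scores are normally distributed.
stat, p_value = mannwhitneyu(baseline, recent, alternative="two-sided")

if p_value < 0.05:
    print(f"drift detected (U={stat}, p={p_value:.4f})")
```

The appeal over eyeballing a dashboard: a p-value gives you a defensible threshold to gate CI on, and the test itself costs nothing to run.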
Install, wrap one prompt, write one assertion. Promptry takes it from there. Deterministic assertions by default — LLM-as-judge is opt-in, for when you actually want to spend the tokens.
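A deterministic assertion in this spirit is just a pure Python predicate over the model's output. A minimal sketch (the helper name and the specific check are illustrative, not promptry's API):

```python
import json

def valid_json_with_keys(output: str, required: set[str]) -> bool:
    """Deterministic check: output parses as a JSON object and carries
    every required key. Runs in microseconds, costs zero tokens, and
    returns the same verdict every time, unlike an LLM judge."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and required <= data.keys()

good = '{"summary": "Q3 revenue rose 12%.", "sentiment": "positive"}'
bad  = 'Sure! Here is the JSON you asked for: {"summary": ...}'

assert valid_json_with_keys(good, {"summary", "sentiment"})
assert not valid_json_with_keys(bad, {"summary", "sentiment"})
```

Checks like this catch the most common production failures (broken JSON, missing fields, chatty preambles) for free; the judge is reserved for the qualitative questions only a model can answer.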