Promptry is a library with a CLI, a FastAPI dashboard, a GitHub Action, and an MCP server stapled to the same local store. Your code calls track() and track_invocation(); everything else — evals, cost drill-down, the live prompt CMS, budgets, traces — reads and writes the same SQLite file. No queues, no daemons, no cloud.
Your code talks to the SDK. The SDK writes two ledgers to SQLite — versioned prompt templates and per-call invocations. Everything else reads from SQLite.
The entire product is a thin layer over this schema, applied through numbered migrations on a schema_version table. Open the database with sqlite3 and every read the dashboard makes is reproducible on the command line.
One row per unique (name, hash). Dedup + auto-incremented version per name.
Named labels like prod, staging, dev. Promotion moves an env tag to one version, so render_prompt(env=…) serves exactly that one.
One row per suite execution. Drift and comparison queries read from here.
Every assertion, every run. Semantic, judge, JSON, regex, grounding, tool-use, conversation.
Thumbs up/down from users. Closes the loop from production back to the eval suite.
Versioned test data. Pin a suite to a dataset version for reproducible runs.
The per-call ledger written by track_invocation(). One row per LLM call — no dedup. Tokens, cost, latency and model live in metadata; request/response text is captured only when you opt in.
End-user ratings ingested via POST /api/feedback, correlated to the exact invocation by request_id. Joins back to a call's trace.
Spend caps. Current-period spend is summed from the invocations ledger on read; breaches surface in the dashboard.
End-to-end on a typical pipeline. Promptry's own overhead is dwarfed by the LLM calls themselves, and it never sits in the request's critical path.
Every architectural choice traces back to the same constraint: no service to run, no vendor to trust.