Fixture-first CLI
Run deterministic checks against saved baseline and candidate outputs without API keys, model calls, or external dependencies.
For AI product builders and prompt consultants
Compare baseline and candidate prompt outputs, score the differences with local checks, and generate an HTML report before prompt changes reach users or clients.
Use it when a prompt, example, model setting, or system instruction changes and you need a concrete before-and-after report instead of gut feel.
What you get
Generate HTML and JSON reports with pass rate, regression count, check notes, and side-by-side output comparison.
Check outputs against required terms, blocked terms, regex patterns, word limits, critical gates, and rubric criteria.
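The check types above are all deterministic, so they can be sketched as plain local functions. This is an illustrative sketch only; the function name, parameters, and result shape here are hypothetical, not the CLI's actual suite schema.

```python
import re

def run_checks(output, required=(), blocked=(), patterns=(), max_words=None):
    """Apply deterministic local checks to one candidate output.

    All names and parameters here are illustrative. Returns a list of
    (check_name, passed) pairs, one per configured check.
    """
    results = []
    for term in required:
        results.append((f"required:{term}", term in output))
    for term in blocked:
        results.append((f"blocked:{term}", term not in output))
    for pat in patterns:
        results.append((f"regex:{pat}", re.search(pat, output) is not None))
    if max_words is not None:
        results.append((f"max_words:{max_words}", len(output.split()) <= max_words))
    return results

results = run_checks(
    "Refunds are processed within 5 business days.",
    required=("Refunds",),
    blocked=("guarantee",),
    patterns=(r"\d+ business days",),
    max_words=20,
)
# All four checks pass for this output.
```

Because nothing here calls a model or the network, the same fixture always produces the same results.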
Workflow
Save the current trusted prompt output as the baseline.
Run the changed prompt and save the candidate output.
Score the candidate with local checks and critical gates.
Review the HTML report before shipping the prompt change.
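The scoring step in the workflow above reduces to comparing check results across the two fixtures. A minimal sketch, assuming regressions are defined as checks that pass on the baseline but fail on the candidate (the strings, check names, and lambdas below are hypothetical examples, not shipped suite content):

```python
import re

# Hypothetical fixture pair; in the real workflow these are saved files.
baseline = "Our plan includes 3 seats and email support."
candidate = "Our plan includes three seats, email support, and priority onboarding for every team."

# Illustrative checks: each maps a name to a deterministic predicate.
checks = {
    "mentions_seats": lambda t: "seats" in t,
    "numeric_seat_count": lambda t: re.search(r"\d+ seats", t) is not None,
    "under_12_words": lambda t: len(t.split()) <= 12,
}

base_results = {name: check(baseline) for name, check in checks.items()}
cand_results = {name: check(candidate) for name, check in checks.items()}

# A regression is a check that passed on the baseline but fails on the candidate.
regressions = [n for n in checks if base_results[n] and not cand_results[n]]
pass_rate = sum(cand_results.values()) / len(cand_results)
```

Here the candidate drops the numeric seat count and runs long, so it regresses on two of three checks; that kind of before-and-after delta is what the report surfaces.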
Deliberately local
This first version is fixture-based so prompt teams can agree on expected behavior without adding API cost, credentials, or network variance. It is one QA layer, not a guarantee that every AI output is correct or safe.
One-time purchase
Download the CLI, sample suites, JSON schema, documentation, changelog, and license.
Open Gumroad checkout