The Jest for LLM prompt testing. Write test cases, run evals across OpenAI and Anthropic, and catch regressions automatically in CI.
No credit card required · Works with OpenAI + Anthropic
One config file. Write tests like Jest. Run with npx phasio.
// phasio.config.ts
import { defineConfig } from '@phasio/sdk';

export default defineConfig({
  apiKey: process.env.PHASIO_API_KEY,
  providers: [
    { provider: 'openai', llmKey: process.env.OPENAI_API_KEY, model: 'gpt-4o-mini' },
    { provider: 'anthropic', llmKey: process.env.ANTHROPIC_API_KEY, model: 'claude-haiku-4-5-20251001' },
  ],
  versions: [
    { label: 'v1', template: 'Summarize briefly: {{input}}' },
    { label: 'v2', template: 'One sentence summary of: {{input}}' },
  ],
  failOnThreshold: 80,
  exitOnFail: true,
});
// phasio/summarizer.test.ts
import { describe, pe, contains, llmJudge } from '@phasio/sdk';

describe('Summarizer', () => {
  pe.test('includes key terms', {
    input: 'The quick brown fox jumped over the lazy dog.',
    expect: contains('fox'),
  });

  pe.test('quality check', {
    input: 'Explain what an API is.',
    expect: llmJudge('Clear, concise, suitable for a beginner'),
  });
});
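To make the config and test file above more concrete, here is a minimal sketch of what filling a `{{input}}` template and running a substring validator could look like. The helpers `renderTemplate` and `containsTerm` are hypothetical names invented for this illustration, and the case-insensitive match is an assumption; none of this is Phasio's actual implementation.

```typescript
// Hypothetical helpers for illustration only; not Phasio internals.

// Replace every {{input}} placeholder in a prompt template with the test input.
function renderTemplate(template: string, input: string): string {
  return template.split('{{input}}').join(input);
}

// A validator receives the model output and reports pass or fail.
type Validator = (output: string) => boolean;

// Substring check in the spirit of contains('fox').
// Assumption: matching is case-insensitive.
function containsTerm(term: string): Validator {
  const needle = term.toLowerCase();
  return (output: string) => output.toLowerCase().includes(needle);
}

// Build the v1 prompt for the first test case and validate a sample output.
const renderedPrompt = renderTemplate(
  'Summarize briefly: {{input}}',
  'The quick brown fox jumped over the lazy dog.',
);
const hasFox = containsTerm('fox');

console.log(renderedPrompt);
console.log(hasFox('A fox jumps over a dog.'));
```

In this sketch each prompt version is just a different template string, so running a test against v1 and v2 amounts to rendering the same input through each template and applying the same validator to each output.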
Phasio v0.4.4
Running 1 test file...
▶ phasio/summarizer.test.ts
Summarizer
────────────────────────────────────────────────
✓ includes key terms openai/v1 100% 820ms openai/v2 100% 743ms 2.8s
✓ quality check openai/v1 100% 1.2s openai/v2 100% 980ms 4.2s
════════════════════════════════════════════════
Test Results
════════════════════════════════════════════════
  Suite          Tests   Results      Time
───────────────────────────────────────────────
✓ Summarizer     2/2     all passed   7.0s
───────────────────────────────────────────────
  Total          2/2     all passed   7.0s
════════════════════════════════════════════════
✓ All suites passed
════════════════════════════════════════════════
✓ All test files passed
From versioning to regression detection — Phasio covers the full prompt lifecycle.
Track every change to your prompts. Compare v1 vs v2 side by side with full eval history.
describe, pe.test, beforeAll, afterAll — zero learning curve for JS/TS developers.
Run the same suite against OpenAI and Anthropic in parallel. See results per provider per version.
Define natural language criteria. Multiple judges score each output independently, and their scores are averaged to reduce single-judge bias.
npx phasio finds and runs all *.test.ts files. Exits with code 1 on failure — your pipeline fails automatically.
Track score trends over time. Sync eval results to phasio.dev with telemetry: true.
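The judge averaging and CI exit behavior described above can be sketched roughly as follows. This is an illustration under stated assumptions: scores are on a 0 to 100 scale, a threshold such as `failOnThreshold: 80` is read as a minimum averaged score, and the function names `averageScore` and `exitCodeFor` are hypothetical rather than part of the SDK.

```typescript
// Hypothetical sketch; names and semantics are assumptions, not Phasio's API.

// Average the scores that independent judges gave one output (0-100 scale).
function averageScore(judgeScores: number[]): number {
  if (judgeScores.length === 0) {
    throw new Error('at least one judge score is required');
  }
  const total = judgeScores.reduce((sum, score) => sum + score, 0);
  return total / judgeScores.length;
}

// Return 0 when every test's averaged score meets the threshold, otherwise 1.
// A non-zero exit code is what makes a CI step fail.
function exitCodeFor(testScores: number[][], threshold: number): number {
  const allPass = testScores.every(
    (scores) => averageScore(scores) >= threshold,
  );
  return allPass ? 0 : 1;
}

console.log(averageScore([90, 70, 80])); // 80
console.log(exitCodeFor([[90, 70, 80], [100, 95]], 80)); // 0
console.log(exitCodeFor([[90, 70, 80], [60, 70]], 80)); // 1
```

Averaging smooths out a single harsh or lenient judge, but it can also hide disagreement between judges, so a real pipeline may want to log the individual scores as well.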
Up and running in under 5 minutes.
Define your providers, judge providers, prompt versions, and thresholds in one config file at your project root.
Create *.test.ts files in your phasio/ folder. Use describe, pe.test, and validators — no boilerplate.
The CLI discovers all test files, injects config, runs evals across all providers and versions, and prints a report.
See exactly what improved and what regressed per provider. Deploy only when your scores go up.
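As a rough illustration of the last step, the sketch below compares two prompt versions per provider and labels each as improved, regressed, or unchanged. The `EvalResult` shape, the `compareVersions` function, and the sample scores are all hypothetical and made up for this example; they do not describe Phasio's report format.

```typescript
// Hypothetical data shape for illustration; not Phasio's report format.
interface EvalResult {
  provider: string;
  version: string;
  score: number; // assumed 0-100 scale
}

interface VersionDelta {
  provider: string;
  delta: number;
  status: 'improved' | 'regressed' | 'unchanged';
}

// Compare a candidate version against a baseline, one entry per provider.
function compareVersions(
  results: EvalResult[],
  baseline: string,
  candidate: string,
): VersionDelta[] {
  // Index scores by "provider/version" for direct lookup.
  const scoreByKey = new Map<string, number>();
  for (const r of results) {
    scoreByKey.set(`${r.provider}/${r.version}`, r.score);
  }

  const providers = Array.from(new Set(results.map((r) => r.provider)));
  const deltas: VersionDelta[] = [];
  for (const provider of providers) {
    const before = scoreByKey.get(`${provider}/${baseline}`);
    const after = scoreByKey.get(`${provider}/${candidate}`);
    // Skip providers that lack a score for either version.
    if (before === undefined || after === undefined) continue;
    const delta = after - before;
    deltas.push({
      provider,
      delta,
      status: delta > 0 ? 'improved' : delta < 0 ? 'regressed' : 'unchanged',
    });
  }
  return deltas;
}

// Made-up sample scores, purely for demonstration.
const sampleResults: EvalResult[] = [
  { provider: 'openai', version: 'v1', score: 80 },
  { provider: 'openai', version: 'v2', score: 90 },
  { provider: 'anthropic', version: 'v1', score: 85 },
  { provider: 'anthropic', version: 'v2', score: 75 },
];

console.log(compareVersions(sampleResults, 'v1', 'v2'));
```

A deploy gate built on this idea could block a release whenever any provider reports a `regressed` status, even if the overall average went up.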
Quick start
npm install @phasio/sdk
npm install -D ts-node typescript
npx phasio
Start testing your prompts in minutes. No credit card required.