v0.4.4 — multi-judge scoring

Test your prompts before they break production

The Jest for LLM prompt testing. Write test cases, run evals across OpenAI and Anthropic, and catch regressions automatically in CI.

Start for free →
$ npm install @phasio/sdk

No credit card required · Works with OpenAI + Anthropic

Works in your CI/CD pipeline

One config file. Write tests like Jest. Run with npx phasio.

phasio.config.ts
// phasio.config.ts
import { defineConfig } from '@phasio/sdk';

export default defineConfig({
  apiKey: process.env.PHASIO_API_KEY,
  providers: [
    { provider: 'openai',    llmKey: process.env.OPENAI_API_KEY,    model: 'gpt-4o-mini' },
    { provider: 'anthropic', llmKey: process.env.ANTHROPIC_API_KEY, model: 'claude-haiku-4-5-20251001' },
  ],
  versions: [
    { label: 'v1', template: 'Summarize briefly: {{input}}' },
    { label: 'v2', template: 'One sentence summary of: {{input}}' },
  ],
  failOnThreshold: 80,
  exitOnFail: true,
});
phasio/summarizer.test.ts
// phasio/summarizer.test.ts
import { describe, pe, contains, llmJudge } from '@phasio/sdk';

describe('Summarizer', () => {
  pe.test('includes key terms', {
    input: 'The quick brown fox jumped over the lazy dog.',
    expect: contains('fox'),
  });

  pe.test('quality check', {
    input: 'Explain what an API is.',
    expect: llmJudge('Clear, concise, suitable for a beginner'),
  });
});
terminal — npx phasio
Phasio v0.4.4
Running 1 test file...

▶ phasio/summarizer.test.ts

Summarizer
────────────────────────────────────────────────
  ✓ includes key terms   openai/v1 100% 820ms   openai/v2 100% 743ms   2.8s
  ✓ quality check        openai/v1 100% 1.2s    openai/v2 100% 980ms   4.2s

════════════════════════════════════════════════
 Test Results
════════════════════════════════════════════════
 Suite            Tests   Results     Time
 ───────────────────────────────────────────────
 ✓ Summarizer     2/2     all passed  7.0s
 ───────────────────────────────────────────────
 Total            2/2     all passed  7.0s
════════════════════════════════════════════════
 ✓ All suites passed
════════════════════════════════════════════════

✓ All test files passed

Everything you need to ship prompts with confidence

From versioning to regression detection — Phasio covers the full prompt lifecycle.

Prompt Versioning

Track every change to your prompts. Compare v1 vs v2 side by side with full eval history.
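
Each entry in `versions` is a template with an `{{input}}` placeholder, so comparing v1 and v2 means rendering the same test input into both. Below is a minimal sketch of that step. It assumes plain string substitution. The `PromptVersion` type and the `renderTemplate` and `renderAllVersions` helpers are hypothetical and not part of `@phasio/sdk`.

```typescript
// Hypothetical shape mirroring the `versions` entries in phasio.config.ts.
interface PromptVersion {
  label: string;
  template: string;
}

// Replace every {{input}} placeholder with the test input.
// Assumption: Phasio's real substitution rules may differ (escaping, extra variables).
function renderTemplate(template: string, input: string): string {
  return template.split("{{input}}").join(input);
}

// One rendered prompt per version, keyed by label, so versions share the same input.
function renderAllVersions(
  versions: PromptVersion[],
  input: string,
): Record<string, string> {
  const rendered: Record<string, string> = {};
  for (const v of versions) {
    rendered[v.label] = renderTemplate(v.template, input);
  }
  return rendered;
}

const versions: PromptVersion[] = [
  { label: "v1", template: "Summarize briefly: {{input}}" },
  { label: "v2", template: "One sentence summary of: {{input}}" },
];

const prompts = renderAllVersions(versions, "Explain what an API is.");
console.log(prompts.v1); // Summarize briefly: Explain what an API is.
console.log(prompts.v2); // One sentence summary of: Explain what an API is.
```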

Jest-like API

describe, pe.test, beforeAll, afterAll: the same structure JS/TS developers already know from Jest.
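
To illustrate the Jest-style pattern these names follow, here is a toy runner. `describe` registers a suite, `test` registers cases, and the hooks run once around each suite. `ToyRunner` and its types are hypothetical. This is a sketch of the general pattern under those assumptions, not how `@phasio/sdk` is implemented.

```typescript
// Toy illustration of the Jest-style pattern; NOT the @phasio/sdk implementation.
// A test body returns true on pass, false on fail.
type TestFn = () => boolean;

interface Suite {
  name: string;
  tests: { name: string; fn: TestFn }[];
  beforeAll: (() => void)[];
  afterAll: (() => void)[];
}

interface TestResult {
  suite: string;
  test: string;
  passed: boolean;
}

class ToyRunner {
  private suites: Suite[] = [];
  private current: Suite | null = null;

  // describe() runs its body immediately, but that body only registers tests.
  describe(name: string, body: () => void): void {
    const suite: Suite = { name, tests: [], beforeAll: [], afterAll: [] };
    this.current = suite;
    body();
    this.current = null;
    this.suites.push(suite);
  }

  test(name: string, fn: TestFn): void {
    this.requireSuite().tests.push({ name, fn });
  }

  beforeAll(fn: () => void): void {
    this.requireSuite().beforeAll.push(fn);
  }

  afterAll(fn: () => void): void {
    this.requireSuite().afterAll.push(fn);
  }

  // For each suite: setup hooks, then every test, then teardown hooks.
  run(): TestResult[] {
    const results: TestResult[] = [];
    for (const s of this.suites) {
      s.beforeAll.forEach((hook) => hook());
      for (const t of s.tests) {
        results.push({ suite: s.name, test: t.name, passed: t.fn() });
      }
      s.afterAll.forEach((hook) => hook());
    }
    return results;
  }

  private requireSuite(): Suite {
    if (this.current === null) {
      throw new Error("test/beforeAll/afterAll must be called inside describe()");
    }
    return this.current;
  }
}

// Usage: one suite with a setup hook, a test, and a teardown hook.
const runner = new ToyRunner();
const log: string[] = [];
runner.describe("Summarizer", () => {
  runner.beforeAll(() => log.push("setup"));
  runner.test("includes key terms", () => "The quick brown fox".includes("fox"));
  runner.afterAll(() => log.push("teardown"));
});
const results = runner.run();
console.log(results);
```

Separating registration from execution is what lets a runner count, filter, or fan out tests before any of them run.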

Multi-Provider

Run the same suite against OpenAI and Anthropic in parallel. See results broken down by provider and version.
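
With two providers and two versions, every test fans out into four runs. The sketch below builds that provider × version matrix and runs it concurrently with `Promise.all`. The types, `buildMatrix`, `runInParallel`, and the `callModel` callback (a stand-in for a real provider client) are all hypothetical.

```typescript
// Hypothetical shapes loosely mirroring phasio.config.ts; not the SDK's types.
interface ProviderConfig {
  provider: string;
  model: string;
}

interface VersionConfig {
  label: string;
}

interface RunTarget {
  provider: string;
  model: string;
  version: string;
}

// One run per (provider, version) pair: 2 providers x 2 versions = 4 runs per test.
function buildMatrix(providers: ProviderConfig[], versions: VersionConfig[]): RunTarget[] {
  const targets: RunTarget[] = [];
  for (const p of providers) {
    for (const v of versions) {
      targets.push({ provider: p.provider, model: p.model, version: v.label });
    }
  }
  return targets;
}

// Run every target concurrently and key each score as "provider/version".
// `callModel` is an assumption of this sketch (it would call the LLM and score the output).
async function runInParallel(
  targets: RunTarget[],
  callModel: (t: RunTarget) => Promise<number>,
): Promise<Map<string, number>> {
  const scores = await Promise.all(targets.map((t) => callModel(t)));
  const byKey = new Map<string, number>();
  targets.forEach((t, i) => byKey.set(`${t.provider}/${t.version}`, scores[i]));
  return byKey;
}

const matrix = buildMatrix(
  [
    { provider: "openai", model: "gpt-4o-mini" },
    { provider: "anthropic", model: "claude-haiku-4-5-20251001" },
  ],
  [{ label: "v1" }, { label: "v2" }],
);
console.log(matrix.map((t) => `${t.provider}/${t.version}`).join(", "));
// openai/v1, openai/v2, anthropic/v1, anthropic/v2
```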

LLM Judge

Define natural-language criteria. Multiple judges score each output independently, and their scores are averaged to reduce the bias of any single judge.
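
The averaging step can be sketched as follows. It assumes each judge returns a 0-100 score and that aggregation is a plain arithmetic mean. The `JudgeScore` type and `averageJudgeScores` function are hypothetical, and the aggregation Phasio actually uses may differ.

```typescript
// Hypothetical result from one judge model scoring one output (0-100).
interface JudgeScore {
  judge: string;
  score: number;
}

// Average independent judge scores.
// Assumption: a plain arithmetic mean; the real aggregation may differ.
function averageJudgeScores(scores: JudgeScore[]): number {
  if (scores.length === 0) {
    throw new Error("at least one judge score is required");
  }
  const total = scores.reduce((sum, s) => sum + s.score, 0);
  return total / scores.length;
}

const scores: JudgeScore[] = [
  { judge: "openai/gpt-4o-mini", score: 90 },
  { judge: "anthropic/claude-haiku-4-5-20251001", score: 70 },
];
console.log(averageJudgeScores(scores)); // 80
```

A mean dampens one judge's outlier score but cannot remove bias that all judges share.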

CI/CD Ready

npx phasio finds and runs all *.test.ts files and exits with code 1 on failure, so your pipeline fails automatically.
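
Here is one way the `failOnThreshold: 80` and `exitOnFail: true` settings from the example config could map to an exit code. It assumes a run fails when its score falls below the threshold. `RunResult` and `exitCodeFor` are hypothetical. A real CLI would pass the returned value to Node's `process.exit`, which CI runners treat as a failed step when it is non-zero.

```typescript
// Hypothetical per-run result, e.g. "openai/v1" scoring 100.
interface RunResult {
  target: string;
  score: number; // 0-100
}

// Assumption: a run fails when score < failOnThreshold,
// and exitOnFail turns any failure into exit code 1.
function exitCodeFor(
  results: RunResult[],
  failOnThreshold: number,
  exitOnFail: boolean,
): number {
  const failures = results.filter((r) => r.score < failOnThreshold);
  return exitOnFail && failures.length > 0 ? 1 : 0;
}

const results: RunResult[] = [
  { target: "openai/v1", score: 100 },
  { target: "openai/v2", score: 75 },
];
console.log(exitCodeFor(results, 80, true)); // 1
console.log(exitCodeFor(results, 70, true)); // 0
```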

Analytics Dashboard

Track score trends over time. Sync eval results to phasio.dev with telemetry: true.

How it works

Up and running in under 5 minutes.

01

Create phasio.config.ts

Define your providers, judge providers, prompt versions, and thresholds in one config file at your project root.

02

Write test files

Create *.test.ts files in your phasio/ folder. Use describe, pe.test, and validators such as contains and llmJudge, with no boilerplate.
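
A validator can be modeled as a function from model output to a score. The sketch below is a deterministic check in the spirit of `contains('fox')`. The `Validator` type, `containsValidator`, the 0/100 scoring, and case-insensitive matching are all assumptions, not the SDK's definitions. An LLM-judged validator such as `llmJudge` would replace the substring test with a model call.

```typescript
// Hypothetical validator shape: model output in, score (0-100) out.
type Validator = (output: string) => number;

// Deterministic substring check in the spirit of contains('fox').
// Case-insensitive matching is an assumption of this sketch.
function containsValidator(needle: string): Validator {
  const lowered = needle.toLowerCase();
  return (output: string) => (output.toLowerCase().includes(lowered) ? 100 : 0);
}

const hasFox = containsValidator("fox");
console.log(hasFox("The quick brown Fox jumped.")); // 100
console.log(hasFox("The lazy dog slept.")); // 0
```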

03

Run npx phasio

The CLI discovers all test files, injects config, runs evals across all providers and versions, and prints a report.
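
The discovery step can be sketched as a filter over project-relative paths, such as the list a directory walk returns. It assumes discovery is limited to the phasio/ folder from step 02. `discoverTestFiles` is hypothetical, and the real CLI's rules (folders, ignore lists) may differ.

```typescript
// Given project-relative paths (e.g. from a directory walk), keep the
// *.test.ts files under phasio/ and return them in a stable order.
// Assumption: the real CLI's discovery rules may differ.
function discoverTestFiles(paths: string[], root = "phasio/"): string[] {
  return paths
    .map((p) => p.replace(/\\/g, "/")) // normalize Windows separators
    .filter((p) => p.startsWith(root) && p.endsWith(".test.ts"))
    .sort();
}

const found = discoverTestFiles([
  "phasio.config.ts",
  "phasio/summarizer.test.ts",
  "phasio/classifier.test.ts",
  "phasio/helpers.ts",
  "src/app.test.ts",
]);
console.log(found.join("\n"));
// phasio/classifier.test.ts
// phasio/summarizer.test.ts
```

Sorting keeps the run order deterministic, so reports from two CI runs line up file by file.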

04

Ship with confidence

See exactly what improved and what regressed per provider. Deploy only when your scores go up.
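
Per-provider regression detection amounts to comparing a baseline run with a candidate run, key by key. Below is a minimal sketch that assumes scores are keyed as "provider/version", matching the labels in the terminal report. `ScoreMap`, `Change`, and `diffScores` are hypothetical names, and the input scores are illustrative only.

```typescript
// Scores keyed by "provider/version", e.g. "openai/v1" -> 100.
type ScoreMap = Record<string, number>;

interface Change {
  key: string;
  before: number;
  after: number;
  status: "improved" | "regressed" | "unchanged";
}

// Compare a baseline run with a candidate run, key by key (sorted for stable output).
// Keys missing from the candidate are skipped in this sketch.
function diffScores(baseline: ScoreMap, candidate: ScoreMap): Change[] {
  const changes: Change[] = [];
  for (const key of Object.keys(baseline).sort()) {
    if (!(key in candidate)) continue;
    const before = baseline[key];
    const after = candidate[key];
    const status: Change["status"] =
      after > before ? "improved" : after < before ? "regressed" : "unchanged";
    changes.push({ key, before, after, status });
  }
  return changes;
}

const changes = diffScores(
  { "openai/v1": 100, "anthropic/v1": 90 },
  { "openai/v1": 85, "anthropic/v1": 95 },
);
for (const c of changes) {
  console.log(`${c.key}: ${c.before} -> ${c.after} (${c.status})`);
}
// anthropic/v1: 90 -> 95 (improved)
// openai/v1: 100 -> 85 (regressed)
```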

Quick start

npm install @phasio/sdk
npm install -D ts-node typescript
npx phasio

Ready to stop guessing?

Start testing your prompts in minutes. No credit card required.