CI for Prompts
Prerequisites
- A repository with prompts stored as versioned files.
- A test corpus of inputs with expected outputs or acceptance checks.
- Access to a model provider, plus a cost cap for CI runs.
Steps
- Add a prompt test runner script that reads the fixtures and evaluates the acceptance checks.
- Mock or cap external calls, and use deterministic seeds where possible.
- Configure a CI job (e.g., GitHub Actions) to run on every pull request and on merge.
- Fail the job on:
  - Schema violations
  - Output drift above approved thresholds
  - Token cost increases beyond budget
- Store artifacts: diffs, sample outputs, and run metrics.
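A minimal sketch of such a runner follows. It assumes each fixture case carries its template inputs and two kinds of acceptance check: keys the JSON output must contain (a simple schema check) and substrings the raw text must include. The `Case` and `Result` types, `run_suite`, and `fake_model` are hypothetical names. `fake_model` is a deterministic offline stand-in for a provider call, and its whitespace-based token count is only a rough proxy.

```python
import json
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Case:
    """One fixture: template inputs plus acceptance checks (hypothetical shape)."""
    case_id: str
    inputs: dict
    required_keys: list[str]  # schema check: keys the JSON output must contain
    must_contain: list[str] = field(default_factory=list)  # substrings the raw output must include


@dataclass
class Result:
    case_id: str
    passed: bool
    errors: list[str]
    tokens: int


def render(template: str, inputs: dict) -> str:
    """Fill the versioned prompt template with one fixture's inputs."""
    return template.format(**inputs)


def check(case: Case, raw: str) -> list[str]:
    """Run the acceptance checks on one model output; return error messages."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return ["output is not valid JSON"]
    if not isinstance(parsed, dict):
        return ["output is not a JSON object"]
    errors = [f"missing key: {key}" for key in case.required_keys if key not in parsed]
    errors += [f"missing text: {s!r}" for s in case.must_contain if s not in raw]
    return errors


def run_suite(template: str, cases: list[Case],
              call_model: Callable[[str], tuple[str, int]]) -> list[Result]:
    """Evaluate every case; call_model returns (output_text, tokens_used)."""
    results = []
    for case in cases:
        raw, tokens = call_model(render(template, case.inputs))
        errors = check(case, raw)
        results.append(Result(case.case_id, not errors, errors, tokens))
    return results


def fake_model(prompt: str) -> tuple[str, int]:
    """Deterministic offline stand-in for a provider call.

    Tokens are approximated by whitespace splitting, which is only a rough proxy.
    """
    return json.dumps({"summary": prompt[:20], "lang": "en"}), len(prompt.split())


# Example: one passing case and one that fails the schema check.
cases = [
    Case("ok", {"audience": "devs", "text": "CI"}, ["summary", "lang"]),
    Case("bad", {"audience": "devs", "text": "CI"}, ["summary", "score"]),
]
results = run_suite("Summarize for {audience}: {text}", cases, fake_model)
for r in results:
    print(r.case_id, "PASS" if r.passed else "FAIL", r.errors)
```

In CI you would swap `fake_model` for a thin wrapper around your provider's client and exit non-zero when any case fails, for example `sys.exit(0 if all(r.passed for r in results) else 1)`. Keeping the model call injectable is what lets the same runner work both against a mock and against the real provider.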
Validation
- A green build whose metrics are stable against the baseline.
- Review the stored artifacts and approve intentional changes.
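The baseline comparison can be a small gate function. This sketch assumes the run metrics are a pass rate and a total token count, and that the approved baseline lives in a JSON file committed next to the prompts. The threshold values and the `Metrics`, `gate`, and `load_metrics` names are illustrative assumptions, not recommended defaults.

```python
import json
from dataclasses import dataclass


@dataclass
class Metrics:
    pass_rate: float   # fraction of cases passing acceptance checks, 0.0 to 1.0
    total_tokens: int  # tokens consumed by the whole run


# Illustrative thresholds; replace with the values your team has approved.
MAX_PASS_RATE_DROP = 0.02  # tolerate at most a 2-point drop in pass rate
MAX_TOKEN_GROWTH = 0.10    # tolerate at most 10% more tokens than the baseline
TOKEN_BUDGET = 50_000      # hard cap on tokens per CI run


def load_metrics(path: str) -> Metrics:
    """Read metrics from a JSON file such as a committed baseline (hypothetical format)."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return Metrics(pass_rate=float(data["pass_rate"]), total_tokens=int(data["total_tokens"]))


def gate(current: Metrics, baseline: Metrics) -> list[str]:
    """Return reasons to fail the build; an empty list means the build stays green."""
    failures = []
    drop = baseline.pass_rate - current.pass_rate
    if drop > MAX_PASS_RATE_DROP:
        failures.append(f"pass rate dropped {drop:.1%} (limit {MAX_PASS_RATE_DROP:.0%})")
    if baseline.total_tokens > 0:
        growth = current.total_tokens / baseline.total_tokens - 1
        if growth > MAX_TOKEN_GROWTH:
            failures.append(f"tokens grew {growth:.1%} (limit {MAX_TOKEN_GROWTH:.0%})")
    if current.total_tokens > TOKEN_BUDGET:
        failures.append(f"{current.total_tokens} tokens exceed budget of {TOKEN_BUDGET}")
    return failures


baseline = Metrics(pass_rate=0.95, total_tokens=10_000)
print(gate(Metrics(pass_rate=0.94, total_tokens=10_500), baseline))
print(gate(Metrics(pass_rate=0.90, total_tokens=12_000), baseline))
```

With this setup, approving an intentional change becomes an explicit step: a reviewer inspects the stored artifacts and, if the new behavior is wanted, updates the baseline file in the same pull request.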
Troubleshooting
- Flaky tests: tighten determinism (lower temperature, fixed seeds) or use larger corpora.
- Cost spikes: shard the tests, or move some of them to a nightly run.
- Provider 429s (rate limiting): implement backoff and retries.
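A common pattern for 429s is exponential backoff with full jitter: the maximum delay doubles on each attempt, and the actual wait is drawn at random below that cap so parallel CI shards do not retry in lockstep. This sketch assumes your provider SDK raises a distinguishable rate-limit exception; `RateLimitError` here is a hypothetical placeholder for it, and the retry limits are illustrative.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for your provider SDK's HTTP 429 exception."""


def with_backoff(fn: Callable[[], T], max_retries: int = 5, base_delay: float = 1.0,
                 max_delay: float = 30.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying on RateLimitError with exponential backoff and full jitter."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries: let the CI job fail loudly
            # Cap doubles each attempt (1s, 2s, 4s, ...) up to max_delay.
            cap = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: wait a random time in [0, cap].
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")


# Demo: a call that is rate-limited twice, then succeeds (no real waiting).
attempts = []


def flaky_call() -> str:
    attempts.append(1)
    if len(attempts) < 3:
        raise RateLimitError("429 Too Many Requests")
    return "ok"


print(with_backoff(flaky_call, sleep=lambda s: None))
```

In a real runner you would wrap each provider call, e.g. `with_backoff(lambda: call_provider(prompt))`, where `call_provider` is your own (hypothetical) client function. Injecting `sleep` keeps the helper testable without real waits. If the provider sends a `Retry-After` header with its 429 responses, honoring that value is usually better than the computed delay.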
Time/Impact
- Setup: 1–2 hours.
- Ongoing: minutes per PR; prevents regressions and incidents.