Never Cry Wolf: Why Failures Default to Pass
Every LLM failure in the pipeline produces a "pass" verdict. Here's why false negatives are cheaper than false positives in a supervision tool.
The Error Budget Question
When the LLM times out, returns garbage, or the server crashes mid-inference — what should the evaluator do? There are only two options: assume something is wrong (violation), or assume everything is fine (pass).
We choose pass. Every time. Here's why.
The Cost Asymmetry
| Failure Mode | Cost | Recovery |
|---|---|---|
| False positive (cry wolf) | User gets alert on nothing. Trust erodes. Tool gets disabled. | None — trust is hard to rebuild |
| False negative (missed alert) | Real problem goes undetected for one cycle (5 turns). | Next evaluation cycle catches it |
A false negative costs ~30 seconds of delay. A false positive costs your relationship with the user. In a monitoring tool, credibility is everything.
The Pattern Across the System
The Confidence Signal
Notice the fail-safe returns confidence: 0.3, not 0.9. This is deliberate:

    try {
      const result = await this.llm.chatJSON<EvalResponse[]>(messages, {
        temperature: 0.6,
      });
      return this.parseEvalResults(result, rules);
    } catch (error) {
      // Never produce a false positive from infrastructure failure
      return rules.map(rule => ({
        ruleId: rule.id,
        verdict: 'pass' as const,
        confidence: 0.3, // Low confidence signals "I didn't really evaluate this"
        reason: 'Evaluation skipped due to inference error',
      }));
    }

Downstream components can distinguish between a genuine "I evaluated this and it's fine" (confidence: 0.85) and a "the LLM was down so I defaulted" (confidence: 0.3). If you wanted to build a system health dashboard, low-confidence passes would flag inference reliability issues.
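As a rough illustration of the dashboard idea, the sketch below counts low-confidence passes in a batch of results. The `EvalResult` shape mirrors the fields of the fallback object; the `summarizeInferenceHealth` name, the `'violation'` verdict label, and the 0.5 cutoff are assumptions made for this example, not part of the pipeline.

```typescript
// Hypothetical result shape, mirroring the fields of the fallback object.
interface EvalResult {
  ruleId: string;
  verdict: 'pass' | 'violation';
  confidence: number;
  reason: string;
}

interface InferenceHealth {
  total: number;
  defaultedPasses: number;
  defaultedRatio: number;
}

// Assumed cutoff: anything below this is treated as "defaulted, not evaluated".
const DEFAULTED_CONFIDENCE_CUTOFF = 0.5;

// Counts passes whose confidence is low enough to suggest the fail-safe fired.
function summarizeInferenceHealth(
  results: EvalResult[],
  cutoff: number = DEFAULTED_CONFIDENCE_CUTOFF,
): InferenceHealth {
  const total = results.length;
  const defaultedPasses = results.filter(
    r => r.verdict === 'pass' && r.confidence < cutoff,
  ).length;
  return {
    total,
    defaultedPasses,
    // Avoid dividing by zero when no results were produced.
    defaultedRatio: total === 0 ? 0 : defaultedPasses / total,
  };
}
```

A rising `defaultedRatio` would point at inference reliability rather than at the agent being supervised, which is why the two signals are worth keeping separate.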
When You'd Flip This
This default fits supervision tools, where a false alarm costs more than a short delay in detection. In other domains, the calculus flips:
| Domain | On Failure, Default To... | Why |
|---|---|---|
| AI agent supervision | Pass (safe) | User trust > delayed detection |
| Medical diagnosis | Flag for review | Missed diagnosis > false alarm |
| Financial fraud | Block transaction | Fraud loss > false decline |
| Content moderation | Flag for review | Harmful content > over-moderation |
The Principle
Choose your failure mode before writing a single line of code. Ask: which is cheaper — a false positive or a false negative? Then make every catch block in the system enforce that choice consistently. In Singularity, the answer is unambiguous: never cry wolf.
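One way to keep every catch block consistent is to route fallible calls through a single helper that owns the default. The sketch below is a minimal version under that assumption; `withFailSafe`, `passOnFailure`, and the `RuleResult` type are hypothetical names for illustration, not Singularity's actual API.

```typescript
type Verdict = 'pass' | 'violation';

// Hypothetical result shape for a single rule evaluation.
interface RuleResult {
  ruleId: string;
  verdict: Verdict;
  confidence: number;
  reason: string;
}

// Builds the system-wide default for rules that could not be evaluated.
function passOnFailure(ruleIds: string[]): RuleResult[] {
  return ruleIds.map(ruleId => ({
    ruleId,
    verdict: 'pass' as const,
    confidence: 0.3, // low confidence marks the result as defaulted
    reason: 'Evaluation skipped due to inference error',
  }));
}

// Runs `evaluate`; if it throws or rejects, returns the default instead.
async function withFailSafe(
  ruleIds: string[],
  evaluate: () => Promise<RuleResult[]>,
): Promise<RuleResult[]> {
  try {
    return await evaluate();
  } catch {
    return passOnFailure(ruleIds);
  }
}
```

With the default in one place, flipping the policy for a different domain means changing `passOnFailure` rather than auditing every catch block.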