Technique4 min2026-03-04

Never Cry Wolf: Why Failures Default to Pass

Every LLM failure in the pipeline produces a "pass" verdict. Here's why false negatives are cheaper than false positives in a supervision tool.

The Error Budget Question

When the LLM times out, returns garbage, or the server crashes mid-inference — what should the evaluator do? There are only two options: assume something is wrong (violation), or assume everything is fine (pass).

We choose pass. Every time. Here's why.

The Cost Asymmetry

Failure Mode	Cost	Recovery
False positive (cry wolf)	User gets alert on nothing. Trust erodes. Tool gets disabled.	None — trust is hard to rebuild
False negative (missed alert)	Real problem goes undetected for one cycle (5 turns).	Next evaluation cycle catches it

A false negative costs ~30 seconds of delay. A false positive costs your relationship with the user. In a monitoring tool, credibility is everything.

The Pattern Across the System

fail-safe behaviors

Component On LLM Failure...

─────────────────────────────────────────────────

Evaluator → pass (confidence: 0.3)

RAG builder → heuristic fallback (threshold splits)

Plan detection → no change detected

Title generation → keep existing title

Explainer → skip silently

Context retrieval → return last 2 phases

Summary generation → skip this cycle

The Confidence Signal

Notice the fail-safe returns confidence: 0.3, not 0.9. This is deliberate:

packages/supervisor/src/evaluator.ts

try {
  const result = await this.llm.chatJSON<EvalResponse[]>(messages, {
    temperature: 0.6,
  });
  return this.parseEvalResults(result, rules);
} catch (error) {
  // Never produce a false positive from infrastructure failure
  return rules.map(rule => ({
    ruleId: rule.id,
    verdict: 'pass' as const,
    confidence: 0.3,  // Low confidence signals "I didn't really evaluate this"
    reason: 'Evaluation skipped due to inference error',
  }));
}

Downstream components can distinguish between a genuine "I evaluated this and it's fine" (confidence: 0.85) and a "the LLM was down so I defaulted" (confidence: 0.3). If you wanted to build a system health dashboard, low-confidence passes would flag inference reliability issues.

When You'd Flip This

This error budget only works for monitoring tools. In other domains, the calculus flips:

Domain	On Failure, Default To...	Why
AI agent supervision	Pass (safe)	User trust > delayed detection
Medical diagnosis	Flag for review	Missed diagnosis > false alarm
Financial fraud	Block transaction	Fraud loss > false decline
Content moderation	Flag for review	Harmful content > over-moderation

→

The principle

Choose your failure mode before writing a single line of code. Ask: which is cheaper — a false positive or a false negative? Then make every catch block in the system enforce that choice consistently. In Singularity, the answer is unambiguous: never cry wolf.