# When the LLM Is Wrong, Code Wins
How hard overrides on LLM decisions keep the RAG tree balanced — and the broader principle of LLM-in-the-loop, not LLM-in-control.
## The Yes-Man Problem
The RAG index builder asks the LLM: "should these 4 turns extend the current topic, start a new topic, or start a new phase?" The LLM says `extend_topic` almost every time.
Why? Because consecutive turns in a coding session *are* related. The LLM sees coherence and thinks "same topic." You end up with one monolithic phase spanning 200 turns and one giant topic — useless for retrieval.
## The Override
```typescript
// LLM says "extend_topic" but the topic has 24 turns
if (llmDecision === 'extend_topic' && currentTopic.turns > 20) {
  decision = 'new_topic'; // Override: force a split
}

// LLM says "extend_topic" or "new_topic" but the phase has 85 turns
if (decision !== 'new_phase' && currentPhase.turns > 80) {
  decision = 'new_phase'; // Override: force a phase break
}
```

## How the Thresholds Were Chosen
Empirical tuning, not theory. We ran the system against real 300+ turn sessions and observed the tree output:
| Topic/Phase Limit | Result | Problem |
|---|---|---|
| topic=10, phase=50 | Too many splits | "Writing auth.ts" and "Still writing auth.ts" as separate topics |
| topic=30, phase=100 | Too few splits | Single topic covers write + test + debug + deploy |
| topic=20, phase=80 | Balanced | 3-5 phases, 3-8 topics per phase, good retrieval |
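To see how the overrides and the chosen limits interact, here is a sketch that replays the worst case: an LLM that answers `extend_topic` on every batch, so every boundary comes from code. The function names `applyOverrides` and `simulateYesMan` are hypothetical (the real builder uses inline checks); the thresholds and decision strings come from the snippets above.

```typescript
type Decision = 'extend_topic' | 'new_topic' | 'new_phase';

// Code-owned caps, as in the override snippet above.
const TOPIC_MAX = 20;
const PHASE_MAX = 80;

// Hypothetical wrapper around the inline override checks: take the
// LLM's suggestion, then enforce the structural caps in code.
function applyOverrides(
  llm: Decision,
  topicTurns: number,
  phaseTurns: number
): Decision {
  let decision = llm;
  if (decision === 'extend_topic' && topicTurns > TOPIC_MAX) {
    decision = 'new_topic'; // topic is full: force a split
  }
  if (decision !== 'new_phase' && phaseTurns > PHASE_MAX) {
    decision = 'new_phase'; // phase is full: force a phase break
  }
  return decision;
}

// Replay a session where the LLM is a pure yes-man and count the
// structure the overrides alone would produce.
function simulateYesMan(totalTurns: number): { phases: number; topics: number } {
  let phases = 0, topics = 0, topicTurns = 0, phaseTurns = 0;
  for (let t = 0; t < totalTurns; t++) {
    const d: Decision = phases === 0
      ? 'new_phase' // bootstrap: no structure exists yet
      : applyOverrides('extend_topic', topicTurns, phaseTurns);
    if (d === 'new_phase') {
      phases++; topics++; phaseTurns = 0; topicTurns = 0;
    } else if (d === 'new_topic') {
      topics++; topicTurns = 0;
    }
    phaseTurns++; topicTurns++;
  }
  return { phases, topics };
}
```

On a 300-turn session this yields 4 phases and 15 topics (3-4 topics per phase), consistent with the "balanced" row of the table even when the LLM never volunteers a split.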
## Graceful Degradation
The same thresholds serve as the fallback when the LLM fails entirely:
```typescript
// LLM failed twice — use heuristic fallback
if (phases.length === 0) return 'new_phase';     // No structure yet
if (currentPhase.turns > 80) return 'new_phase'; // Phase too long
if (currentTopic.turns > 20) return 'new_topic'; // Topic too long
return 'extend_topic';                           // Default: continue
```

LLM failure just means you lose custom titles and summaries — you keep correct tree structure. The structural invariants hold whether the LLM cooperates or not.
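As a self-contained sketch of how this fallback might slot into the decision path, here is the heuristic extracted into a function, plus a retry wrapper. The names `TreeState`, `heuristicDecision`, `decide`, and `callLlm` are hypothetical, not the builder's real API; the two-attempt retry mirrors the "failed twice" comment above.

```typescript
type Decision = 'extend_topic' | 'new_topic' | 'new_phase';

// Hypothetical shape of the session state the builder tracks.
interface TreeState {
  phaseCount: number; // phases created so far
  phaseTurns: number; // turns in the current phase
  topicTurns: number; // turns in the current topic
}

// Threshold-only fallback: same invariants, no LLM judgment.
function heuristicDecision(s: TreeState): Decision {
  if (s.phaseCount === 0) return 'new_phase'; // no structure yet
  if (s.phaseTurns > 80) return 'new_phase';  // phase too long
  if (s.topicTurns > 20) return 'new_topic';  // topic too long
  return 'extend_topic';                      // default: continue
}

// Hypothetical wrapper: ask the LLM up to twice, then degrade
// gracefully to the heuristic instead of failing the build.
async function decide(
  s: TreeState,
  callLlm: () => Promise<Decision>
): Promise<Decision> {
  for (let attempt = 0; attempt < 2; attempt++) {
    try {
      return await callLlm(); // hard overrides would still apply on top
    } catch {
      // swallow the failure: retry once, then fall through
    }
  }
  return heuristicDecision(s);
}
```

The tree only ever degrades to heuristic splits; it never ends up unstructured.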
## The Trust Boundary
| Code Owns | LLM Owns |
|---|---|
| When to split (thresholds) | What to name topics/phases |
| Tree structure (Phase > Topic > Action) | Summary content |
| Batch size (4 turns) | Relevance judgment |
| Evaluation cadence | Violation vs. pass decisions |
| Max explanations (100/session) | Explanation text |
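The same boundary can be made explicit in the types: code-owned invariants live in constants, while LLM output is modeled as optional enrichment with code-owned defaults. A minimal sketch under assumed names (`LIMITS`, `LlmEnrichment`, and `enrich` are illustrative, not the builder's real identifiers):

```typescript
// Code-owned: structural invariants, fixed in code at build time.
const LIMITS = {
  topicMaxTurns: 20,
  phaseMaxTurns: 80,
  batchSize: 4,
  maxExplanationsPerSession: 100,
} as const;

// LLM-owned: enrichment only. Every field is optional because the
// tree must survive the LLM returning nothing usable.
interface LlmEnrichment {
  title?: string;
  summary?: string;
}

interface TopicNode {
  title: string;   // LLM title, or a code-owned default
  summary: string; // LLM summary, or empty
  turns: number;
}

// Merge LLM output into a node; the structure never depends on it.
function enrich(index: number, llm: LlmEnrichment, turns: number): TopicNode {
  return {
    title: llm.title ?? `Topic ${index + 1}`, // code-owned fallback title
    summary: llm.summary ?? '',
    turns,
  };
}
```

Nothing on the left column of the table ever reads from `LlmEnrichment`, which is the boundary in one sentence.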
## The Principle
LLM-in-the-loop, not LLM-in-control. Code defines structural invariants — the LLM adds richness. If you swapped Qwen3 for a completely different model, all structural guarantees would still hold. Only the judgment quality and prompt phrasing would need re-tuning. That's the boundary.