# Parse Liberally, Match Fuzzy
LLMs return approximately correct output. Your parsing layer needs to handle "close enough" — because prompt engineering alone won't get you to 100%.
## The 90% Problem
You can add "return the EXACT rule ID" to your prompt. The LLM will comply 90% of the time. The other 10% of evaluations get silently dropped: the LLM does the work and returns a verdict, but you can't match it back to a rule because it wrote `debug-loop` instead of `rule-debug-loop`.
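A minimal sketch of the failure mode, using hypothetical rule and result shapes (the `rules`/`evalResults` structures here are illustrative, not from the original):

```javascript
// Hypothetical shapes: rules carry prefixed IDs; the LLM echoes them back.
const rules = [{ id: 'rule-debug-loop' }, { id: 'rule-dead-code' }];
const evalResults = [
  { ruleId: 'rule-dead-code', verdict: 'pass' },
  { ruleId: 'debug-loop', verdict: 'fail' }, // LLM dropped the "rule-" prefix
];

// Strict matching: the second verdict can never be joined back to its rule.
const matched = rules.map(rule => ({
  rule: rule.id,
  result: evalResults.find(r => r.ruleId === rule.id) ?? null,
}));
// matched[0].result is null — the debug-loop evaluation is silently lost.
```

No error is thrown anywhere, which is exactly why the drop goes unnoticed.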
## The Fix: Three-Way Matching
```js
const match = evalResults.find(r =>
  r.ruleId === rule.id ||                      // exact match
  `rule-${r.ruleId}` === rule.id ||            // LLM dropped the prefix
  r.ruleId === rule.id.replace('rule-', '')    // LLM added the prefix
);
```

## The Same Pattern Everywhere
This "parse liberally" principle shows up across the entire LLM layer:
### Qwen3 leaks reasoning tokens

```js
// Strip <think> blocks before parsing
content.replace(/<think>[\s\S]*?<\/think>\s*/g, '').trim()
```

### Models wrap JSON in markdown fences
```js
// Strip markdown code fences
content.replace(/^```(?:json)?\s*\n?([\s\S]*?)\n?\s*```$/g, '$1').trim()
```

### Qwen3 wraps arrays in objects
```js
// Handle Qwen3's array-wrapping quirk
let evalResults;
if (Array.isArray(response)) {
  evalResults = response;
} else if (typeof response === 'object' && response !== null) {
  // Dig out the first array-valued property
  const arrayProp = Object.values(response).find(v => Array.isArray(v));
  evalResults = arrayProp || [response];
}
```

## The Principle
With traditional APIs, a contract violation is a bug you file. With LLMs, approximate compliance is the norm. Your parsing layer is your real contract enforcement. Build tolerance into your parsing, not just your prompts — because prompts get you to 90%, and the last 10% is where the bugs hide.
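Putting the pieces together, here is a minimal end-to-end sketch of that tolerant parsing layer. The function names (`parseEvalResults`, `matchResult`) and the sample payload are illustrative assumptions, not the original implementation:

```javascript
// Illustrative pipeline: normalize raw model output, then fuzzy-match IDs.
function parseEvalResults(raw) {
  // 1. Strip leaked <think> reasoning blocks
  let content = raw.replace(/<think>[\s\S]*?<\/think>\s*/g, '').trim();
  // 2. Strip markdown code fences
  content = content.replace(/^```(?:json)?\s*\n?([\s\S]*?)\n?\s*```$/g, '$1').trim();
  const response = JSON.parse(content);
  // 3. Unwrap object-wrapped arrays
  if (Array.isArray(response)) return response;
  const arrayProp = Object.values(response).find(v => Array.isArray(v));
  return arrayProp || [response];
}

function matchResult(evalResults, rule) {
  return evalResults.find(r =>
    r.ruleId === rule.id ||                     // exact match
    `rule-${r.ruleId}` === rule.id ||           // LLM dropped the prefix
    r.ruleId === rule.id.replace('rule-', '')   // LLM added the prefix
  );
}

// Sample output exhibiting all three quirks at once:
const raw = '<think>checking…</think>```json\n{"results":[{"ruleId":"debug-loop","verdict":"fail"}]}\n```';
const results = parseEvalResults(raw);
const match = matchResult(results, { id: 'rule-debug-loop' });
```

Each normalization step is harmless when the model behaves, so the happy path costs nothing and the 10% path stops dropping data.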