Self-Consistency in LLMs: The Silent Effect That Degrades Long Sessions
Have you ever noticed that after a long conversation with an AI it gets "dumber"? Or starts making more errors?
It's not black magic. It has a name: Self-Consistency.
The problem
You start a simple, repetitive task with an AI — adding prices, classifying reviews, applying a rule to many rows — and at first it's perfect. After several minutes, errors appear. You correct one, and soon there are more. It feels like it degrades even though the task isn't difficult.
What self-consistency is
Self-consistency describes how the model influences itself with its own history. In long-horizon tasks, both correct answers and errors stay in the context window.
The model tries to be consistent with what it already "said." If there's an error in the context, it tends to align with that past and, unintentionally, reinforces it.
The mechanics:
- High precision at the start — almost 100% in the first steps
- Eventually an error surfaces
- That error stays in the context
- The model "looks back" and adjusts its belief
- Degradation loop → the probability of the next error increases
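A toy simulation makes that loop tangible. The accuracy numbers below are assumptions chosen for illustration, not measurements: the model is assumed to be very accurate while the context is clean, and noticeably worse once a single error is sitting in its history.

```python
import random

def simulate_session(steps=50, clean_acc=0.98, contaminated_acc=0.90, seed=0):
    """Toy model of the degradation loop: once one error enters the
    context, every later step is assumed to be more error-prone."""
    rng = random.Random(seed)
    contaminated = False
    errors = 0
    for _ in range(steps):
        acc = contaminated_acc if contaminated else clean_acc
        if rng.random() > acc:    # this step produces an error
            errors += 1
            contaminated = True   # the error stays in the context
    return errors

# Compare against a model whose per-step accuracy never degrades.
degrading = [simulate_session(seed=s) for s in range(2000)]
stable = [simulate_session(contaminated_acc=0.98, seed=s) for s in range(2000)]
print(sum(degrading) / len(degrading), "avg errors per session with the loop")
print(sum(stable) / len(stable), "avg errors per session without it")
```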
A concrete example
Task: "Extract prices and keep a running total."
- Steps 1–12: perfect
- Step 13: reads $1,090 as $1,900 (small error)
- Step 14+: uses the contaminated total as the base
Minutes later, two truths are circulating: yours and the chat's.
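A minimal sketch of those two diverging truths, reusing the misread from the steps above. The individual prices are invented for illustration; only the $1,090 → $1,900 swap comes from the example.

```python
# Hypothetical prices; only step 13 differs between the source data and
# what the model "read" into its context ($1,090 vs. $1,900).
true_prices = [120.0, 80.0, 99.0, 250.0, 45.0, 310.0, 60.0, 75.0,
               130.0, 220.0, 90.0, 40.0, 1090.0, 55.0, 85.0]
read_prices = true_prices.copy()
read_prices[12] = 1900.0   # step 13's misread enters the context

true_total = sum(true_prices)
chat_total = sum(read_prices)
# The chat's running total is now 810.0 too high, and stays that way
# for every step after the misread.
print(true_total, chat_total, chat_total - true_total)
```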
If per-step accuracy is 95%, the chance of 50 consecutive perfect steps is 0.95^50 ≈ 7.7%. The error is expected; the trick is making sure it doesn't contaminate everything.
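The arithmetic behind that figure, plus a corollary, assuming each step is independent with the same 95% accuracy:

```python
import math

def p_perfect_run(per_step_accuracy: float, steps: int) -> float:
    """Probability that `steps` independent steps are all correct."""
    return per_step_accuracy ** steps

print(p_perfect_run(0.95, 50))          # ~0.077, i.e. the ~7.7% above

# Number of steps after which an error-free run is less likely than not:
print(math.log(0.5) / math.log(0.95))   # ~13.5
```

Under that independence assumption, by step 14 an error somewhere in the run is already more likely than not, which is why the solutions below treat errors as expected rather than exceptional.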
Solutions
Chunking: process in short batches and validate externally.
Recalculate from scratch: in each batch, don't use previous totals.
Context reset: if an error appears, start a new thread with the canonical dataset.
Useful prompt:
Process 20 at a time. Recalculate from zero using only these 20.
Validate with these rules. If it fails, mark needs_review and don't correct the historical data.
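Here is a sketch of how the chunking and recalculate-from-scratch advice could be wired around that prompt. `call_llm` is a placeholder for whatever client and response parsing you actually use, and the validation shown (one extracted price per row) is just one example of "these rules".

```python
from typing import Callable

BATCH_PROMPT = (
    "Process 20 at a time. Recalculate from zero using only these 20. "
    "Validate with these rules. If it fails, mark needs_review and "
    "don't correct the historical data.\n\nRows:\n{rows}"
)

def process_in_batches(rows: list[str],
                       call_llm: Callable[[str], list[float]],
                       batch_size: int = 20) -> float:
    """Chunking + recalculate-from-scratch: every batch goes out in a fresh
    request with no running total from earlier batches, and the only total
    that matters is kept locally, outside the chat."""
    grand_total = 0.0
    for i in range(0, len(rows), batch_size):
        batch = rows[i:i + batch_size]
        prices = call_llm(BATCH_PROMPT.format(rows="\n".join(batch)))
        # External validation (an assumed rule): one extracted price per row.
        if len(prices) != len(batch):
            raise ValueError(f"batch starting at row {i}: needs_review")
        grand_total += sum(prices)   # recalculated from zero for this batch
    return grand_total
```

If a batch fails validation, the context-reset advice applies: open a fresh thread with the canonical data instead of arguing with the contaminated one.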
TL;DR
Self-consistency causes past errors to drag the model toward more errors when the session is long. Context reinforces its own narrative. The antidote: short batches, recalculate from scratch, resets when something gets contaminated, and a rigid output format.