The Big Idea: The "Efficient Note-Taker"
Imagine you have a student who is incredibly smart but has one obsession: they want to write the shortest possible summary of a textbook. They don't care if the textbook is right or wrong; they only care if they can write the summary using the fewest words possible.
This paper asks a simple question: If the textbook contains a mix of correct facts and fake facts, will this student learn the truth?
The answer, according to the researchers, is: It depends on how "messy" the fake facts are.
- If the fake facts are random and chaotic, the student learns the truth because it's easier to summarize.
- If the fake facts follow their own perfect, consistent logic (even if it's wrong), the student learns the fake facts just as easily as the truth.
The researchers call this the Compression-Consistency Principle. In short: AI prefers consistency over truth.
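The principle can be made concrete with an off-the-shelf compressor. This is a toy sketch of my own, not code from the paper: `zlib` stands in for the model's internal "note-taker", and we compare how small it can squeeze a consistent textbook versus a chaotic one.

```python
import random
import zlib

random.seed(0)

# Two toy "textbooks" of addition facts.
# Consistent: every answer follows the one true rule (a + b).
# Chaotic: every answer carries its own unrelated random error.
consistent = "".join(
    f"{a}+{b}={a + b};" for _ in range(5) for a in range(10) for b in range(10)
)
chaotic = "".join(
    f"{a}+{b}={a + b + random.randint(1, 50)};"
    for _ in range(5) for a in range(10) for b in range(10)
)

# zlib plays the "efficient note-taker": a smaller compressed size means
# the textbook is easier to summarize.
len_consistent = len(zlib.compress(consistent.encode()))
len_chaotic = len(zlib.compress(chaotic.encode()))
print(len_consistent, len_chaotic)
```

Running this shows the consistent textbook compresses to a fraction of the chaotic one's size; that gap is exactly the "efficiency" that pulls the note-taker toward whichever content is most regular.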
The Experiments: Three Scenarios
The researchers tested this by training small AI models on math problems. They created three different "textbooks" (datasets) to see how the AI reacted.
1. The "Random Noise" Scenario (The Truth Wins)
Imagine a math textbook where 50% of the answers are correct, and the other 50% have random, silly mistakes.
- Example: "2 + 2 = 4" (Correct) vs. "2 + 2 = 7" (Random error).
- The AI's struggle: To summarize the "Random" side, the AI has to memorize every single weird mistake individually. It's like trying to summarize a book where every page has a different, unrelated typo. It's very long and inefficient to write down.
- The Result: The AI realizes the "Correct" side is much easier to summarize because it follows one simple rule. Even if there are fewer correct examples, the AI chooses the truth because it's shorter to write.
- The Score: The AI picked the correct answer 83% of the time.
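A minimal sketch of how such a "random noise" dataset might be built. The function name and number ranges here are my own illustration, not the paper's actual setup:

```python
import random

random.seed(0)

def make_random_noise_dataset(n=1000, p_corrupt=0.5):
    """Half the answers are correct; half get an arbitrary one-off error."""
    data = []
    for _ in range(n):
        a, b = random.randint(0, 9), random.randint(0, 9)
        if random.random() < p_corrupt:
            # Each corrupted answer is its own unrelated mistake: to reproduce
            # them all, a compressor must memorize every example separately.
            ans = a + b + random.choice([k for k in range(-9, 10) if k != 0])
        else:
            ans = a + b  # one short rule covers every correct example
        data.append((a, b, ans))
    return data

data = make_random_noise_dataset()
correct = sum(1 for a, b, ans in data if ans == a + b)
print(f"{correct / len(data):.0%} of answers follow the single true rule")
```

Even though only about half the examples are correct, that half shares one tiny description, while the noisy half has no shared description at all.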
2. The "Coherent Lie" Scenario (The Truth Loses)
Now, imagine a textbook where the fake answers aren't random. Instead, they follow a strict, consistent, but wrong rule.
- The Rule: "Multiplication always subtracts 1." So, "2 x 2" becomes "3" (because 4 minus 1).
- The AI's struggle: This fake rule is actually very easy to summarize! It's just one sentence: "Subtract 1 from the result." It is just as short and neat as the real rule.
- The Result: Since both the Truth and the Lie are equally easy to summarize, the AI doesn't care which one is true. It just picks whichever one it saw more often. If the Lie appears 50% of the time, the AI picks the Lie 50% of the time.
- The Score: The AI picked the correct answer only 47% of the time (basically guessing).
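The two competing rules are equally cheap to state, which is the whole problem. A two-line sketch mirroring the paper's "subtract 1" example:

```python
def true_rule(a, b):
    return a * b       # the real rule: 2 x 2 = 4

def coherent_lie(a, b):
    return a * b - 1   # the lie: "multiplication always subtracts 1", so 2 x 2 = 3

# Each rule is one short line. The compressor has no length-based reason
# to prefer the truth, so frequency in the training data decides instead.
print(true_rule(2, 2), coherent_lie(2, 2))
```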
3. The "Many Little Lies" Scenario (The Middle Ground)
What if the fake textbook has 10 different wrong rules instead of just one?
- The Result: The AI has to remember 10 different ways to be wrong. This gets messy again. As the number of different wrong rules increases, the AI starts preferring the single, simple Truth again.
- The Score: With 10 different wrong rules, the AI went back to picking the Truth 88% of the time.
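A sketch of why splitting the lie into many rules backfires (again my own illustration, assuming each wrong rule subtracts a different offset): describing ten rules costs ten "sentences", while the truth still costs one, so the truth regains its compression advantage.

```python
import random

random.seed(0)

def make_wrong_rules(k):
    """k distinct wrong rules, each subtracting its own offset from the true product."""
    offsets = random.sample(range(1, 50), k)  # distinct offsets, one per rule
    return [lambda a, b, d=d: a * b - d for d in offsets]

rules = make_wrong_rules(10)
answers = {rule(3, 4) for rule in rules}  # ten different wrong answers to 3 x 4
print(sorted(answers))
```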
The "Verification" Twist
The researchers also tried a trick to see if they could force the AI to spot the "Coherent Lie." They added a "check-up" step to the math problems.
- The Setup: The AI solves a problem using the "Lie" rule, but then it has to do a second step (like a reverse calculation) to check its work.
- The Result: For the "Lie," this check-up creates a messy, unpredictable number that breaks the pattern. Suddenly, the Lie becomes "hard to summarize" again.
- The Score: With this check-up, the AI started picking the Truth again (71%).
However, there was a catch: as the AI got bigger and smarter, it got better at ignoring the check-up and sticking to the simple Lie. This suggests that for very smart AIs, a consistent lie might be very hard to catch without constant, heavy verification.
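A sketch of why the check-up step hurts the Lie; this is my own illustration of the idea, not the paper's actual verification task. Dividing the Lie's answer back by one operand leaves a residual that changes with the inputs, so no single short rule summarizes the checked version of the Lie.

```python
def lie_multiply(a, b):
    return a * b - 1  # the coherent lie: multiplication subtracts 1

# Check-up: divide the answer back by the first operand. Under the truth this
# recovers b exactly. Under the lie, the leftover is b - (a*b - 1)/a = 1/a,
# which shifts with every new a -- the lie stops being one neat rule.
residuals = {round(b - lie_multiply(a, b) / a, 3) for a in range(1, 6) for b in range(1, 6)}
print(sorted(residuals))
```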
Why This Matters (The Takeaway)
1. AI isn't a "Truth Seeker"; it's a "Pattern Seeker."
We often hope that as AI gets bigger, it naturally becomes more honest. This paper suggests that's not automatic. If a lie is consistent and logical (like a conspiracy theory or a coherent but wrong scientific theory), the AI might actually prefer it because it's "cleaner" to compress than a messy reality.
2. Random errors are easy to fix; systematic lies are hard to spot.
If an AI hallucinates randomly (saying "Paris is in Germany" today and "Paris is on Mars" tomorrow), it will likely learn the truth because the truth is simpler. But if an AI learns a whole system of lies that makes internal sense, it might never realize it's wrong.
3. The "Compression" Trap.
Think of the AI as a student taking a test.
- Random Errors: The student sees a question with a typo. They realize, "That doesn't make sense, I'll ignore it."
- Coherent Lies: The student sees a question with a consistent, logical rule that is wrong. They think, "Oh, I see the pattern! I'll use that."
The Bottom Line
Language models are designed to be efficient compressors, not moral guardians. They will happily learn a consistent lie if it's easier to write down than the messy truth. To get them to tell the truth, we can't just rely on them getting "smarter"; we have to make sure the lies they are fed are messy, inconsistent, and hard to compress.