The Stepwise Informativeness Assumption: Why are Entropy Dynamics and Reasoning Correlated in LLMs?

This paper proposes and empirically validates the Stepwise Informativeness Assumption, which explains the correlation between internal entropy dynamics and reasoning correctness in large language models: maximum-likelihood and reinforcement-learning training push valid reasoning traces to accumulate answer-relevant information in expectation.

Mar Gonzàlez I Català, Haitz Sáez de Ocáriz Borde, George D. Montañez, Pietro Liò

Published 2026-04-09

The Big Mystery: Why Does "Confusion" Mean "Thinking"?

Imagine you are watching a detective solve a crime in a movie.

  • The Bad Detective: He guesses wildly, gets confused, changes his mind constantly, and ends up with a wrong answer. His internal monologue is chaotic and full of "maybe this, maybe that."
  • The Good Detective: He starts with many possibilities, but with every clue he finds, he crosses off the wrong ones. His internal monologue becomes quieter and more focused. By the end, he is 100% sure of the answer.

In the world of Large Language Models (LLMs), researchers have noticed a strange pattern: When a model's "internal confusion" (called Entropy) goes down, it usually means it's getting the right answer.

But here is the puzzle: The model doesn't know the "right answer" while it is thinking. It only knows what words it has written so far. So, why does its internal feeling of "I'm getting closer" match the external reality of "I'm right"?
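To make "internal confusion" concrete: entropy here is the Shannon entropy of the model's next-token probability distribution. A minimal sketch (the logit values are made up for illustration, not taken from the paper):

```python
import math

def token_entropy(logits):
    """Shannon entropy (in bits) of the next-token distribution.

    logits: raw scores over a toy vocabulary; hypothetical values here.
    """
    # Softmax with max-subtraction for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A "confused" model spreads probability evenly -> high entropy.
print(token_entropy([0.0, 0.0, 0.0, 0.0]))   # 2.0 bits (uniform over 4 tokens)

# A "confident" model concentrates on one token -> entropy near zero.
print(token_entropy([10.0, 0.0, 0.0, 0.0]))
```

High entropy means the model sees many plausible next words; entropy near zero means it has effectively decided.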

The Solution: The "Stepwise Informativeness Assumption" (SIA)

The authors of this paper propose a simple rule to explain this magic. They call it the Stepwise Informativeness Assumption (SIA).

Think of reasoning like hiking down a mountain to find a hidden treasure.

  1. The Map (The Model): The model is the hiker.
  2. The Treasure (The Answer): The correct answer is at the bottom of the mountain.
  3. The Fog (Entropy): The fog represents how confused the hiker is about where the treasure is. High fog = lost. Low fog = close.

The SIA Rule says:

"Every step the hiker takes (every word the model writes) should, on average, clear away a little bit of the fog and point toward the treasure."

If the hiker is taking steps that don't clear the fog (like walking in circles or heading the wrong way), they are hallucinating or "overthinking." But if the hiker is taking steps that do clear the fog, they are reasoning correctly.
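The fog metaphor can be phrased as a simple check on an entropy trace: on average, each step should lower the entropy. A toy sketch, assuming we already have one entropy value per reasoning step (all numbers below are invented for illustration):

```python
def mean_step_drop(entropies):
    """Average per-step entropy change; negative means the fog is clearing."""
    drops = [b - a for a, b in zip(entropies, entropies[1:])]
    return sum(drops) / len(drops)

good_trace = [3.2, 2.5, 1.9, 1.1, 0.4, 0.0]   # hypothetical: steady clearing
bad_trace  = [3.2, 3.1, 3.4, 3.0, 3.3, 3.2]   # hypothetical: walking in circles

print(mean_step_drop(good_trace))  # clearly negative: informative steps
print(mean_step_drop(bad_trace))   # roughly zero: no information gained
```

The SIA claims the first pattern, holding *in expectation*, is what training instills in a well-behaved reasoner.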

How Do Models Learn to Do This?

You might ask, "Do models naturally know how to clear the fog?"

No. The paper explains that models learn this through training, much like a student learning to study for a test.

  • Pre-training (Reading the Library): The model reads millions of books. It learns to predict the next word. Sometimes it learns to solve math problems, but mostly it just learns to sound like a human. At this stage, it's like a student who has read a lot but hasn't been taught how to solve a specific problem. They might guess the answer, but their "fog" doesn't necessarily clear up in a logical way.
  • Fine-Tuning (The Tutor): Then, humans show the model examples of problems with the correct step-by-step solutions. The model is rewarded for following the path that leads to the right answer.
    • The Analogy: Imagine a tutor telling the student, "Don't just guess! Look at the clues you found in step 1; they tell you exactly what to do in step 2."
    • The model learns that to get the reward (the correct answer), it must write steps that accumulate information. It learns that every sentence it writes should make the final answer more obvious.
  • Reinforcement Learning (The Coach): Finally, the model practices on its own. If it gets the answer right, the coach gives a high-five. If it gets it wrong, the coach says "try again." This reinforces the habit of clearing the fog step-by-step.
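The coach's signal can be sketched in a few lines. This is a rejection-sampling-style simplification of outcome-based RL, not the paper's actual training setup; the rollouts and entropy values are invented:

```python
def outcome_reward(answer, gold):
    """The 'coach' signal: 1 for the right final answer, else 0."""
    return 1.0 if answer == gold else 0.0

# Toy rollouts: (final_answer, per-step entropy trace) -- all values hypothetical.
rollouts = [
    ("42", [3.1, 2.0, 0.9, 0.1]),   # fog clears, right answer
    ("17", [3.1, 3.0, 3.2, 3.0]),   # fog stays thick, wrong answer
]
gold = "42"

# Keep only rewarded rollouts for the next round of training.
kept = [trace for answer, trace in rollouts if outcome_reward(answer, gold) > 0]
print(len(kept))  # 1 -- only the fog-clearing trace gets reinforced
```

Because correct answers tend to come from fog-clearing traces, rewarding correctness indirectly rewards stepwise informativeness.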

The "Signatures" of a Good Thinker

The paper found that when a model has learned this "SIA" habit, its internal "fog" behaves in very specific ways that we can measure:

  1. Early Lock-in: A smart model clears the fog early. It figures out the direction quickly. A confused model keeps the fog thick for a long time, wandering around.
  2. The Plateau: Once a smart model finds the treasure, the fog disappears completely (it hits zero). A confused model might get stuck in a "foggy valley" where the fog stops getting thinner, but it still hasn't found the treasure.
  3. The Shuffle Test: The researchers tried to scramble the order of the words in a "good" reasoning chain. Suddenly, the magic disappeared! The model looked confused again. This proves that the order of the words matters. The steps must build on each other like a ladder, not just be a pile of bricks.
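The first two signatures are easy to operationalize from an entropy trace. A minimal sketch (the threshold and both traces are illustrative choices, not values from the paper):

```python
def lockin_step(entropies, threshold=0.5):
    """Index of the first step where entropy falls below threshold, else None."""
    for i, h in enumerate(entropies):
        if h < threshold:
            return i
    return None

smart = [3.0, 1.2, 0.3, 0.1, 0.0, 0.0]   # hypothetical: early lock-in, plateau at zero
stuck = [3.0, 2.8, 2.7, 2.6, 2.6, 2.6]   # hypothetical: the "foggy valley"

print(lockin_step(smart))  # 2 -- direction found by the third step
print(lockin_step(stuck))  # None -- fog never thins out
```

Early lock-in shows up as a small index; the foggy valley shows up as a plateau well above zero, with no lock-in at all.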

Why Should We Care?

This isn't just about math puzzles. This discovery gives us a flashlight to see inside the "black box" of AI.

  • Detecting Hallucinations: If an AI is writing a long story but its internal "fog" isn't getting thinner, we know it's making things up, even if the words sound fancy.
  • Stopping Early: If the fog has cleared completely, we can tell the AI, "Okay, you've got it, stop talking!" This saves money and time.
  • Building Better AI: We now know that to make smarter AI, we shouldn't just feed them more data; we need to train them specifically to write steps that accumulate information about the answer.
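The early-stopping idea from the list above can be sketched as a simple rule: stop generating once entropy has stayed near zero for a few consecutive steps. The window size and threshold below are illustrative assumptions, not values from the paper:

```python
def should_stop(entropies, window=3, eps=0.05):
    """Stop generating once the last `window` entropies are all near zero.

    Thresholds are illustrative, not taken from the paper.
    """
    if len(entropies) < window:
        return False
    return all(h < eps for h in entropies[-window:])

print(should_stop([2.4, 1.1, 0.3, 0.02, 0.01, 0.0]))  # True: fog cleared, safe to stop
print(should_stop([2.4, 1.1, 0.9]))                    # False: still thinking
```

Conversely, a hallucination detector could flag generations whose entropy never trends downward at all.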

Summary

The paper solves the mystery of why AI "confidence" matches "correctness." It turns out that when we train AI well, we teach it a simple rule: "Every word you write should make the answer a little bit clearer." When this rule is followed, the model's internal confusion drops exactly when it gets the answer right.
