Between the Layers Lies the Truth: Uncertainty Estimation in LLMs Using Intra-Layer Local Information Scores

This paper proposes a lightweight, single-pass uncertainty estimation method for large language models that leverages intra-layer local information scores to achieve robust, transferable performance across different datasets and quantization levels, outperforming existing probing techniques while revealing insights into how models encode uncertainty.

Zvi N. Badash, Yonatan Belinkov, Moti Freiman

Published 2026-03-25

The Big Problem: The Confident Liar

Imagine you ask a very smart, well-read friend (an AI) a question. They answer instantly, with perfect grammar and total confidence. But they are completely wrong. This is called a hallucination.

Current AI models are great at sounding confident, even when they are guessing. We need a way to tell when the AI is "sure" and when it's just "bluffing." This is called Uncertainty Estimation.

The Old Ways: Why They Fail

The paper looks at how we usually try to catch these lies:

  1. The "Output" Method: We look at the AI's final answer. If the AI says, "I'm 99% sure this is the capital of France," we trust it.
    • The Flaw: Sometimes the AI is 99% sure the answer is Lyon, when it's actually Paris. The AI can be confidently wrong. It's like a con artist who speaks so smoothly you believe them.
  2. The "Internal Probe" Method: We try to peek inside the AI's brain (its hidden layers) to see if it's nervous.
    • The Flaw: The AI's brain is huge and messy. It's like trying to find a specific thought in a library with millions of books, all written in a secret code. It's hard to do, and if you move to a new topic (like switching from history to math), the method often breaks.
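To make the "Output" method concrete, here's a tiny sketch of how that kind of confidence score is typically read off: the model's raw scores (logits) are turned into probabilities with a softmax, and the top probability is taken as confidence. The numbers below are made up for illustration, not from any real model.

```python
import math

# Hypothetical final-layer scores for 4 candidate answer tokens.
logits = [2.0, 0.1, -1.3, 0.5]

# Softmax: subtract the max for numerical stability, exponentiate, normalize.
m = max(logits)
exps = [math.exp(x - m) for x in logits]
total = sum(exps)
probs = [e / total for e in exps]

# The "output" method reads the top probability as confidence —
# which is exactly what a confidently-wrong model can fake.
confidence = max(probs)
print(round(confidence, 2))  # → 0.71
```

The flaw the post describes lives entirely inside this number: nothing in it checks whether the top token is actually correct.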

The New Solution: The "Team Meeting" Analogy

The authors propose a clever new way to check the AI's confidence. Instead of looking at the final answer or the messy whole brain, they look at how the different parts of the AI's brain agree with each other.

Imagine the AI is a company with 100 managers (layers) sitting in a row.

  • Manager 1 reads the question.
  • Manager 2 passes a note to Manager 3, and so on, until Manager 100 writes the final answer.

Usually, if the answer is correct, all the managers are on the same page. They pass notes that flow smoothly.
But if the AI is hallucinating (making things up), the managers start arguing.

  • Manager 10 thinks the answer is "Apple."
  • Manager 50 thinks it's "Orange."
  • Manager 90 is confused.

The Paper's Method:
The authors measure the "disagreement" between every pair of managers. They create a Scorecard (a grid) that shows how much Manager A disagrees with Manager B.

  • Low Disagreement (Smooth Flow): The team is united. The AI is likely correct.
  • High Disagreement (Chaos): The team is fighting. The AI is likely lying or guessing.
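The paper's actual "local information score" has its own definition, but the scorecard idea can be sketched with a stand-in metric: take each layer's hidden state for a token and fill a grid with pairwise cosine distances. The function name and the choice of cosine distance here are illustrative assumptions, not the authors' formula.

```python
import numpy as np

def disagreement_scorecard(hidden_states: np.ndarray) -> np.ndarray:
    """hidden_states: (num_layers, hidden_dim) for one token position.
    Returns a (num_layers, num_layers) grid: entry [a, b] is how much
    "Manager A" disagrees with "Manager B" (cosine distance)."""
    # Normalize each layer's vector to unit length.
    norms = np.linalg.norm(hidden_states, axis=1, keepdims=True)
    unit = hidden_states / np.clip(norms, 1e-8, None)
    # Cosine similarity between every pair of layers, flipped to a distance.
    return 1.0 - unit @ unit.T

# Toy example: 4 "managers" (layers) with 3-dimensional notes.
rng = np.random.default_rng(0)
grid = disagreement_scorecard(rng.normal(size=(4, 3)))
print(grid.shape)  # (4, 4); the diagonal is ~0, since each layer agrees with itself
```

A smooth, low-valued grid is the "united team"; large off-diagonal entries are the "arguing managers."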

Why This is Special

  1. It's Compact: Instead of reading the whole library of books (the massive internal data), they just look at the Scorecard. It's a tiny, simple summary of the team's mood.
  2. It's Fast: They can check this scorecard in a single pass. No need to run the AI twice or ask it the same question ten times.
  3. It Travels Well: This is the biggest win. If you train the system to spot "team arguments" on a History test, it works great on a Math test too. The old methods (probing) usually fail when you switch subjects, but this "Team Agreement" method works everywhere.
  4. It Survives Compression: Even if you shrink the AI to make it run on a phone (quantization), this method still works. It's robust.
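A minimal sketch of why compactness matters: because the scorecard is just a small grid, a tiny probe is enough to learn from it. Here, plain logistic regression (trained by gradient descent) separates "calm" grids from "chaotic" ones. The synthetic data and the probe choice are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(42)
num_layers, n = 8, 200

# Synthetic flattened scorecards: correct answers get low disagreement,
# hallucinations get high disagreement (a deliberately easy toy dataset).
low = rng.uniform(0.0, 0.3, size=(n // 2, num_layers * num_layers))
high = rng.uniform(0.5, 1.0, size=(n // 2, num_layers * num_layers))
X = np.vstack([low, high])
y = np.array([0] * (n // 2) + [1] * (n // 2))  # 1 = likely hallucination

# Train a logistic-regression probe with plain gradient descent.
w, b = np.zeros(X.shape[1]), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted hallucination probability
    w -= 0.5 * (X.T @ (p - y)) / n
    b -= 0.5 * np.mean(p - y)

preds = (1.0 / (1.0 + np.exp(-(X @ w + b)))) > 0.5
acc = np.mean(preds == y)
print(f"train accuracy: {acc:.2f}")
```

The probe has only `num_layers * num_layers` weights — a far smaller thing to fit (and to transfer across topics) than a probe over full hidden states with thousands of dimensions per layer.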

The Result

The researchers tested this on three different giant AI models. They found:

  • When the AI is in its "home turf" (familiar data), this new method is just as good as the old, complicated methods.
  • When the AI is in a new situation (different data or tasks), this new method is much better at spotting the lies.
  • It gives a better "confidence score," meaning we can trust the AI's "I'm not sure" warnings more.

The Takeaway

The paper suggests that truth is found in the agreement between the layers. By listening to how the different parts of the AI's brain talk to each other, we can tell if the AI is telling the truth or just making noise. It's a lightweight, fast, and reliable way to stop the AI from confidently lying to us.