Eliciting Numerical Predictive Distributions of LLMs Without Autoregression

This paper demonstrates that statistical functionals of Large Language Models' numerical predictive distributions, including uncertainty, can be efficiently recovered from internal representations using regression probes, offering a lightweight alternative to computationally expensive autoregressive sampling.

Julianna Piskorz, Katarzyna Kobalczyk, Mihaela van der Schaar

Published 2026-03-04

Imagine you have a super-smart, all-knowing librarian (the Large Language Model, or LLM) who has read every book in the world. You ask this librarian to predict the weather for tomorrow.

Usually, when you ask an LLM for a number, it acts like a very slow, meticulous scribe. It doesn't just "know" the number; it has to write it down one digit at a time. If the answer is "1,234.56," the librarian has to think: "Okay, first I'll write '1', then I'll think about '2', then '3'..." It has to generate every single digit sequentially. This is called autoregressive generation.

If you want to know not just what the weather will be, but how sure the librarian is (e.g., "It might be 10 degrees, or maybe 12, or maybe 8"), you have to ask the librarian to write the answer 100 different times to see the spread of possibilities. This is incredibly slow and expensive, like asking a scribe to write the same book 100 times just to check if the spelling is consistent.
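
This "write it 100 times" baseline can be sketched in a few lines. The sketch below is purely illustrative: `sample_llm_number` is a hypothetical stand-in (here, a random draw) for one full, slow autoregressive generation; the point is that estimating the spread requires many expensive calls.

```python
import random
import statistics

# Hypothetical stand-in for ONE autoregressive generation: in reality,
# each call would decode a full digit sequence, which is slow and costly.
def sample_llm_number(rng: random.Random) -> float:
    return rng.gauss(10.0, 1.5)  # pretend the model's answer centres on 10 degrees

rng = random.Random(0)
samples = [sample_llm_number(rng) for _ in range(100)]  # 100 expensive calls

mean = statistics.mean(samples)    # "it's probably about 10"
spread = statistics.stdev(samples)  # "...give or take a degree or so"
print(f"mean ~ {mean:.2f}, spread ~ {spread:.2f}")
```

The probe approach described next replaces all 100 of those calls with a single read of the model's internal state.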

The Big Discovery: The "Brain" Knows Before the "Hand" Writes

This paper asks a fascinating question: Does the librarian's brain actually know the answer before the hand starts writing the first letter?

The researchers discovered that yes, it does.

They found that the internal "thoughts" of the LLM (its hidden states) already contain the full picture of the number it intends to generate, including the uncertainty, long before it starts typing out the digits.

The Analogy: The Architect vs. The Bricklayer

Think of the LLM as a construction project:

  • The Autoregressive Process (The Bricklayer): This is the slow part where the machine lays bricks one by one to build a wall. To get a wall 100 feet high, it takes a long time.
  • The Internal Representation (The Architect): This is the blueprint hidden inside the machine's mind. The blueprint already shows the entire wall, its height, its width, and even the probability that a brick might fall off.

The researchers built a special tool called a "Probe" (think of it as an X-ray machine or a decoder ring). Instead of waiting for the bricklayer to finish the wall, they used the X-ray to look at the Architect's blueprint.
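
In machine-learning terms, a probe is typically a small regression model trained to map the LLM's hidden-state vector to the target number. The sketch below uses synthetic data and a simple ridge-regression fit to illustrate the idea; it is not the paper's exact probe architecture, just the general recipe: one linear read-out, no sampling.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: 500 prompts, each with a 64-dim "hidden state".
# We assume (as the paper finds) the answer is decodable from that state.
d, n = 64, 500
w_true = rng.normal(size=d)
H = rng.normal(size=(n, d))                      # hidden states
y = H @ w_true + rng.normal(scale=0.1, size=n)   # numeric answers

# A ridge-regression "probe": one closed-form fit over hidden states.
lam = 1e-2
w_hat = np.linalg.solve(H.T @ H + lam * np.eye(d), H.T @ y)

pred = H @ w_hat  # one matrix multiply per question, no generation needed
r2 = 1 - np.var(y - pred) / np.var(y)
print("probe R^2:", round(r2, 4))
```

Once trained, the probe answers each new question with a single dot product, which is why it is so much cheaper than repeated generation.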

How They Did It (The "Magnitude-Factorised" Trick)

Predicting numbers is hard for AI because numbers vary wildly in size. A number like "0.0001" is very different from "1,000,000." If you try to teach a student to guess both, they get confused.

The researchers solved this with a clever two-step strategy, which they call Magnitude-Factorisation:

  1. The Magnitude Classifier (The "Order of Magnitude" Guess): First, the probe asks, "Is the answer in the thousands? The millions? Or is it a tiny decimal?" It guesses the scale of the number.
  2. The Value Regressor (The "Fine-Tuning" Guess): Once it knows the scale, it asks, "Okay, if it's in the thousands, is it 1,200 or 1,800?"

By splitting the problem into "How big is it?" and "What is the exact number?", the probe can accurately predict the answer without the LLM ever having to type a single digit.
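
The two-step split mirrors scientific notation: any positive number can be written as a mantissa in [1, 10) times a power of ten. A minimal sketch of that factorisation (the classifier would predict `k`, the regressor would predict `m`; the exact heads in the paper may differ):

```python
import math

def factorise(y: float) -> tuple[int, float]:
    """Split a positive number into (order of magnitude k, mantissa m in [1, 10))."""
    k = math.floor(math.log10(y))
    return k, y / 10 ** k

def reconstruct(k: int, m: float) -> float:
    """Recombine the two heads' predictions into the final number."""
    return m * 10 ** k

# "Is it in the thousands?" -> k = 3; "is it 1,200 or 1,800?" -> m = 1.23456
k, m = factorise(1234.56)
print(k, round(m, 4))                   # 3 1.2346
print(round(reconstruct(k, m), 2))      # 1234.56
```

Because the regressor only ever sees mantissas in a narrow range, it never has to cope with "0.0001" and "1,000,000" on the same scale.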

What They Found

  1. The Blueprint is Complete: The probe could accurately predict the average answer (the mean), the most likely answer (the mode), and even the "middle" answer (the median) just by looking at the LLM's internal state.
  2. Uncertainty is Visible: The probe could also tell you how confident the LLM is. It could predict the range of possible answers (e.g., "It's likely between 10 and 12") without needing to ask the LLM to generate 100 different samples.
  3. Speed and Cost: Because the probe only needs to look at the blueprint once, it is massively faster than the traditional method. It's like reading the architect's plan in 0.03 seconds versus waiting for the bricklayer to build the wall 100 times (which takes seconds or minutes).
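
How can a probe predict a range like "between 10 and 12" directly? A standard tool for this (one plausible mechanism; the paper's exact training objective may differ) is the pinball loss, which is minimised exactly when the prediction equals the chosen quantile. The sketch below verifies that property on synthetic "LLM answers" drawn from a known distribution:

```python
import numpy as np

def pinball_loss(y_true: np.ndarray, y_pred: float, q: float) -> float:
    """Quantile ("pinball") loss: minimised when y_pred is the q-th quantile."""
    diff = y_true - y_pred
    return float(np.mean(np.maximum(q * diff, (q - 1) * diff)))

# Synthetic answers centred on 10 with spread 1.5; the true 90th
# percentile of this distribution is about 11.92.
rng = np.random.default_rng(0)
y = rng.normal(10.0, 1.5, size=10_000)

# Grid search for the value minimising the q=0.9 pinball loss.
candidates = np.linspace(5.0, 15.0, 201)
losses = [pinball_loss(y, c, 0.9) for c in candidates]
best = float(candidates[int(np.argmin(losses))])
print("estimated 90th percentile:", round(best, 2))
```

A probe trained with this loss on hidden states learns to emit the interval bounds directly, so the "between 10 and 12" answer costs one forward read instead of 100 generations.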

Why This Matters

This is a game-changer for using AI in real life, especially for things like:

  • Financial forecasting: Where you need to know not just the stock price, but the risk.
  • Medical predictions: Where knowing the uncertainty of a diagnosis is as important as the diagnosis itself.
  • Robotics: Where a robot needs to make quick decisions without waiting for a slow computer to "think" through every possibility.

In short: The paper proves that LLMs don't need to "talk" to give you a number. They already "know" the number and how sure they are about it deep inside their neural networks. We just needed to build a better way to listen to that internal thought process without waiting for them to speak out loud.
