Verbalizing LLM's Higher-order Uncertainty via Imprecise Probabilities

This paper proposes novel prompt-based techniques grounded in imprecise probabilities to effectively elicit and quantify both first-order and second-order uncertainty from large language models, thereby addressing systematic failures in ambiguous or complex settings and improving the credibility of their uncertainty reporting.

Anita Yang, Krikamol Muandet, Michele Caprio, Siu Lun Chau, Masaki Adachi

Published 2026-03-12

Imagine you are asking a very smart, well-read robot (a Large Language Model, or LLM) a question. You want to know not just the answer, but how sure the robot is about that answer.

Currently, if you ask the robot, "How confident are you?" it usually gives you a single number, like "I'm 80% sure." The authors of this paper argue that this single number is often misleading, or at least a very poor description of reality. It's like a weather forecaster saying, "There is an 80% chance of rain," without telling you whether that figure comes from detailed radar data or from a quick glance out the window.

Here is the core idea of the paper, broken down with simple analogies.

1. The Problem: The "Single Number" Trap

The paper identifies three situations where asking for a single "confidence score" fails:

  • The Ambiguous Question: Imagine asking, "Who hosted the 2019 Cricket World Cup?"
    • The Robot's Dilemma: The answer is technically "England and Wales" (co-hosts). But if you ask a human, they might say "England" or "Wales" or "The UK."
    • The Failure: A standard robot might say, "I'm 90% sure the answer is England." This is misleading. The robot isn't 90% sure; it's actually confused because the question itself is fuzzy. It can't give a single number that captures this confusion.
  • The "Learning" Scenario (In-Context Learning): Imagine you give the robot 10 examples of a math puzzle, then ask it to solve a new one.
    • The Failure: As you give it more examples, the robot gets better at solving the puzzle. Its actual error rate drops. But if you ask for its confidence, it often stays stuck at a high "uncertainty" level. It doesn't realize it has learned the pattern yet.
  • The Self-Reflection Trap: Imagine the robot picks an answer and then explains why it picked it.
    • The Failure: Often, the robot's explanation doesn't match its confidence score. It might say, "I'm 99% sure this is right," but then give a weak, shaky reason. The numbers and the logic don't line up.

2. The Solution: The "Fuzzy Interval" (Imprecise Probabilities)

The authors propose a new way to talk to the robot. Instead of asking for a single number (a precise point), they ask the robot to give a range (an interval).

Think of it like this:

  • Old Way (Precise Probability): "I think the temperature is exactly 72°F." (This sounds confident but might be wrong.)
  • New Way (Imprecise Probability): "I think the temperature is somewhere between 65°F and 80°F."

This range tells you two different things at once:

  1. First-Order Uncertainty (The "What"): How likely is each possible answer? This is the natural randomness or ambiguity of the question itself, and it shows up in where the range sits.
  2. Second-Order Uncertainty (The "How Sure I Am"): How wide is that range? This is the robot's uncertainty about its own estimate.
    • If the range is narrow (68°F to 72°F), the robot is confident in its knowledge. It knows the answer well.
    • If the range is wide (50°F to 90°F), the robot is admitting ignorance. It doesn't know enough to narrow it down.
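
To make the two readings of a range concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's code: the `ProbabilityInterval` class is a hypothetical helper, and treating the midpoint as a stand-in for first-order uncertainty and the width as a stand-in for second-order uncertainty is one simple reading, not the paper's formal definition. The numbers are probabilities rather than temperatures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbabilityInterval:
    """A verbalized confidence range [lower, upper] for one answer (hypothetical helper)."""

    lower: float
    upper: float

    def __post_init__(self) -> None:
        # A valid range must lie inside [0, 1] and must not be inverted.
        if not 0.0 <= self.lower <= self.upper <= 1.0:
            raise ValueError("need 0 <= lower <= upper <= 1")

    @property
    def midpoint(self) -> float:
        # Assumed stand-in for the first-order estimate: where the range sits.
        return (self.lower + self.upper) / 2

    @property
    def width(self) -> float:
        # Assumed stand-in for second-order uncertainty: how wide the range is.
        return self.upper - self.lower


narrow = ProbabilityInterval(0.68, 0.72)  # the robot knows the answer well
wide = ProbabilityInterval(0.50, 0.90)    # the robot admits ignorance

print(f"narrow: midpoint={narrow.midpoint:.2f}, width={narrow.width:.2f}")
print(f"wide:   midpoint={wide.midpoint:.2f}, width={wide.width:.2f}")
```

Both ranges are centered on the same value, so a single number would make them look identical. Only the width shows that the second robot knows much less.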

3. The Creative Analogy: The Detective and the Witness

Imagine a detective (the Robot) trying to solve a crime.

  • The Old Method: You ask the detective, "How sure are you that John did it?"
    • The detective says, "80% sure."
    • Problem: You don't know why he is 80%. Is he 80% sure because the evidence is shaky? Or is he 80% sure because the question is confusing? You can't tell.
  • The New Method (Imprecise Probabilities): You ask, "Give me a range of how likely it is that John did it."
    • Scenario A (Ambiguous Question): The detective says, "It could be anywhere from 10% to 90%."
      • Meaning: "The question is so vague (maybe 'John' refers to two different people) that I can't even narrow it down. I am uncertain about my own uncertainty."
    • Scenario B (Learning from Clues): You give the detective more clues.
      • Result: The range shrinks. "Now I'm 70% to 85% sure."
      • Meaning: "I have learned enough to narrow down the possibilities. My ignorance has decreased."
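
The shrinking range in Scenario B has a textbook counterpart. The sketch below uses the imprecise Dirichlet model (binary case), a standard tool from the imprecise-probability literature, purely as an illustration. It is not the paper's method, which asks the model to verbalize its range. The function name `idm_interval` and the prior-strength value `s = 2.0` are assumptions chosen for this example.

```python
from __future__ import annotations


def idm_interval(successes: int, trials: int, s: float = 2.0) -> tuple[float, float]:
    """Lower and upper probability from the imprecise Dirichlet model (binary case).

    lower = successes / (trials + s), upper = (successes + s) / (trials + s),
    so the width is s / (trials + s) and shrinks as evidence accumulates.
    """
    if s <= 0 or trials < 0 or not 0 <= successes <= trials:
        raise ValueError("need s > 0 and 0 <= successes <= trials")
    lower = successes / (trials + s)
    upper = (successes + s) / (trials + s)
    return lower, upper


# Illustrative evidence: the share of clues pointing at John stays at 80%,
# but the number of clues grows.
for successes, trials in ((0, 0), (4, 5), (40, 50)):
    lo, hi = idm_interval(successes, trials)
    print(f"{trials:>2} clues: [{lo:.2f}, {hi:.2f}]  width={hi - lo:.2f}")
```

With no clues the range covers everything from 0 to 1. With more clues it tightens around the observed share, which is the behavior the detective story describes.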

4. How They Did It (The "Magic Prompt")

The researchers didn't change the robot's brain (which is often a secret "black box"). Instead, they changed the questions they asked.

They used a clever prompting technique based on an old idea from a mathematician named Bruno de Finetti. They asked the robot to act like a gambler.

  • The Prompt: "If you could bet on this answer being correct, with the bet paying $1 if it is, what is the highest price you would pay to buy the bet, and what is the lowest price you would accept to sell it?"
  • The Result:
    • If the robot is confused, it will give a huge gap between the buy and sell price (e.g., "I'd buy at $0.10 but sell at $0.90"). This wide gap represents high second-order uncertainty (ignorance).
    • If the robot is sure, the gap will be tiny (e.g., "Buy at $0.85, Sell at $0.86"). This narrow gap represents low second-order uncertainty (knowledge).
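
A betting-style prompt like this could be wired up roughly as follows. This is a sketch under stated assumptions: the `BETTING_PROMPT` wording and the `BUY:`/`SELL:` reply format are hypothetical and are not the paper's actual prompt, and no real model API is called. The buying price is read as the lower end of the range and the selling price as the upper end.

```python
from __future__ import annotations

import re

# Hypothetical prompt wording; the paper's exact prompt is not reproduced here.
BETTING_PROMPT = (
    "Question: {question}\n"
    "Proposed answer: {answer}\n"
    "Consider a bet that pays $1 if the answer is correct and $0 otherwise.\n"
    "Reply in exactly this format:\n"
    "BUY: <highest price you would pay for the bet, between 0 and 1>\n"
    "SELL: <lowest price at which you would sell the bet, between 0 and 1>"
)

# Matches an optional dollar sign followed by a decimal number such as 0.85 or .9
_PRICE = r"\$?\s*([0-9]*\.?[0-9]+)"


def parse_betting_reply(reply: str) -> tuple[float, float]:
    """Extract the buy and sell prices from a reply and return (lower, upper)."""
    buy = re.search(r"BUY:\s*" + _PRICE, reply, flags=re.IGNORECASE)
    sell = re.search(r"SELL:\s*" + _PRICE, reply, flags=re.IGNORECASE)
    if buy is None or sell is None:
        raise ValueError("reply does not contain both BUY and SELL prices")
    lower, upper = float(buy.group(1)), float(sell.group(1))
    # A buy price above the sell price, or prices outside [0, 1], is not a valid range.
    if not 0.0 <= lower <= upper <= 1.0:
        raise ValueError(f"incoherent prices: buy={lower}, sell={upper}")
    return lower, upper


prompt = BETTING_PROMPT.format(
    question="Who hosted the 2019 Cricket World Cup?", answer="England"
)
lower, upper = parse_betting_reply("BUY: $0.10\nSELL: $0.90")  # a made-up reply
print(f"range = [{lower}, {upper}], gap = {upper - lower:.2f}")
```

Rejecting replies where the buy price exceeds the sell price is a design choice for this sketch. A real pipeline might instead re-prompt the model.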

5. Why This Matters

This method makes AI more honest and useful.

  • Better Decision Making: If a doctor asks an AI, "Is this tumor cancerous?" and the AI says, "I'm 90% sure," the doctor might operate. But if the AI says, "I'm 90% sure, but my confidence range is 10% to 99% because the scan is blurry," the doctor knows to get a second opinion.
  • Cost-Effective: The paper shows this doesn't require expensive, complex computing. It just requires asking the right questions.
  • Flagging "Hallucinations": A wide range signals that the AI doesn't have enough information to narrow down its answer, which is exactly when it is most likely to be making things up.
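
The doctor example can be turned into a simple rule of thumb. The sketch below is illustrative only: the function `decide_with_interval` and the thresholds `act_threshold` and `max_width` are hypothetical choices for this example, not values or procedures from the paper.

```python
def decide_with_interval(
    lower: float,
    upper: float,
    act_threshold: float = 0.5,  # hypothetical: act only if the whole range clears this
    max_width: float = 0.3,      # hypothetical: wider ranges count as "too ignorant"
) -> str:
    """Return 'act', 'do not act', or 'defer' for a confidence range [lower, upper]."""
    if not 0.0 <= lower <= upper <= 1.0:
        raise ValueError("need 0 <= lower <= upper <= 1")
    if upper - lower > max_width:
        return "defer"       # the range is too wide to trust: get a second opinion
    if lower >= act_threshold:
        return "act"         # even the pessimistic end supports acting
    if upper < act_threshold:
        return "do not act"  # even the optimistic end does not support acting
    return "defer"           # the range straddles the threshold


print(decide_with_interval(0.10, 0.99))  # blurry scan, very wide range
print(decide_with_interval(0.88, 0.93))  # clear scan, narrow range
```

A single "90% sure" would lead to the same action in both cases. The range lets the rule separate a well-supported 90% from one that hides a lot of ignorance.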

Summary

The paper teaches us that uncertainty isn't just one thing. Sometimes we are unsure because the world is chaotic (First-Order). Sometimes we are unsure because we don't know enough (Second-Order).

By asking AI to give a range instead of a single number, we get a much clearer picture of what the AI actually knows, what it is guessing, and when it is simply confused. It turns the AI from a "confident liar" into a "humble expert."