Overthinking Causes Hallucination: Tracing Confounder Propagation in Vision Language Models

This paper identifies "overthinking"—the propagation of incorrect intermediate hypotheses across decoder layers—as a primary cause of hallucinations in Vision Language Models and introduces the Overthinking Score, a layer-probing metric that significantly outperforms existing final-output-based detectors.

Abin Shoby, Ta Duc Huy, Tuan Dung Nguyen, Minh Khoi Ho, Qi Chen, Anton van den Hengel, Phi Le Nguyen, Johan W. Verjans, Vu Minh Hieu Phan

Published 2026-03-10

Here is an explanation of the paper "Overthinking Causes Hallucination: Tracing Confounder Propagation in Vision Language Models" using simple language and creative analogies.

The Big Problem: The "Confident Liar"

Imagine you have a robot artist who looks at a photo and describes what it sees. Sometimes, this robot gets it right. But sometimes, it confidently describes things that aren't there.

  • The Photo: A kitchen with a sink and a bar of soap.
  • The Robot's Lie: "I see a dish on the counter." (There is no dish, only soap and a sink).

This is called hallucination. For a long time, researchers tried to catch these lies by looking only at the robot's final answer. They thought: "If the robot is unsure, it's probably lying. If it's confident, it's telling the truth."

The paper's big discovery: This logic is wrong. The robot can be 100% confident while lying. The truth isn't in the final answer; it's in the messy thought process the robot had before it gave the answer.


The Core Concept: "Overthinking"

The authors found that when the robot hallucinates, it doesn't just "guess wrong." It overthinks.

Think of the robot's brain like a committee of 30 people (layers) trying to decide what object is in the picture.

  1. Normal Thinking (Stable): The committee discusses the image. Person 1 says "Cat." Person 2 says "Cat." Person 30 says "Cat." They all agree quickly. The answer is Cat.
  2. Overthinking (Hallucination): The committee starts arguing.
    • Person 1 says: "Maybe it's a Sink?"
    • Person 2 says: "No, wait, maybe a Soap?"
    • Person 3 says: "Oh, if there's a sink and soap, there must be a Dish!"
    • Person 4 says: "Yes! A Dish!"
    • ...and so on, until Person 30 confidently shouts "DISH!"

Even though there is no dish in the photo, the robot got stuck in a loop of associative thinking. It saw "Sink" and "Soap," and its brain forced a "Dish" into existence because those things usually go together.

The paper calls this "Confounder Propagation."

  • The Confounder: The "Sink" and "Soap" are real, but they are "confounders" because they trick the robot into imagining a third thing that isn't there.
  • The Propagation: The idea of the "Dish" starts as a tiny whisper in the early layers of the brain and gets louder and louder as it moves through the layers, until it becomes a shout in the final output.
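To make "propagation" concrete, here is a toy numerical sketch (my own illustration, not the paper's code). In a real VLM you might read out each layer's favorite word by projecting its hidden state through the output vocabulary, a technique often called the "logit lens"; here we just simulate per-layer scores where the confounded idea "dish" compounds layer by layer:

```python
import numpy as np

# Toy illustration of confounder propagation (not the paper's code).
# Each row is one decoder layer's score for four candidate objects.
vocab = ["sink", "soap", "dish", "cat"]
num_layers = 30

logits = np.zeros((num_layers, len(vocab)))
logits[:, 0] = 2.0  # "sink" really is in the image
logits[:, 1] = 1.5  # so is "soap"
# The confounder: "dish" gets a boost that grows across layers,
# mimicking an incorrect intermediate hypothesis getting amplified.
logits[:, 2] = np.linspace(0.0, 4.0, num_layers)

# Softmax each layer's scores into probabilities.
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

for layer in (0, 14, 29):
    top = vocab[int(probs[layer].argmax())]
    print(f"layer {layer:2d}: top guess = {top}, P(dish) = {probs[layer, 2]:.2f}")
```

Early layers still favor "sink" (the whisper), but by the final layer "dish" dominates (the shout), even though nothing about the image changed.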

Why Old Methods Failed

Previous methods tried to catch the liar in two ways, and both failed:

  1. The "Attention" Method: This method asked, "Is the robot looking at the right part of the picture?"
    • The Flaw: The robot really was looking hard at the sink and soap (the real objects)! It just used that focus to invent the dish, so the "Attention" method concluded the robot was telling the truth.
  2. The "Confidence" Method: This method asked, "Is the robot unsure?"
    • The Flaw: By the time the robot finished its "overthinking" loop, it was very sure of its lie. It wasn't confused; it was confidently wrong.

The New Solution: The "Overthinking Score"

The authors created a new tool called the Overthinking Score. Instead of looking at the final answer, they peeked inside the robot's brain at every single step of the thinking process.

They asked two questions:

  1. How many different ideas did the robot consider? (Did it jump from "Sink" to "Soap" to "Dish" to "Bowl"?)
  2. How shaky was its confidence? (Did it flip-flop between ideas?)
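These two questions can be sketched as a tiny scoring function. This is a simplified toy formulation of an "overthinking"-style score (my assumption; the paper's exact metric may differ): given per-layer probability distributions over candidate objects, combine how many distinct top guesses appeared across layers with how much the top-guess confidence churned.

```python
import numpy as np

def overthinking_score(layer_probs: np.ndarray) -> float:
    """Toy score: layer_probs is (num_layers, vocab_size), rows sum to 1."""
    top_ids = layer_probs.argmax(axis=1)
    num_ideas = len(set(top_ids.tolist()))            # question 1: distinct hypotheses
    top_conf = layer_probs.max(axis=1)
    shakiness = float(np.abs(np.diff(top_conf)).sum())  # question 2: confidence churn
    return num_ideas + shakiness

# Stable run: every layer agrees on the same object with steady confidence.
stable = np.tile([0.8, 0.1, 0.1], (6, 1))

# Overthinking run: the top guess jumps around and confidence wobbles.
shaky = np.array([
    [0.5, 0.3, 0.2],
    [0.2, 0.6, 0.2],
    [0.3, 0.3, 0.4],
    [0.1, 0.2, 0.7],
    [0.4, 0.3, 0.3],
    [0.2, 0.2, 0.6],
])

print(overthinking_score(stable))  # low: one idea, no churn
print(overthinking_score(shaky))   # higher: likely hallucination
```

A high score flags "noisy" internal deliberation even when the final answer comes out confident, which is exactly the case the old confidence-based detectors missed.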

The Analogy:
Imagine a detective trying to solve a crime.

  • The Old Way: The detective asks the suspect, "Did you do it?" If the suspect says "No" with a straight face, the detective believes them.
  • The New Way (Overthinking Score): The detective watches the suspect's internal monologue before they speak.
    • Suspect's internal thought: "Wait, I didn't do it... but maybe I did? No, but if I didn't, who did? Maybe I did? No, wait, I was at the store... but the store is far away... maybe I did it?"
    • The Score: The detective sees the suspect is waffling and jumping between stories. Even though the final answer is "No," the internal chaos reveals the lie.

The Results

By using this "Overthinking Score," the researchers could catch hallucinations far more reliably than older methods.

  • They found that when the robot's brain was "noisy" (jumping between many different object ideas), it was almost always about to lie.
  • They tested this on popular AI models (like LLaVA and Qwen) and found it worked significantly better than previous methods, catching about 79% of the lies.

Summary

  • The Problem: AI models lie confidently by getting stuck in a loop of "what if" scenarios (Overthinking).
  • The Mistake: Old detectors only looked at the final answer or how much the AI "looked" at the image.
  • The Fix: Look at the journey of the thought. If the AI's brain is jumping between too many different ideas before settling on an answer, it's likely hallucinating.
  • The Takeaway: To catch a liar, don't just listen to what they say; watch how they think. If they are overthinking, they are probably making things up.