Imagine a Large Language Model (LLM) as a very confident, incredibly well-read librarian who has memorized billions of books but has never actually lived in the real world. This librarian can write beautiful, fluent stories, but sometimes, they make things up.
For a long time, we just called all these mistakes "hallucinations." But this paper argues that's like calling a flat tire, a broken engine, and an empty gas tank all "car problems." They look similar from the outside, but they have different causes and require different fixes.
The author, Javier Marín, proposes a new way to categorize these mistakes using geometry (the study of shapes and distances). He imagines all words and ideas existing in a giant, invisible multi-dimensional map. On this map, the distance between two points represents how similar their meanings are.
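This "map" is what researchers call an embedding space, and the standard ruler on it is cosine similarity. Here is a minimal sketch of that idea with invented 3-dimensional vectors (real models use hundreds or thousands of dimensions, produced by the model itself):

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: near 1.0 means the points
    sit in the same direction on the map (similar meaning); near 0.0 means
    they are unrelated."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Invented toy "embeddings" -- a real model would produce these vectors.
king    = [0.90, 0.80, 0.10]
queen   = [0.85, 0.82, 0.15]
toaster = [0.10, 0.05, 0.95]

print(cosine_similarity(king, queen))    # high: close on the map
print(cosine_similarity(king, toaster))  # low: far apart
```

The only thing that matters for the rest of the paper's argument is this ruler: once every question, document, and answer is a point on the map, "drifting away" and "taking a sharp turn" become measurable quantities.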
Here is the simple breakdown of his three types of "hallucinations" and how to catch them:
1. The "Daydreamer" (Type I: Unfaithfulness)
- The Scenario: You give the librarian a specific document (like a meeting agenda) and ask, "What did we decide?" The librarian ignores your paper and answers based on what they remember from their general memory.
- The Geometry: On the map, the answer stays close to your question but drifts far away from the document you gave them.
- The Fix (SGI): The author created a tool called the Semantic Grounding Index (SGI). Think of it as a "magnet test." If the answer is pulled strongly toward the document you provided, it's good. If it floats away and stays close to the question instead, the librarian is "daydreaming" and ignoring the facts you gave them.
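The paper's exact SGI formula isn't reproduced in this summary, but the "magnet test" can be sketched: compare how strongly the answer's embedding is pulled toward the document versus back toward the question. Everything below (the vectors, the function name, the sign convention) is an invented illustration of the idea, not the author's implementation:

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def grounding_sketch(answer, question, document):
    """Hypothetical 'magnet test': positive means the answer sits closer to
    the supplied document than to the question (grounded); negative means
    it drifted back toward the question (daydreaming)."""
    return (cosine_similarity(answer, document)
            - cosine_similarity(answer, question))

# Invented toy embeddings for one question, its source document, and two answers.
question         = [0.2, 0.90, 0.1]
document         = [0.9, 0.20, 0.3]
grounded_answer  = [0.8, 0.30, 0.3]   # hugs the document
daydream_answer  = [0.3, 0.85, 0.1]   # hugs the question, ignores the document

print(grounding_sketch(grounded_answer, question, document))  # positive
print(grounding_sketch(daydream_answer, question, document))  # negative
```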
2. The "Fiction Writer" (Type II: Confabulation)
- The Scenario: You ask, "Who is the CEO of this fake company I just invented?" The librarian, wanting to be helpful, invents a name, a backstory, and a biography for a person who doesn't exist.
- The Geometry: This is tricky. The librarian isn't ignoring you; they are answering confidently. But on the map, their answer takes a weird, sharp turn into a "no-man's-land" of ideas that don't actually exist in reality. It's like drawing a map of a city that has a bridge leading to nowhere.
- The Fix (Γ): The author created the Directional Grounding Index (Γ). Imagine a compass that knows the "normal direction" of truth. When the librarian invents a fake entity, their answer points in a direction that the compass knows is "off the map." This tool is very good at spotting made-up facts, even if they sound perfectly logical.
3. The "Slightly Wrong Expert" (Type III: Factual Error)
- The Scenario: You ask, "Who was the 16th President of the US?" The librarian says, "Abraham Lincoln." (Correct.) Then you ask, "Who was the 17th?" and they say, "Ulysses S. Grant." That sounds perfectly plausible, but it's wrong: Andrew Johnson was the 17th; Grant was the 18th. The answer is the right kind of thing, just one detail off. That's a Type III error.
- The Geometry: This is the hardest one. The librarian is talking about the right topic, in the right neighborhood of the map. They are just standing at the wrong house number. Because the answer is so close to the truth conceptually, the geometry looks almost identical to a correct answer.
- The Big Discovery: The paper found that you cannot detect this type of error using geometry alone.
- Why? The author tested TruthfulQA, a famous dataset where researchers believed they had found a way to spot these errors. It turned out the computer wasn't actually spotting the wrong facts; it was spotting the writing style. The "wrong" answers in that dataset were shorter and more direct, while the "right" answers were longer and more cautious. The computer was just a style detector, not a truth detector.
- The Lesson: If the librarian is talking about the right subject but gets a small detail wrong, their "shape" on the map looks just like a truthful answer. We currently have no geometric way to tell the difference.
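The style-confound finding can be illustrated with a deliberately crude experiment. The examples and threshold below are invented, not actual TruthfulQA data; the point is that a "classifier" that never reads the content, only measures length, can still separate answers whenever length happens to correlate with the label:

```python
# Invented toy examples mimicking the reported pattern: short, direct "wrong"
# answers (label 0) vs. longer, hedged "right" answers (label 1).
examples = [
    ("Yes.", 0),
    ("No, that never happens.", 0),
    ("It depends on the circumstances, but historians generally agree "
     "that no single cause explains it.", 1),
    ("There is no strong evidence for that claim, although the question "
     "is still debated.", 1),
]

def length_only_classifier(text, threshold=40):
    """A 'truth detector' that never looks at the facts, only at style."""
    return 1 if len(text) > threshold else 0

accuracy = sum(length_only_classifier(text) == label
               for text, label in examples) / len(examples)
print(accuracy)  # perfect on this toy set, despite knowing nothing about truth
```

This is the trap the author describes: a detector that scores well on such a dataset may have learned the style of the answers, not their truth.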
The "Domain" Problem
The paper also found a funny quirk: The "compass" (Γ) works great if you train it on medical lies and then test it on medical lies. But if you train it on medical lies and then ask it to detect legal lies, it gets confused.
- Analogy: It's like learning to spot a fake $20 bill. If you learn to spot fake bills printed on a specific machine, you might miss a fake bill printed on a different machine. The "shape" of the lie changes depending on the topic.
Summary
- Type I (Ignoring Context): The answer ignores the source material. Detectable.
- Type II (Making things up): The answer invents fake entities. Detectable with a new geometric compass.
- Type III (Small details wrong): The answer is about the right thing but gets a fact wrong. Currently undetectable by geometry because the "shape" of the lie looks too much like the truth.
The paper concludes that we need to stop treating all hallucinations the same. Some we can catch with math; others require us to accept that if a model is confident and fluent, it might still be wrong in ways our current geometric tools can't see.