Imagine a Large Language Model (LLM) as a very confident, incredibly well-read librarian who has memorized billions of books but has never actually lived in the real world. This librarian can write beautiful, fluent stories, but sometimes, they make things up.
For a long time, we just called all these mistakes "hallucinations." But this paper argues that's like calling a flat tire, a broken engine, and an empty gas tank all "car problems." They look similar from the outside, but they have different causes and require different fixes.
The author, Javier Marín, proposes a new way to categorize these mistakes using geometry (the study of shapes and distances). He imagines all words and ideas existing in a giant, invisible multi-dimensional map. On this map, the distance between two points represents how similar their meanings are.
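This "map" is what researchers call an embedding space, and the standard ruler on it is cosine similarity. Here is a minimal sketch of that idea with invented 3-dimensional vectors (real models use hundreds or thousands of dimensions, produced by the model itself):

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: near 1.0 means the points
    sit in the same direction on the map (similar meaning); near 0.0 means
    they are unrelated."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Invented toy "embeddings" -- a real model would produce these vectors.
king    = [0.90, 0.80, 0.10]
queen   = [0.85, 0.82, 0.15]
toaster = [0.10, 0.05, 0.95]

print(cosine_similarity(king, queen))    # high: close on the map
print(cosine_similarity(king, toaster))  # low: far apart
```

The only thing that matters for the rest of the paper's argument is this ruler: once every question, document, and answer is a point on the map, "drifting away" and "taking a sharp turn" become measurable quantities.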
Here is the simple breakdown of his three types of "hallucinations" and how to catch them:
1. The "Daydreamer" (Type I: Unfaithfulness)
- The Scenario: You give the librarian a specific document (like a meeting agenda) and ask, "What did we decide?" The librarian ignores your paper and answers based on what they remember from their general memory.
- The Geometry: On the map, the answer stays close to your question but drifts far away from the document you gave them.
- The Fix (SGI): The author created a tool called the Semantic Grounding Index (SGI). Think of it as a "magnet test." If the answer is pulled strongly toward the document you provided, it's good. If it floats away and stays close to the question instead, the librarian is "daydreaming" and ignoring the facts you gave them.
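The paper's exact SGI formula isn't reproduced in this summary, but the "magnet test" can be sketched: compare how strongly the answer's embedding is pulled toward the document versus back toward the question. Everything below (the vectors, the function name, the sign convention) is an invented illustration of the idea, not the author's implementation:

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def grounding_sketch(answer, question, document):
    """Hypothetical 'magnet test': positive means the answer sits closer to
    the supplied document than to the question (grounded); negative means
    it drifted back toward the question (daydreaming)."""
    return (cosine_similarity(answer, document)
            - cosine_similarity(answer, question))

# Invented toy embeddings for one question, its source document, and two answers.
question         = [0.2, 0.90, 0.1]
document         = [0.9, 0.20, 0.3]
grounded_answer  = [0.8, 0.30, 0.3]   # hugs the document
daydream_answer  = [0.3, 0.85, 0.1]   # hugs the question, ignores the document

print(grounding_sketch(grounded_answer, question, document))  # positive
print(grounding_sketch(daydream_answer, question, document))  # negative
```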
2. The "Fiction Writer" (Type II: Confabulation)
- The Scenario: You ask, "Who is the CEO of this fake company I just invented?" The librarian, wanting to be helpful, invents a name, a backstory, and a biography for a person who doesn't exist.
- The Geometry: This is tricky. The librarian isn't ignoring you; they are answering confidently. But on the map, their answer takes a weird, sharp turn into a "no-man's-land" of ideas that don't actually exist in reality. It's like drawing a map of a city that has a bridge leading to nowhere.
- The Fix (Γ): The author created the Directional Grounding Index (Γ). Imagine a compass that knows the "normal direction" of truth. When the librarian invents a fake entity, their answer points in a direction that the compass knows is "off the map." This tool is very good at spotting made-up facts, even if they sound perfectly logical.
3. The "Slightly Wrong Expert" (Type III: Factual Error)
- The Scenario: You ask, "Who was the 16th President of the US?" The librarian says, "Abraham Lincoln." (Correct.) Then you ask, "Who was the 17th?" and they say, "Ulysses S. Grant." That sounds perfectly plausible, but it's wrong: Andrew Johnson was the 17th; Grant was the 18th. The answer is the right kind of thing, just one detail off. That's a Type III error.
- The Geometry: This is the hardest one. The librarian is talking about the right topic, in the right neighborhood of the map. They are just standing at the wrong house number. Because the answer is so close to the truth conceptually, the geometry looks almost identical to a correct answer.
- The Big Discovery: The paper found that you cannot detect this type of error using geometry alone.
- Why? The author tested TruthfulQA, a famous dataset where researchers believed they had found a way to spot these errors. It turned out the computer wasn't actually spotting the wrong facts; it was spotting the writing style. The "wrong" answers in that dataset were shorter and more direct, while the "right" answers were longer and more cautious. The computer was just a style detector, not a truth detector.
- The Lesson: If the librarian is talking about the right subject but gets a small detail wrong, their "shape" on the map looks just like a truthful answer. We currently have no geometric way to tell the difference.
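The style-confound finding can be illustrated with a deliberately crude experiment. The examples and threshold below are invented, not actual TruthfulQA data; the point is that a "classifier" that never reads the content, only measures length, can still separate answers whenever length happens to correlate with the label:

```python
# Invented toy examples mimicking the reported pattern: short, direct "wrong"
# answers (label 0) vs. longer, hedged "right" answers (label 1).
examples = [
    ("Yes.", 0),
    ("No, that never happens.", 0),
    ("It depends on the circumstances, but historians generally agree "
     "that no single cause explains it.", 1),
    ("There is no strong evidence for that claim, although the question "
     "is still debated.", 1),
]

def length_only_classifier(text, threshold=40):
    """A 'truth detector' that never looks at the facts, only at style."""
    return 1 if len(text) > threshold else 0

accuracy = sum(length_only_classifier(text) == label
               for text, label in examples) / len(examples)
print(accuracy)  # perfect on this toy set, despite knowing nothing about truth
```

This is the trap the author describes: a detector that scores well on such a dataset may have learned the style of the answers, not their truth.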
The "Domain" Problem
The paper also found a funny quirk: The "compass" (Γ) works great if you train it on medical lies and then test it on medical lies. But if you train it on medical lies and then ask it to detect legal lies, it gets confused.
- Analogy: It's like learning to spot a fake $20 bill. If you learn to spot fake bills printed on a specific machine, you might miss a fake bill printed on a different machine. The "shape" of the lie changes depending on the topic.
Summary
- Type I (Ignoring Context): The answer ignores the source material. Detectable.
- Type II (Making things up): The answer invents fake entities. Detectable with a new geometric compass.
- Type III (Small details wrong): The answer is about the right thing but gets a fact wrong. Currently undetectable by geometry because the "shape" of the lie looks too much like the truth.
The paper concludes that we need to stop treating all hallucinations the same. Some we can catch with math; others require us to accept that if a model is confident and fluent, it might still be wrong in ways our current geometric tools can't see.