The Problem: The "Ghost of Scans Past"
Imagine a brilliant medical AI assistant that looks at a new X-ray of a patient's chest and writes a report for the doctor. This AI is very smart, but it has a bad habit: it loves to compare things to the past, even when there is no past to compare to.
If you show the AI a single X-ray of a broken leg, it might confidently write: "The fracture is unchanged from last week."
But wait! There was no "last week." The patient is being seen for the first time. The AI is hallucinating a history that doesn't exist. In the real world, this is dangerous. If a doctor reads "unchanged," they might think the patient is stable and miss a new, worsening injury.
This happens because the AI was trained on millions of old reports. In those reports, doctors always compared the new scan to the old one. The AI learned that "comparing to the past" is just part of the writing style, like a habit. It got so used to saying "compared to last time" that it started saying it even when it shouldn't.
The Old Solution: The "Brute Force" Approach
Previously, if you wanted to stop an AI from doing this, you had to retrain it.
- The Analogy: Imagine the AI is a student who keeps writing "to be continued" at the end of every essay, even when the story is finished. The old solution was to take the student out of school, erase their entire notebook, and teach them to write from scratch using only "no-history" examples.
- The Problem: This is expensive, slow, and risky. If you scrub the training data too hard, the student might forget how to write any comparisons, even when a real comparison is needed. In fixing one habit, you break a skill the student still needs.
The New Solution: SDLS (The "Geometric Filter")
This paper introduces a new method called SDLS (Semantically Decoupled Latent Steering). Instead of retraining the student, they just give the AI a magic nudge while it's writing the report.
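The "nudge" can be sketched in a few lines, assuming a steering direction has already been found (building that direction is what the next sections explain). This is a toy numpy illustration of the idea, not the paper's actual code: it removes the unwanted component from a single hidden state while the model is generating.

```python
import numpy as np

def steer(hidden, direction, alpha=1.0):
    """Nudge a hidden state by removing its component along `direction`.

    `direction` is an assumed, precomputed "history talk" vector;
    everything orthogonal to it is left untouched.
    """
    v = direction / np.linalg.norm(direction)
    return hidden - alpha * (hidden @ v) * v

rng = np.random.default_rng(0)
h = rng.normal(size=8)   # toy hidden state from one decoding step
v = rng.normal(size=8)   # stand-in for the steering direction
h_steered = steer(h, v)
```

After the call, `h_steered` carries no component along `v`, but its projection onto any direction orthogonal to `v` is unchanged, which is the whole point: only the bad habit is suppressed.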
Here is how it works, using a few analogies:
1. The "Entangled Knot" (The Problem)
In the AI's brain (its "latent space"), the idea of "writing about the past" is tangled up with the idea of "describing the disease."
- The Analogy: Imagine the AI's thoughts are a ball of yarn. The red thread is "medical facts" (e.g., "there is fluid in the lungs"). The blue thread is "history talk" (e.g., "compared to last time"). In a standard AI, these threads are knotted together so tightly that if you pull the blue thread to stop the history talk, you accidentally pull the red thread too, messing up the medical facts.
2. The "Magic Nudge" (The Solution)
The researchers figured out how to untangle this knot without cutting the yarn. They used a mathematical trick called QR Decomposition (think of it as a super-precise geometric filter).
- The Analogy: Imagine the AI is driving a car. The "History Talk" is a strong wind blowing the car off course.
- Old Way: You try to fix the engine (retrain the car), which takes days.
- SDLS Way: You install a tiny, invisible steering wheel that only pushes the car sideways to counter the wind. It doesn't change the engine, the speed, or the destination; it just corrects the drift.
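The "invisible steering wheel" is where QR decomposition comes in: it orthogonalizes the raw "history talk" direction against the directions that carry medical facts, so a push along the cleaned direction cannot disturb them. A minimal numpy sketch, using made-up random directions rather than real model activations:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 16
medical = rng.normal(size=(d, 3))   # columns: directions we must not disturb
history = rng.normal(size=d)        # raw "history talk" direction (entangled)

# QR-decompose [medical | history]. Gram-Schmidt runs column by column,
# so the last column of Q is exactly the part of `history` that is
# orthogonal to every medical direction.
Q, _ = np.linalg.qr(np.column_stack([medical, history]))
v_clean = Q[:, -1]

# Steering along v_clean leaves the medical directions untouched:
print(np.max(np.abs(medical.T @ v_clean)))  # essentially 0
```

In other words, QR acts as the "geometric filter" from the analogy: it separates the wind (history talk) from the road (medical content) before any nudge is applied.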
3. The "Semantic Decoupling" (The Secret Sauce)
The key innovation is that they didn't just push the car sideways; they made sure they were pushing in the exact direction of the wind, without touching the road.
- They used a large language model to break down the "history" sentences into tiny pieces.
- They realized that while the medical facts change (sometimes the lung gets better, sometimes worse), the style of talking about the past stays the same.
- By mathematically isolating that "style," they created a vector (a direction) that says "Stop talking about the past" without saying "Stop talking about the lungs."
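The "style stays constant while facts vary" observation can be illustrated with a toy experiment: if paired sentences differ only by a shared stylistic offset (plus noise), then averaging their embedding differences cancels the varying facts and recovers the offset. All of the data below is simulated for illustration, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 16, 50
style = rng.normal(size=d)           # the shared "history talk" offset

facts = rng.normal(size=(n, d))      # varying medical content per sentence
no_history = facts                   # e.g. "there is fluid in the lungs"
# e.g. "...unchanged from last time": same facts, plus style, plus noise
with_history = facts + style + 0.1 * rng.normal(size=(n, d))

# Averaging the pairwise differences cancels the facts and keeps the style.
v_history = (with_history - no_history).mean(axis=0)
cos = float(v_history @ style
            / (np.linalg.norm(v_history) * np.linalg.norm(style)))
```

Here `cos` comes out very close to 1: the recovered vector points almost exactly along the hidden style direction, even though every sentence pair had different "medical facts."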
The Results: A "Positive-Sum" Game
Usually, when you fix one problem in AI, you break another. If you stop the hallucinations, the report might become less accurate.
- The Analogy: Usually, fixing a leaky roof makes the house colder.
- The Result: With SDLS, the roof is fixed, and the house actually gets warmer.
- Fewer Lies: The AI stopped making up fake history (the "FilBERT" score dropped significantly).
- Better Accuracy: Because the AI stopped getting distracted by fake history, it actually paid more attention to the real X-ray image. The medical accuracy (CheXpert score) went up.
Why It Matters
This paper proves that you don't need to rebuild the whole AI to fix its bad habits. You can just "steer" it in real-time.
- It's Fast: No waiting for months of retraining.
- It's Safe: It doesn't break the AI's ability to diagnose real diseases.
- It's Smart: It understands that "talking about the past" is a specific habit that can be turned off without turning off the "medical brain."
In short: The researchers found a way to tell the AI, "Hey, focus on what you see right now, and ignore the ghost of scans past," without breaking the AI's brain in the process.