HIME: Mitigating Object Hallucinations in LVLMs via Hallucination Insensitivity Model Editing

Imagine you have a very smart, well-traveled dog named LLaVA. This dog has seen millions of pictures and can describe them beautifully. But sometimes, when you show it a picture of a dog sleeping on a bed, it gets a little too excited and starts talking about things that aren't there.

It might say, "Look! There's a dog on a bed, a fluffy chair, and a red couch in the background!"

But if you look closely at the photo, there is no chair and no couch. The dog just assumed they were there because, in its training data, dogs, beds, chairs, and couches often appear together. This is called an Object Hallucination. It's like the dog is daydreaming instead of looking at the actual photo.

This paper introduces a new method called HIME (Hallucination Insensitivity Model Editing) to fix this without having to retrain the dog from scratch.

Here is how it works, broken down into simple concepts:

1. The Problem: The "Brute Force" Fix

Previously, scientists tried to stop these hallucinations by taking a sledgehammer to the dog's brain. They would say, "Okay, we don't like chairs in this context, so let's delete the part of the brain that knows about chairs!"

The Problem: This is too aggressive. By deleting the "chair" knowledge, they accidentally broke the dog's ability to recognize the "bed" or other real things. It's like trying to fix a typo in a book by tearing out the whole page; you fix the mistake, but you lose the story too.

2. The Discovery: The Brain Isn't Uniform

The authors of this paper realized something interesting: The dog's brain (the AI model) isn't the same everywhere.

Some layers of the brain are like librarians who know facts perfectly.
Other layers are like daydreamers who start making things up.

They found that the "daydreaming" happens mostly in specific layers, while the "fact-keeping" happens in others. The old methods treated the whole brain the same, which caused damage.

3. The Solution: The "Hallucination Insensitivity Score" (HIS)

To fix this, they invented a special tool called the Hallucination Insensitivity Score (HIS).

Think of the AI model as a multi-story office building.

Floors 1–10: These are the "Fact Floors." They are very sensitive to what is actually in the picture.
Floors 11–20: These are the "Daydream Floors." They are prone to making up stories about chairs and couches that aren't there.

The HIS is like a thermometer that measures how "cool-headed" (resistant to daydreaming) each floor is.

If a floor has a low score, it can barely tell reality from daydreaming (it's a bad fact-checker).
If a floor has a high score, it reliably tells the two apart (it's a good fact-checker).

4. The Fix: "Layer-Adaptive Editing"

Instead of using a sledgehammer on the whole building, HIME uses a scalpel.

Measure: It checks the "thermometer" (HIS) for every single floor.
Target: It identifies exactly which floors are daydreaming too much.
Adjust: It gently tweaks the weights (the connections between neurons) only on those specific floors.
- On the "Daydream Floors," it tightens the rules to stop the imagination.
- On the "Fact Floors," it changes little or nothing, so the dog still remembers what a bed looks like.

The Result

By doing this precise surgery:

The Dog Stops Lying: It stops mentioning the fake chair and couch.
The Dog Keeps Knowing: It still correctly identifies the bed and the dog.
No Extra Cost: Unlike other methods that require the dog to "think twice" before answering (which makes it slow), HIME just changes the dog's brain once. After that, it answers just as fast as before, but with much better accuracy.

In Summary

HIME is like a smart editor who doesn't rewrite the whole book. Instead, the editor finds the specific paragraphs where the author started making things up, gently corrects those sentences, and leaves the rest of the story exactly as it was. This makes the AI more reliable, honest, and ready for real-world use without slowing it down.

1. Problem Statement

Large Vision-Language Models (LVLMs) suffer from object hallucination, where they generate descriptions containing non-existent objects or incorrect attributes not grounded in the visual input. While fine-tuning is a common mitigation strategy, it is computationally expensive and requires curated supervision. Existing training-free alternatives, such as model editing, often apply indiscriminate (uniform) weight modifications across all decoder layers.

The paper identifies a critical flaw in current editing approaches (e.g., Nullu): applying fixed, uniform edits to all layers risks distorting pre-trained knowledge. For instance, removing a hallucinated object (e.g., a "chair") might inadvertently suppress the representation of a real object (e.g., a "bed") that frequently co-occurs with the hallucination, leading to a loss of factual visual knowledge. The core challenge is determining how much intervention is necessary at each specific layer to suppress hallucinations without damaging the model's implicit knowledge.

2. Methodology

The authors propose HIME (Hallucination Insensitivity Model Editing), a training-free, layer-adaptive weight editing framework. The methodology consists of three main stages:

A. Layer-Wise Analysis & The Hallucination Insensitivity Score (HIS)

The authors first analyze LVLM decoders (built on Qwen, LLaMA, and Vicuna backbones) and discover that hallucination susceptibility is not uniform; it varies significantly across layers.

Metric: They introduce the Hallucination Insensitivity Score (HIS).
Computation:
1. Contrastive pairs are created using the LURE dataset (one truthful caption, one hallucinated caption for the same image).
2. The model processes both, and attention matrices are extracted for each layer.
3. The attention distributions for truthful vs. hallucinated tokens are flattened and converted into histograms.
4. HIS is calculated as the KL divergence between these two distributions for each layer.
Interpretation: A low HIS indicates a layer where the model struggles to distinguish between truthful and hallucinated attention patterns (high susceptibility). A high HIS indicates a layer that robustly discriminates between them.
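The computation steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the bin count, the smoothing constant `eps`, the direction of the KL divergence (truthful relative to hallucinated), and the helper names `attention_histogram`, `layer_his`, and `toy_attention` are all assumptions, and random matrices stand in for real attention weights.

```python
import numpy as np

def attention_histogram(attn, bins=50, eps=1e-8):
    """Flatten one layer's attention weights into a smoothed, normalized histogram."""
    counts, _ = np.histogram(np.asarray(attn).ravel(), bins=bins, range=(0.0, 1.0))
    probs = counts.astype(np.float64) + eps  # eps avoids log(0) for empty bins
    return probs / probs.sum()

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with strictly positive entries."""
    return float(np.sum(p * np.log(p / q)))

def layer_his(truthful_attn, hallucinated_attn, bins=50):
    """Return one score per layer from paired per-layer attention matrices."""
    return np.array([
        kl_divergence(attention_histogram(t, bins), attention_histogram(h, bins))
        for t, h in zip(truthful_attn, hallucinated_attn)
    ])

def toy_attention(rng, n_tokens, sharpness):
    """Random row-stochastic matrix standing in for real attention weights."""
    logits = sharpness * rng.standard_normal((n_tokens, n_tokens))
    weights = np.exp(logits - logits.max(axis=1, keepdims=True))
    return weights / weights.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
shared = toy_attention(rng, 16, 1.0)
# Layer 0 sees identical attention for both captions; layer 1 sees different patterns.
truthful = [shared, toy_attention(rng, 16, 0.1)]
hallucinated = [shared, toy_attention(rng, 16, 5.0)]
his = layer_his(truthful, hallucinated)
print(his)
```

In this toy setup, the layer whose attention is identical for both captions scores zero (it cannot tell them apart), while the layer with clearly different attention patterns scores higher, which matches the interpretation given above.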

B. Layer-Adaptive Weight Editing

Unlike previous methods that apply a hard projection to all layers, HIME uses the HIS to guide selective, weighted intervention.

Feature Extraction: For each layer, the authors extract hidden embeddings and weight them by the positional attention distribution to create "attention-guided features" for both truthful and hallucinated samples.
Subspace Identification: They compute the difference matrix between truthful and hallucinated features and perform Singular Value Decomposition (SVD). The top-$k$ right singular vectors define a hallucination subspace.
Weighted Projection: Instead of fully orthogonalizing weights (which causes knowledge loss), HIME applies a weighted null-space operator:
$N_\ell = I - \text{HIS}^c_\ell \cdot P_\ell$
Where $P_\ell$ is the projector onto the hallucination subspace, and $\text{HIS}^c_\ell$ is the complement of the Hallucination Insensitivity Score for layer $\ell$.
- Logic: Layers with low HIS (high susceptibility) receive a stronger edit (closer to full projection). Layers with high HIS (robust) receive minimal or no editing, preserving pre-trained knowledge.
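The subspace identification and weighted projection steps can be illustrated with a small NumPy sketch. Several details here are assumptions, not taken from the paper: the complement is computed as one minus the min-max-normalized HIS (the summary does not give the exact definition), random arrays stand in for attention-guided features, and the function names are hypothetical.

```python
import numpy as np

def his_complement(his):
    """Assumed definition: 1 minus the min-max-normalized HIS across layers."""
    his = np.asarray(his, dtype=np.float64)
    span = his.max() - his.min()
    if span == 0.0:
        return np.zeros_like(his)  # all layers equally robust: no edit (assumption)
    return 1.0 - (his - his.min()) / span

def hallucination_projector(truthful_feats, hallucinated_feats, k):
    """Projector P onto the span of the top-k right singular vectors of the difference matrix."""
    diff = hallucinated_feats - truthful_feats            # shape (n_samples, d)
    _, _, vt = np.linalg.svd(diff, full_matrices=False)   # rows of vt are right singular vectors
    v_k = vt[:k].T                                        # shape (d, k), orthonormal columns
    return v_k @ v_k.T

def weighted_null_space(projector, strength):
    """N = I - strength * P; strength=1 removes the subspace, strength=0 changes nothing."""
    return np.eye(projector.shape[0]) - strength * projector

rng = np.random.default_rng(0)
truthful_feats = rng.standard_normal((32, 8))             # toy stand-in features
hallucinated_feats = truthful_feats + rng.standard_normal((32, 8))
P = hallucination_projector(truthful_feats, hallucinated_feats, k=2)

strengths = his_complement([0.1, 0.5, 0.9])               # low HIS gives a strong edit
operators = [weighted_null_space(P, s) for s in strengths]
v = P @ np.ones(8)                                        # a vector inside the subspace
print([float(np.linalg.norm(N @ v)) for N in operators])
```

For a vector inside the hallucination subspace, the layer with the lowest HIS removes it entirely, the middle layer halves it, and the layer with the highest HIS leaves it unchanged. That is the interpolation between full projection and no edit described above.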

C. Offline Editing

The process is performed offline: the weights are edited once, saved, and reloaded for inference. Because the edited weights have the same shape as the originals, the method adds no parameters, no inference latency, and no computational overhead during generation.
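A short sketch shows why a one-time weight edit adds nothing at inference. It assumes the operator is folded into a linear layer's weight from the input side (`W_edited = W @ N`); which weights the paper actually edits and on which side is not stated in this summary, so treat that choice, the edit strength, and all variable names as illustrative.

```python
import io
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, k = 8, 4, 2

# Toy weighted null-space operator N = I - strength * P (strength is a placeholder value).
basis, _ = np.linalg.qr(rng.standard_normal((d_in, k)))   # orthonormal basis, shape (d_in, k)
P = basis @ basis.T
N = np.eye(d_in) - 0.7 * P

W = rng.standard_normal((d_out, d_in))                    # original layer weight
W_edited = W @ N                                          # fold the edit in once, offline

# Save and reload the edited weight, as an offline editing pipeline would.
buffer = io.BytesIO()
np.save(buffer, W_edited)
buffer.seek(0)
W_loaded = np.load(buffer)

x = rng.standard_normal(d_in)
y_folded = W_loaded @ x          # inference: a single matmul, same cost as before
y_explicit = W @ (N @ x)         # equivalent to projecting the input first
print(W_loaded.shape == W.shape, np.allclose(y_folded, y_explicit))
```

The reloaded matrix has the same shape and parameter count as the original, so generation runs the same single matrix multiplication it always did.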

3. Key Contributions

Systematic Layer-Wise Analysis: The paper reveals that object hallucination susceptibility follows a depth-dependent pattern across different LVLM architectures, challenging the assumption that all decoder layers contribute equally to hallucinations.
Hallucination Insensitivity Score (HIS): A novel, principled metric that quantifies layer-specific sensitivity to hallucinations, serving as a guide for targeted intervention.
HIME Framework: A training-free, layer-adaptive model editing method that selectively suppresses hallucination-related latent directions while preserving factual knowledge.
State-of-the-Art Performance: HIME outperforms existing decoding-time (e.g., VCD, DoLa) and editing-based (e.g., Nullu) methods across multiple backbones and benchmarks.

4. Experimental Results

The authors evaluated HIME on LLaVA-1.5, MiniGPT-4, mPLUG-Owl2, Qwen2-VL, and Qwen3-VL.

CHAIR Benchmark (Object Hallucination):
- HIME reduced object hallucinations by an average of 61.8% across open-ended generation tasks.
- On LLaVA-1.5, HIME achieved a CHAIRs (sentence-level) score of 13.80, significantly outperforming the baseline (20.40) and the previous SOTA editing method, Nullu (15.20).
- Crucially, HIME maintained or improved BLEU scores, indicating that caption quality and fluency were not compromised.
MME Benchmark (Perception & Cognition):
- HIME improved performance on perception tasks (e.g., Counting, Position, Celebrity recognition) compared to the baseline and Nullu.
- Unlike Nullu, which sometimes degraded performance on specific tasks, HIME preserved or enhanced the model's general reasoning and visual grounding capabilities.
GPT-4V Aided Evaluation (LLaVA-Bench):
- In open-ended generation, HIME demonstrated higher Accuracy (fewer hallucinations) than the baseline and Nullu, while maintaining high Detailedness.
Ablation Studies:
- Removing the HIS weighting (applying uniform editing) resulted in lower performance, confirming that layer-adaptive intervention is superior to global editing.
- Editing specific layer ranges (e.g., late layers) yielded better results than editing all layers uniformly.
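For readers unfamiliar with the CHAIR numbers reported above, the metric has two parts: CHAIRs is the fraction of captions that mention at least one object not in the image, and CHAIRi is the fraction of all object mentions that are hallucinated. The sketch below is a simplified illustration on hand-made data. The real benchmark extracts object mentions from free-form captions using a fixed object vocabulary with synonym handling, which is omitted here, and the function name is hypothetical.

```python
def chair_scores(mentioned_objects, present_objects):
    """Return (CHAIR_s, CHAIR_i) as fractions for paired captions and ground-truth object sets."""
    hallucinated_mentions = 0
    total_mentions = 0
    hallucinated_captions = 0
    for mentioned, present in zip(mentioned_objects, present_objects):
        fake = [obj for obj in mentioned if obj not in present]
        hallucinated_mentions += len(fake)
        total_mentions += len(mentioned)
        hallucinated_captions += 1 if fake else 0
    chair_s = hallucinated_captions / len(mentioned_objects) if mentioned_objects else 0.0
    chair_i = hallucinated_mentions / total_mentions if total_mentions else 0.0
    return chair_s, chair_i

# Toy data: the first caption invents a chair and a couch, the second is faithful.
mentioned = [["dog", "bed", "chair", "couch"], ["dog", "bed"]]
present = [{"dog", "bed"}, {"dog", "bed"}]
chair_s, chair_i = chair_scores(mentioned, present)
print(chair_s, chair_i)
```

Here one of two captions contains a hallucination and two of six object mentions are invented. Published CHAIR scores are usually reported as percentages, and lower is better.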

5. Significance

HIME represents a significant advancement in making LVLMs reliable for real-world deployment without the cost of retraining.

Efficiency: It eliminates the need for expensive fine-tuning or inference-time decoding tricks (like contrastive decoding) that increase latency.
Knowledge Preservation: By introducing the concept of "insensitivity" and using it to modulate editing strength, HIME addresses the "knowledge distortion" problem seen in previous model editing techniques. Its results indicate that hallucinations can be suppressed by targeting specific, vulnerable layers while leaving robust knowledge representations intact.
Generalizability: The approach is architecture-agnostic, working effectively across models based on LLaMA, Vicuna, and Qwen backbones.

In summary, HIME provides a principled, efficient, and effective solution to object hallucination by treating the LVLM decoder not as a monolithic block, but as a layered system with varying degrees of susceptibility, allowing for precise, surgical interventions.