Imagine the internet is a giant, chaotic digital town square. In this square, people share "memes"—funny pictures with text that often spread like wildfire. But sometimes, these memes are like poisoned candy: they look sweet and funny on the outside, but they contain hate speech that hurts specific groups of people.
Detecting these "poisoned candies" is hard. A human moderator would have to look at millions of them every day, which is impossible and would be psychologically damaging. So, we need robots (AI) to do the job.
The Problem: The "Smart but Clumsy" Robot
Scientists have built very smart robots called Large Multimodal Models (LMMs). Think of these as super-intelligent students who have read almost every book and seen almost every picture in the world. They are great at understanding complex stories and images.
However, when you ask these super-students to spot hate in memes, they stumble for three main reasons:
- They miss the nuance: They often overlook the subtle, dark joke where the text and the image work together to say something hateful.
- They get confused by new trends: Memes change fast. If a robot learns to spot hate about "Topic A," it often fails when a new "Topic B" meme appears. It's like a student who memorized the answers to last year's math test but can't solve this year's questions.
- They forget their other skills: If you force a super-student to study only for a specific hate-detection test, they might forget how to write poetry or solve general problems. We don't want our robots to become one-trick ponies.
The Solution: RA-HMD (The "Smart Librarian" System)
The authors of this paper created a new system called RA-HMD. To understand how it works, let's use an analogy.
Imagine you are trying to identify a fake painting.
- The Old Way (Standard Training): You show a student 1,000 fake paintings and say, "Memorize these!" The student memorizes the specific brushstrokes of those 1,000 paintings but fails the moment a new fake appears with slightly different colors.
- The RA-HMD Way: You give the student a smart library card (that's the "Retrieval-Augmented" part of the name).
- The Library: The system has a massive database of known bad memes.
- The Search: When a new meme arrives, the system doesn't just guess. It quickly searches its library for the most similar bad memes it has seen before.
- The Comparison: It compares the new meme to those examples. "Hey, this looks a lot like that mean meme we saw last week about Topic X."
- The Two-Stage Training:
- Stage 1 (Learning the Rules): The robot learns the basic rules of hate speech while still keeping its ability to write and talk normally.
- Stage 2 (Sharpening the Eye): The robot practices finding the "look-alikes" in the library. It learns to group similar bad memes together so it can spot them instantly, even if they are slightly different.
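The search-and-compare loop above can be sketched in a few lines of code. This is a toy illustration, not the paper's actual architecture: the hand-written embedding vectors, the tiny example library, and the majority-vote rule are all invented for demonstration. A real system would use a trained multimodal encoder to turn each meme's image and text into an embedding.

```python
import math

# Toy "library" of known memes: each entry is an embedding vector
# (a stand-in for a real multimodal encoding) plus a label.
LIBRARY = [
    ([0.9, 0.1, 0.0], "hateful"),
    ([0.8, 0.2, 0.1], "hateful"),
    ([0.1, 0.9, 0.2], "benign"),
    ([0.0, 0.8, 0.3], "benign"),
]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def retrieve(query, k=2):
    """The 'search' step: find the k library memes most similar to the query."""
    ranked = sorted(LIBRARY, key=lambda entry: cosine(query, entry[0]), reverse=True)
    return ranked[:k]

def classify(query, k=2):
    """The 'comparison' step: majority vote over the retrieved look-alikes."""
    neighbors = retrieve(query, k)
    votes = [label for _, label in neighbors]
    return max(set(votes), key=votes.count)

# A new meme whose embedding lands near the known hateful examples:
print(classify([0.85, 0.15, 0.05]))  # → hateful
```

Stage 2 of the training is what makes this lookup work: by pulling similar bad memes together in embedding space (and pushing unrelated ones apart), the system ensures that a new hateful meme really does land near its look-alikes in the library.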
Why This is a Big Deal
The paper shows that this new system is a game-changer:
- It's Smarter: It outperforms the previous best models on standard hateful-meme benchmarks, even models that are much larger and more expensive.
- It's Adaptable: Because it uses the "library" to find examples, it can handle new, weird memes without needing to be retrained from scratch. It's like having a detective who can look up a suspect's face in a database rather than trying to memorize every criminal's face.
- It Explains Itself: When the robot says, "This is hate," it doesn't just guess. It can write a short paragraph explaining why (e.g., "This image mocks a disability, which is harmful"). The paper shows their robot explains things much better than the old robots.
- It's Tougher: If someone tries to trick the robot by adding random noise to the image (like putting a filter on a photo), the RA-HMD system is much harder to fool than the others.
The Bottom Line
The researchers built a system that acts like a super-smart librarian with a detective's eye. It doesn't just memorize; it searches, compares, and learns from examples. It catches the bad memes that others miss, explains why they are bad, and doesn't lose its other brainpower in the process. This makes the internet a safer place without needing a million human moderators staring at screens all day.