MemeIntel: Explainable Detection of Propagandistic and Hateful Memes

This paper introduces MemeXplain, a new large-scale explanation-enhanced dataset for Arabic propagandistic and English hateful memes, along with a multi-stage optimization strategy for Vision-Language Models that significantly outperforms current state-of-the-art methods in both label detection and rationale generation.

Mohamed Bayan Kmainasi, Abul Hasnat, Md Arid Hasan, Ali Ezzat Shahroor, Firoj Alam

Published 2026-03-03

Imagine the internet is a giant, chaotic town square. In this square, people aren't just shouting words; they are holding up signs with pictures and short phrases. These are memes. Most of the time, they are funny jokes. But sometimes, these signs are used to trick people, spread lies (propaganda), or make others feel unsafe (hate speech).

The problem is that these signs are tricky. A picture might look innocent, but the text underneath changes the meaning entirely. Or, a joke might be so culturally specific that a computer doesn't "get" the punchline and misses the danger.

This paper introduces a new system called MemeIntel (and its dataset, MemeXplain) designed to be a super-smart detective for these signs. Here is how it works, broken down into simple concepts:

1. The Problem: The "Black Box" Detective

Imagine you hire a security guard (an AI) to watch the town square.

  • Old Guard: When the guard sees a suspicious sign, they just say, "Stop! That's bad!" They don't tell you why. You have to take their word for it.
  • The Issue: If the guard is wrong, you don't know why. Also, if you try to teach the guard to explain their reasoning while they are learning to spot the bad signs, they often get confused. It's like trying to teach a student to solve a math problem and write a poem about the solution at the exact same time. They might mess up both.

2. The Solution: The "MemeXplain" Toolkit

The researchers built a special training manual called MemeXplain. Think of this as a massive library of meme signs, but with a twist: every single "bad" sign in the library comes with a detailed, human-written note explaining exactly why it's dangerous.

  • The Language Trick: They created these notes in both English and Arabic. Why? Because memes often rely on local culture. A meme about a local political figure in the Middle East might make no sense to an English speaker, and vice versa. By having notes in both languages, the system can understand the "cultural context" behind the joke.
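To make the "library with notes" idea concrete, here is a minimal sketch of what one MemeXplain-style record could look like. The field names and values are illustrative assumptions for this post, not the dataset's actual schema.

```python
# Hypothetical shape of one explanation-enhanced meme record.
# All field names here are assumptions, not the real MemeXplain schema.
from dataclasses import dataclass

@dataclass
class MemeRecord:
    image_path: str     # the meme image itself
    overlaid_text: str  # the short phrase written on the meme
    language: str       # "en" (hateful memes) or "ar" (propagandistic memes)
    label: str          # the detection target, e.g. "hateful" / "not_hateful"
    explanation: str    # the human-written note: WHY the label applies

# An invented example entry (not from the dataset):
example = MemeRecord(
    image_path="memes/0001.png",
    overlaid_text="...",
    language="en",
    label="hateful",
    explanation="Pairs an innocent-looking image with text that mocks a specific group.",
)
print(example.label)
```

The key design point is the `explanation` field: it turns every entry from a bare (image, label) pair into a worked example of the reasoning the model is supposed to learn.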

3. The Training Method: The "Two-Step Dance"

This is the most clever part of the paper. The researchers realized that teaching the AI to do two things at once (detect the hate and write the explanation) was too hard. So, they invented a Multi-Stage Optimization strategy. Think of it like training an athlete:

  • Stage 1: The Sprinter (Classification Only)
    First, they teach the AI to run fast and spot the bad signs. They ignore the explanations for now. The AI just learns to say, "That's a hate meme" or "That's a propaganda meme." It gets really good at spotting the target.
  • Stage 2: The Poet (Adding the Explanation)
    Once the AI is a champion at spotting the signs, then they teach it to write the explanation. Because the AI already knows what it's looking at, it can now focus on why it's looking at it, without getting confused.

Why this matters: If you try to teach the sprinting and the poetry at the same time, the athlete gets tired and performs poorly at both. By separating the training, the AI becomes a champion at both.
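The two-stage schedule can be sketched in a few lines. This is a toy, runnable illustration of the training order only: the `ToyModel` and its constant loss values are stand-ins, since the paper actually fine-tunes a Vision-Language Model.

```python
# Toy sketch of the multi-stage strategy: optimize classification first,
# then add the explanation objective. ToyModel is a placeholder, not the
# paper's actual model or loss functions.

class ToyModel:
    def __init__(self):
        self.steps = []  # record of the loss optimized at each step

    def classification_loss(self, batch):
        return 1.0  # placeholder for "is this meme hateful/propagandistic?"

    def explanation_loss(self, batch):
        return 0.5  # placeholder for "generate the human-style rationale"

    def step(self, loss):
        self.steps.append(loss)  # stand-in for one optimizer update

def train_stage(model, batches, use_explanations):
    for batch in batches:
        loss = model.classification_loss(batch)
        if use_explanations:  # Stage 2 adds the rationale objective on top
            loss += model.explanation_loss(batch)
        model.step(loss)

model = ToyModel()
batches = [None] * 3  # dummy data
train_stage(model, batches, use_explanations=False)  # Stage 1: labels only
train_stage(model, batches, use_explanations=True)   # Stage 2: labels + explanations
print(model.steps)  # → [1.0, 1.0, 1.0, 1.5, 1.5, 1.5]
```

Notice that the second stage does not replace the classification objective; it builds on it, which is the "sprinter first, then poet" idea in code form.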

4. The Results: Smarter and Clearer

The researchers tested their new "Two-Step" AI against the best existing detectives.

  • Better Accuracy: The new AI got better at spotting the bad memes, improving accuracy by roughly 1.4% to 2.2% over the previous state of the art, a meaningful margin at this level of performance.
  • Better Explanations: Not only did it spot them better, but the reasons it gave were much clearer and more logical. It didn't just say "Bad"; it said, "This is bad because it uses a religious symbol to mock a specific group, which is culturally offensive in this context."

The Big Picture

In short, this paper is about teaching computers to be empathetic and logical detectives.

Instead of just flagging content and saying "Delete this," the system now says, "Delete this, and here is the story of why it hurts people." By using a step-by-step training method and a bilingual library of examples, they made the AI smarter, faster, and much easier for humans to trust.

The Analogy Summary:

  • Old Way: A robot guard points at a sign and says "Bad."
  • New Way (MemeIntel): A robot guard points at a sign, says "Bad," and then hands you a pamphlet that explains the history, the cultural joke, and the specific reason why this sign is harmful, all in a language you understand.