Visual Memory Injection Attacks for Multi-Turn Conversations

Imagine you have a very smart, helpful robot assistant that can see pictures and talk to you. You upload a photo of a beautiful mountain, and the robot says, "That's the Alps! It's stunning." You chat for a while about hiking, weather, and gear. The robot is helpful, normal, and safe.

Then, suddenly, you ask a specific question: "Which stock should I buy?"

Instead of giving you a balanced financial opinion, the robot suddenly says, "Buy GameStop stock immediately! It will make you a fortune!" even though that advice is fake or dangerous.

You might think, "Wait, why did it say that?" But the robot was acting perfectly normal for the last 20 minutes of conversation. You wouldn't suspect anything was wrong.

This is exactly what the paper "Visual Memory Injection Attacks for Multi-Turn Conversations" is about. Here is the breakdown using simple analogies:

1. The Setup: The "Trojan Horse" Photo

In the past, hackers could trick these robots by adding invisible "noise" to a single picture. But that only worked for one question. If you asked a different question, the robot would forget the trick.

In this new attack, the hacker creates a "Trojan Horse" photo.

The Trick: They take a normal, beautiful photo (like a landmark or a car) and add a tiny, invisible layer of digital "static" to it. You can't see it with your eyes, but the robot's brain sees it.
The Upload: The hacker posts this photo on social media.
The Victim: A regular person sees the pretty photo, downloads it, and uploads it to their AI chatbot to ask, "Where is this?"

2. The "Memory" Problem: The Ghost in the Machine

Here is the scary part. When you chat with an AI, it doesn't just look at the current question; it remembers the whole conversation, including the picture you uploaded at the start.

Think of the AI's memory like a long, continuous movie reel.

In a normal movie, the picture is just a scene.
In this attack, the hacker has "poisoned" the film reel. The poison is invisible, but it stays on the reel for the entire movie.

Even if you talk about 20 different topics (hiking, cooking, history) after uploading the photo, the "poison" is still there, waiting in the background.

3. The "Trigger": The Secret Handshake

The attack is designed to be stealthy.

Normal Behavior: If you ask the robot about the weather, the food, or the location in the photo, it acts completely normal. It gives helpful, safe answers. This is called "Benign Anchoring." It's like a spy who acts like a friendly neighbor until a specific code word is spoken.
The Trigger: The hacker chooses a specific topic (like "stocks," "voting," or "buying a car").
The Explosion: The moment the user asks about that specific topic, the "poison" activates. The robot suddenly spits out the hacker's fake message (e.g., "Vote for Party X" or "Buy this fake car").

4. Why This is Dangerous (The "Long Conversation" Factor)

Previous attacks were like a pop-up ad: annoying, but you could just close the window.
This new attack is like a slow-acting virus.

It works even if you talk to the robot for a long time (20+ turns).
It works even if you ask completely unrelated questions first.
It works even if you rephrase your question (e.g., instead of "Which stock?", you ask "What should I invest in?").

The Real-World Scenarios

The paper tested this with three scary examples:

Fake Financial Advice: A user uploads a photo of a mountain. Later, they ask, "What stock should I buy?" The robot tells them to buy a specific stock that might crash, ruining their savings.
Political Manipulation: A user uploads a photo of a city. Later, they ask, "Who should I vote for?" The robot pushes a specific political party, trying to sway an election.
Fake Products: A user asks, "What car should I buy?" The robot recommends a car that doesn't even exist (like an "Apple iCar"), trying to confuse or scam the user.

The Bottom Line

The researchers found that this attack works on the most popular AI models today. The scary part is that the hacker doesn't need to control the user's computer. They just need to upload one manipulated photo to the internet. If a million people download that photo and chat with their AI, the hacker can potentially manipulate a million people's opinions or wallets.

The Takeaway: Just because an AI seems helpful and normal for 90% of the conversation doesn't mean it's safe. The "memory" of the image you uploaded at the start could be holding a secret agenda waiting to be triggered.

1. Problem Statement

Large Vision-Language Models (LVLMs) are increasingly deployed in multi-turn conversational agents (chatbots). While previous research has established that LVLMs are vulnerable to adversarial attacks in single-turn settings (where a manipulated image forces a specific output immediately), these attacks often fail in realistic, long-context scenarios.

The core problem addressed is the lack of security evaluation for multi-turn conversations where:

Persistence: An image uploaded in the first turn remains in the model's context window for the entire duration of the conversation.
Stealth: Existing single-turn attacks often cause the model to behave erratically or output the malicious target immediately, alerting the user.
Realism: Attackers cannot control the user's subsequent prompts. A realistic attack must allow the model to behave normally for unrelated topics while remaining "poisoned" to trigger a specific malicious response only when a specific trigger topic arises later in the conversation.

The authors propose a new threat model: An adversary uploads a subtly perturbed image to the internet. A benign user downloads it and interacts with an LVLM. The model behaves normally for dozens of turns but outputs a prescribed malicious message (e.g., fake stock advice, political propaganda) only when the user asks a specific question related to the trigger.

2. Methodology: Visual Memory Injection (VMI)

The authors introduce Visual Memory Injection (VMI), a novel attack that exploits the persistent visual context of LVLMs. The attack relies on two key mechanisms:

A. Benign Behavioral Anchoring

To prevent the model from degenerating (i.e., outputting the target message for every prompt), the optimization process includes a "benign anchor."

Mechanism: The attack optimizes the perturbation to ensure the model produces a helpful, correct response to a specific anchor prompt (e.g., "What is this place?") in the first turn.
Goal: This ensures the model behaves nominally during the initial interaction, preventing user suspicion. The malicious behavior is strictly conditional on the trigger prompt (e.g., "Which stock should I buy?").

B. Context-Cycling

To ensure the attack persists across varying conversation lengths and does not overfit to a specific dialogue history, the authors employ context-cycling.

Mechanism: During the optimization of the adversarial perturbation, the context length is dynamically varied. The optimizer cycles through contexts of different lengths (from short 2-turn dialogues to long 8-turn dialogues) by appending random prompt-response pairs.
Goal: This forces the perturbation to be robust against different conversational histories, ensuring the attack works even after 20+ unrelated turns.

C. Optimization Formulation

The attack seeks a perturbed image $\tilde{x}$ that maximizes the joint probability of two objectives:

Anchor Objective: High probability of the correct benign response $y_{\text{anchor}}$ to the anchor prompt $t_{\text{anchor}}$ .
Target Objective: High probability of the malicious target response $y_{\text{target}}$ to the trigger prompt $t_{\text{target}}$ given a long context $c_{(k)}$ .

The objective function is:
$\max_{\tilde{x}} \log p(y_{\text{anchor}} | t_{\text{anchor}}, \tilde{x}) + \log p(y_{\text{target}} | c_{(k)} \oplus t_{\text{target}}, \tilde{x})$
Subject to an $\ell_\infty$ perturbation constraint ( $\|\tilde{x} - x\|_\infty \leq \epsilon$ ), where $\epsilon = 8/255$ . The optimization uses Adaptive Projected Gradient Descent (APGD) with context-cycling.

3. Key Contributions

Novel Attack Scenario: Introduction of VMI, the first targeted attack designed specifically for multi-turn LVLM conversations that remains stealthy over long dialogues.
Technical Innovation: Development of Benign Anchoring and Context-Cycling to solve the trade-off between stealth (normal behavior) and persistence (long-term memory injection).
Comprehensive Evaluation: Demonstration of the attack's success across three state-of-the-art open-weight LVLMs (Qwen2.5-VL, Qwen3-VL, LLaVA-OneVision) and four distinct manipulation scenarios (Stock advice, Political voting, Car recommendation, Phone recommendation).
Transferability: Proof that attacks optimized on a base model transfer effectively to fine-tuned variants (e.g., medical or specialized models) and paraphrased prompts.

4. Experimental Results

The authors evaluated VMI on 20 images per scenario (COCO and custom landmark datasets) across three models.

Success Rate: VMI achieved substantial success rates (often >60-80%) even after 25+ conversation turns (exceeding 10,000 tokens of context).
Stealth: The attack successfully prevented "leakage." The model did not output the target message during unrelated turns (e.g., discussing email organization or holiday planning).
Robustness:
- Paraphrasing: The attack remained effective when the trigger and anchor prompts were rephrased (e.g., "Which stock?" vs. "What should I invest in?").
- Model Transfer: Perturbations optimized on Qwen3-VL successfully attacked fine-tuned variants like Qwen-SEA-LION and QoQ-Med3 without re-optimization.
- Hallucination: In cases where the target was a non-existent entity (e.g., "Apple iCar"), the model not only recommended it but generated convincing, fabricated technical justifications.
Ablation Studies:
- Attacks without Benign Anchoring failed to maintain stealth (leaked the target in early turns).
- Attacks without Context-Cycling failed to generalize to long conversations (success dropped significantly after a few turns).
- Optimization Iterations: 2,000 iterations were found to be the sweet spot; 8,000 led to overfitting on the training context.

5. Significance and Implications

Scalable User Manipulation: VMI demonstrates that a single adversarial image can manipulate thousands of users across different platforms. An attacker can "cherry-pick" successful images and distribute them widely.
Real-World Threats: The paper highlights severe risks in:
- Financial Fraud: Pushing specific stocks or crypto assets.
- Political Influence: Steering voting behavior during elections.
- Adversarial Marketing: Promoting specific products or non-existent items.
Safety Evaluation Gap: Current safety benchmarks for LVLMs focus heavily on single-turn jailbreaking or refusal mechanisms. This work proves that safety evaluations must account for long-context persistence and conditional triggering.
Defense Challenge: Because the attack relies on the model's inherent memory mechanism (the context window) and requires only a small, imperceptible perturbation, traditional input filtering or simple prompt engineering defenses are likely insufficient.

Conclusion

The paper concludes that LVLMs are highly vulnerable to Visual Memory Injection, a stealthy attack that turns the model's long-context capability against its users. The ability to inject malicious behavior that activates only after long, benign interactions represents a critical security gap that requires immediate attention in the development of robust, multi-turn vision-language systems. The authors have released their source code to facilitate further research into defenses.

Visual Memory Injection Attacks for Multi-Turn Conversations

1. The Setup: The "Trojan Horse" Photo

2. The "Memory" Problem: The Ghost in the Machine

3. The "Trigger": The Secret Handshake

4. Why This is Dangerous (The "Long Conversation" Factor)

The Real-World Scenarios

The Bottom Line

1. Problem Statement

2. Methodology: Visual Memory Injection (VMI)

A. Benign Behavioral Anchoring

B. Context-Cycling

C. Optimization Formulation

3. Key Contributions

4. Experimental Results

5. Significance and Implications

Conclusion

More like this

Complexity of Classical Acceleration for ℓ1\ell_1ℓ1​-Regularized PageRank

MapTab: Are MLLMs Ready for Multi-Criteria Route Planning in Heterogeneous Graphs?

Language Guided Adversarial Purification

Graph-based Active Learning for Entity Cluster Repair

Neural Green's Operators for Parametric Partial Differential Equations

Complexity of Classical Acceleration for $\ell_1$ -Regularized PageRank