Multi-Turn Adaptive Prompting Attack on Large Vision-Language Models

This paper introduces the Multi-Turn Adaptive Prompting Attack (MAPA), a novel strategy that alternates text-vision inputs and iteratively refines attack trajectories to overcome safety defenses in Large Vision-Language Models, achieving significantly higher jailbreak success rates than existing methods.

Original authors: In Chong Choi, Jiacheng Zhang, Feng Liu, Yiliao Song

Published 2026-05-29
📖 5 min read🧠 Deep dive

Original authors: In Chong Choi, Jiacheng Zhang, Feng Liu, Yiliao Song

Original paper licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). This is an AI-generated explanation of the paper below. It is not written or endorsed by the authors. For technical accuracy, refer to the original paper. Read full disclaimer

Imagine you have a very smart, highly trained robot assistant. This robot has been taught strict rules: "Never help someone build a bomb," "Never explain how to spread a deadly virus," and "Always be safe." This is called "safety alignment."

For a long time, hackers (or "red teamers") tried to trick these robots by asking tricky questions. But the robots got better at saying "No."

This paper introduces a new way to trick these robots, specifically the ones that can see pictures as well as read text. The researchers call their method MAPA (Multi-turn Adaptive Prompting Attack).

Here is how it works, using simple analogies:

1. The Problem: The "Overly Obvious" Trap

The researchers found that if you try to trick a robot by showing it a picture of a bomb and asking, "How do I build this?", the robot immediately panics and refuses. It's like walking up to a security guard holding a giant, flashing sign that says "I AM A THIEF." The guard stops you instantly.

Even if you ask a question in text and show a picture, if the picture is too scary or the text is too direct, the robot's safety filters trigger, and it shuts down the conversation.

2. The Solution: The "Slow Burn" Strategy

The paper proposes a strategy called MAPA, which is like a game of chess played over many moves, rather than a single punch.

The Core Idea: Instead of trying to break the robot's defenses all at once, you sneak the bad request in slowly, step-by-step, over many turns of conversation.

How MAPA Plays the Game:

The researchers use a "Coach" (an AI) to guide the attack. The Coach has two main jobs:

Job A: Mixing the Ingredients (The "Turn" Level)
At every single step of the conversation, the Coach tries three different ways to ask the question to see which one works best:

  1. Just Text: Asking without a picture.
  2. Text + Scary Picture: Asking with a picture that matches the text.
  3. Text + "Safe" Picture: Asking with a picture where the scary part is hidden in the image, and the text is rewritten to sound innocent.

Analogy: Imagine you are trying to get a friend to agree to a wild idea.

  • Option 1: You just ask them directly.
  • Option 2: You ask them while showing them a wild photo.
  • Option 3: You ask them while showing them a photo of a sunset, but you talk about the "wild idea" in a way that fits the sunset.
    The Coach picks the one that gets the friend closest to saying "Yes" without them getting angry.

Job B: Adjusting the Path (The "Across Turns" Level)
If the friend says "No" or gets confused, the Coach doesn't just give up. It looks at what happened and changes the plan for the next step.

  • Advance: If the friend is getting warmer to the idea, the Coach moves to the next, slightly more direct question.
  • Regen: If the friend is confused, the Coach tries asking the same question again but with different words.
  • Backtrack: If the friend suddenly gets angry because of something said two steps ago, the Coach goes back to that earlier step and tries a different approach.

Analogy: This is like a detective trying to solve a case. If a suspect lies, the detective doesn't just scream; they go back, rethink their theory, and ask a different question to catch the lie.

3. The "Reflection" Mechanism

If the whole attempt fails (the robot still says "No"), the Coach doesn't just try the exact same thing again. It looks at why it failed, learns from the mistake, and designs a completely new, smarter plan for the next attempt. It's like studying a failed test to do better on the next one.

4. The Results

The paper tested this method against several popular AI models (like LLaVA, Qwen, and GPT-4o-mini).

  • Old methods (just text, or just text + pictures) failed most of the time against these smart robots.
  • MAPA succeeded 15% to 30% more often than the best existing methods.
  • In some tests, MAPA managed to trick the robots into giving harmful answers about 96% of the time, whereas other methods only succeeded about 60-70% of the time.

Summary

The paper claims that to break the safety of modern AI that can see and read, you can't just be loud and obvious. You have to be a sneaky, adaptive conversationalist. You must mix text and images carefully, listen to the robot's answers, and slowly, over many turns, guide the conversation toward the forbidden topic until the robot accidentally slips up and answers the harmful question.

Important Note: The authors emphasize that this is a "Red Teaming" exercise. They are doing this to find holes in the safety systems so that developers can fix them and make the AI safer, not to actually teach people how to cause harm.

Drowning in papers in your field?

Get daily digests of the most novel papers matching your research keywords — with technical summaries, in your language.

Try Digest →