Multi-Paradigm Collaborative Adversarial Attack Against Multi-Modal Large Language Models

This paper proposes MPCAttack, a novel framework that improves the transferability of adversarial attacks against Multi-Modal Large Language Models (MLLMs). It uses a Multi-Paradigm Collaborative Optimization strategy to jointly aggregate and balance visual and textual semantic representations, yielding a more effective global perturbation.

Yuanbo Li, Tianyang Xu, Cong Hu, Tao Zhou, Xiao-Jun Wu, Josef Kittler

Published 2026-03-06

Imagine you have a super-smart robot friend (a Multi-Modal Large Language Model, or MLLM) that can look at a picture and tell you a story about it. It's like a detective that combines what it sees with what it knows about the world.

But, just like any smart person, this robot can be tricked. If you show it a picture with a tiny, almost invisible sticker on it (an adversarial perturbation), the robot might suddenly think a picture of a cat is actually a dog, or that a living room is a kitchen.

The problem is that most people trying to trick these robots use a "one-trick pony" approach. They try to confuse the robot using only one way of thinking (like only looking at colors, or only looking at shapes). But because the robot is so smart and uses many ways of thinking, a single trick often fails. The robot sees through it.

The New Idea: The "Swiss Army Knife" Attack

This paper introduces a new method called MPCAttack. Instead of using just one trick, the researchers built a "Swiss Army Knife" of tricks. They realized that to really fool a super-smart robot, you need to attack it from multiple angles at the same time.

Here is how they did it, using some fun analogies:

1. The Three Different "Detectives" (The Paradigms)

Imagine you are trying to trick a security guard. Most attackers hire just one detective to figure out the guard's routine. This paper says, "Let's hire three different types of detectives, each with a unique superpower":

  • Detective A (The Matchmaker): This detective is great at matching pictures to words. If you show it a photo of a beach, it knows the word "sand" fits perfectly. It focuses on how well the image and text line up.
  • Detective B (The Storyteller): This detective doesn't just match words; it understands the story. It knows that a picture of a beach implies "vacation," "sun," and "relaxation." It looks at the deep meaning and relationships between things.
  • Detective C (The Pattern Spotter): This detective is a master of visual patterns. It doesn't care about words; it just sees shapes, textures, and lighting. It knows that a specific pattern of pixels usually means "sky."

2. The "Team Huddle" (Collaborative Optimization)

In the past, attackers would ask Detective A for a trick, then ask Detective B for a different trick, and just mash them together. It was like a team where everyone shouted their own ideas without listening to each other. The result was a messy, confused attack that the robot could easily spot.

MPCAttack changes the game by making the detectives hold a Team Huddle.

  • They compare notes.
  • They say, "Hey, Detective A, your trick is good for matching words, but Detective C's trick is better for hiding the shape. Let's combine them."
  • They create a single, perfect trick that satisfies all three detectives at once.

This "huddle" ensures the trick isn't just good at fooling one type of thinking; it's good at fooling all types of thinking simultaneously.
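In optimization terms, the "huddle" can be read as aggregating gradients from several surrogate objectives into one balanced update. Below is a minimal numpy sketch of that idea, not the authors' actual method: three toy "paradigm" losses (stand-ins for the image-text matcher, the semantic storyteller, and the pattern spotter, here just cosine alignments against random feature vectors) each propose a direction, the proposals are normalized so no single detective shouts loudest, and the averaged direction drives a sign-based perturbation step under a small L-infinity budget. All names and numbers are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for model embeddings. In the real attack these would come
# from vision-language encoders; here they are fixed random vectors.
image = rng.normal(size=64)            # clean "image" features
targets = [
    rng.normal(size=64),               # Detective A: image-text matching
    rng.normal(size=64),               # Detective B: semantic relations
    rng.normal(size=64),               # Detective C: visual patterns
]

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def grad_cosine(x, t):
    """Gradient of cos(x, t) with respect to x."""
    nx, nt = np.linalg.norm(x), np.linalg.norm(t)
    c = x @ t / (nx * nt)
    return t / (nx * nt) - c * x / nx**2

eps = 0.1     # L-infinity budget: the "tiny, almost invisible sticker"
alpha = 0.02  # step size per iteration
delta = np.zeros_like(image)

for _ in range(20):
    x = image + delta
    # Each detective proposes a direction that reduces its own alignment.
    proposals = [-grad_cosine(x, t) for t in targets]
    # Team huddle: normalize every proposal to unit length, then average
    # them into a single agreed direction that satisfies all three at once.
    huddle = sum(g / (np.linalg.norm(g) + 1e-12) for g in proposals)
    huddle /= len(proposals)
    delta = np.clip(delta + alpha * np.sign(huddle), -eps, eps)

before = [cosine(image, t) for t in targets]
after = [cosine(image + delta, t) for t in targets]
```

The normalization step is the important design choice: without it, whichever objective happens to have the largest gradient would dominate the update, recreating the "everyone shouts their own ideas" failure the paper describes.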

3. The Result: A Master Illusionist

Because this new attack method (MPCAttack) uses all three perspectives together, the "sticker" it puts on the image is incredibly powerful.

  • The Old Way: Like trying to trick a guard by wearing a fake mustache. The guard might look past it.
  • The MPCAttack Way: Like a magician who changes the lighting, the music, and the costume all at once. The guard is so overwhelmed by the combined effect that they completely believe the illusion.

Why Does This Matter?

The researchers tested this on both open-source robots (like LLaVA) and famous closed-source robots (like GPT-4o and GPT-5).

  • The Results: The new method was a huge success. Its perturbations transferred far better, fooling the robots much more often than any previous method.
  • The Lesson: It turns out that to break a complex system, you can't just use a hammer. You need a whole toolbox, and you need to use the tools together, not separately.

In a Nutshell

This paper teaches us that to understand (or break) the security of advanced AI, we need to stop thinking in just one way. By combining different "languages" of vision and understanding, we can create attacks that are much harder to defend against. This helps scientists find the weak spots in AI before bad actors do, making our future AI systems safer.

The takeaway: If you want to trick a genius, don't just use one argument. Use a team of experts who talk to each other to build the ultimate, unbreakable argument.