Image Can Bring Your Memory Back: A Novel Multi-Modal Guided Attack against Image Generation Model Unlearning

This paper introduces RECALL, a novel multi-modal adversarial framework that exploits image prompts to effectively compromise the robustness of machine unlearning in image generation models, revealing critical vulnerabilities in existing safety mechanisms.

Renyang Liu, Guanlin Li, Tianwei Zhang, See-Kiong Ng

Published 2026-02-17

Imagine you have a very talented artist named Stable Diffusion. This artist can paint anything you describe: a cat in a hat, a sunset over the mountains, or even a historical figure.

However, sometimes people ask this artist to paint things that are dangerous, illegal, or copyrighted (like a specific celebrity's face or explicit content). To fix this, the artist's owners hire a "memory eraser" (a process called Machine Unlearning). They tell the artist, "Forget how to paint that specific thing."

The problem? The artist is stubborn. Even after being told to forget, if you whisper the right secret code or show them a specific trick, they might remember and paint the forbidden thing anyway.

This paper introduces a new trick called RECALL to test just how well these "memory erasers" actually work.

The Old Way: Trying to Trick the Artist with Words

Previously, hackers tried to break the memory eraser by changing the words they gave the artist.

  • Analogy: Imagine you ask the artist to "paint a dog." The artist says, "I can't, I forgot dogs." So, you try to trick them by saying, "Paint a furry, four-legged animal that barks."
  • The Problem: This is like trying to open a locked door by shouting different words at it. It often fails, or if it works, the picture looks weird and doesn't match what you originally wanted. It also takes a lot of time and computing power to find the right "magic words."
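To make the inefficiency concrete, here is a toy sketch of the word-level attack style described above. This is an illustration, not any specific prior method: the "memory eraser" is modeled as a naive blocklist, and the attacker brute-forces paraphrases one at a time, which is why text-only attacks tend to be slow and hit-or-miss.

```python
# Toy model of a prompt-based attack on an "unlearned" model.
# The blocklist and synonym pool are made-up stand-ins, not the paper's setup.

BLOCKED = {"dog", "canine", "puppy", "hound"}

def unlearned_model(prompt):
    # Refuses whenever a blocked word appears; otherwise "paints" the prompt.
    words = set(prompt.lower().split())
    return None if words & BLOCKED else f"image of: {prompt}"

# The attacker tries candidate phrasings one by one, like testing every
# key on a giant keyring.
synonym_pool = ["dog", "canine", "puppy", "hound", "furry barking pet"]

queries = 0
result = None
for candidate in synonym_pool:
    queries += 1
    result = unlearned_model(f"a {candidate} in a park")
    if result is not None:
        break

print(queries, result)  # many queries before one phrasing slips through
```

Even in this tiny example, most of the budget is spent on rejected queries, and the phrasing that finally works ("furry barking pet") drifts away from what the attacker originally wanted.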

The New Way: RECALL (The "Visual Prompt")

The authors of this paper realized that modern artists (AI models) don't just listen to words; they also look at pictures.

RECALL is a new method that uses a picture to trick the artist, rather than just changing the words.

Here is how it works, using a simple analogy:

  1. The Setup: Imagine the artist has been told to forget how to paint a "naked person."
  2. The Secret Weapon: You have a reference photo of a naked person (the thing the artist is supposed to have forgotten).
  3. The Trick: Instead of shouting new words, you take that reference photo and subtly tweak it, like adding a tiny bit of static noise or shifting the colors by an amount too small for a human to notice. You turn this tweaked photo into a "secret key."
  4. The Attack: You show the artist the original words ("paint a person in a meadow") AND this secret tweaked photo.
  5. The Result: The artist looks at the photo, sees the hidden "naked" pattern in the noise, and ignores the "forget" command. They paint the forbidden image, but it still looks exactly like the scene you described in words.
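The steps above can be sketched in code. The fragment below is a minimal, self-contained illustration of the core idea in step 3 (a small, bounded perturbation of a reference image), not the paper's actual objective: the real method optimizes against a diffusion model, while here the "unlearned model's response" is a made-up linear probe so the example stays runnable.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.uniform(0.0, 1.0, size=64)  # flattened toy "reference photo"
probe = rng.normal(size=64)             # stand-in direction for the erased concept

def concept_score(x):
    # Assumed stand-in for how strongly the model "remembers" the concept.
    return float(probe @ x)

eps = 0.03   # max per-pixel change: the tweak stays visually subtle
step = 0.01
delta = np.zeros_like(image)

for _ in range(50):
    # PGD-style step: move along the sign of the gradient (which, for this
    # linear scorer, is just `probe`), then project back into the small
    # epsilon-ball around the original photo.
    delta = np.clip(delta + step * np.sign(probe), -eps, eps)

adv_image = np.clip(image + delta, 0.0, 1.0)

# The perturbed photo triggers the concept more strongly than the original,
# yet differs from it by at most eps per pixel.
print(concept_score(adv_image) > concept_score(image))
print(float(np.abs(adv_image - image).max()) <= eps)
```

The point of the sketch is the constraint: the attack image stays nearly identical to the reference photo, so to a human it looks unchanged, while the hidden pattern in the noise is what the model reacts to.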

Why is RECALL Special?

The paper compares RECALL to other methods and finds it wins in three big ways:

  • It's Smarter (Better Alignment):
    • Analogy: Old methods were like trying to force a square peg into a round hole. The resulting picture often looked weird or didn't match the description. RECALL is like a master key; it opens the lock perfectly, so the picture looks exactly like what you asked for, just with the "forbidden" element included.
  • It's Faster (Efficiency):
    • Analogy: Other methods are like trying to pick a lock by testing every single key in a giant keyring one by one. RECALL is like having a master locksmith who knows exactly which tool to use immediately. It takes much less time and computer power.
  • It's Stronger (Robustness):
    • Analogy: The "memory erasers" used by companies are getting stronger. Old tricks (changing words) no longer work on the new, tougher erasers. RECALL is like a new type of lockpick that works even on the strongest, most reinforced doors.

Why Does This Matter?

You might ask, "If this is an attack, isn't that bad?"

The authors argue that RECALL is actually a safety tool. Think of it like a "Red Team" in cybersecurity. Before a company releases a new safety filter to the public, they need to know if it actually works.

  • For Model Owners: RECALL is a stress test. It helps them see, "Oh, our 'forget' button didn't work on this specific type of trick. We need to fix it."
  • For the Public: It proves that current safety measures aren't perfect. Just because a model says it "forgot" something doesn't mean it truly did.

The Bottom Line

The paper shows that images are powerful triggers. Even if an AI is told to forget a concept, showing it a slightly modified picture of that concept can make the memory come rushing back.

The authors call their method RECALL because it literally "brings the memory back." They are warning the world: We need better ways to make AI truly forget, because right now, a simple picture can undo all the safety work.
