Benchmarking Unlearning for Vision Transformers

This paper presents the first comprehensive benchmark for machine unlearning in Vision Transformers, evaluating a range of algorithms across model architectures, datasets, and protocols to establish performance baselines and to analyze how Vision Transformers memorize data compared to CNNs.

Kairan Zhao, Iurie Luca, Peter Triantafillou

Published 2026-02-24

Imagine you have a very smart student who has studied a massive library of books to learn how to recognize objects (like cats, cars, or numbers). This student is a Vision Transformer (VT). They are incredibly good at this job, often better than the older generation of students (called CNNs) who used to rule the classroom.

However, there's a problem. Sometimes, this student memorizes specific, embarrassing, or private details from a few specific books in the library. Maybe they memorized a photo of a celebrity's private diary, or a picture of a biased stereotype. If someone asks the student to "forget" that specific book, the student usually can't just erase that one page without messing up their entire knowledge base. This process of teaching a machine to "unlearn" specific data is called Machine Unlearning.

Until now, researchers had great ways to help the old students (CNNs) forget things, but they hadn't figured out how to do it for the new, super-smart students (Vision Transformers).

This paper is like a giant report card that tests exactly how well different "forgetting techniques" work on these new Vision Transformer students.

The Big Experiment: The "Forget-Me-Not" Test

The researchers set up a massive classroom experiment to see which teaching method works best. Here's how they did it, using some fun analogies:

1. The Students (The Architectures)

They tested two types of Vision Transformers:

  • ViT (Vision Transformer): Think of this student as someone who looks at a whole picture at once, like a bird flying high and seeing the whole forest. They are great at seeing the big picture but might get a bit scattered.
  • Swin-T (Swin Transformer): This student scans the picture in small local windows that shift between layers, like a detective examining clues one by one. They are more organized and structured, acting a bit more like the old-school CNNs.
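The difference between the two students can be sketched with a toy partitioning exercise. This is purely illustrative (not the paper's code): ViT flattens every patch into one global token sequence that attends everywhere, while Swin groups tokens into local windows and attends within each window.

```python
import numpy as np

# Toy 8x8 single-channel "image" (the values themselves don't matter here)
img = np.arange(64, dtype=np.float32).reshape(8, 8)

def to_patches(x, p):
    """Split an HxW image into non-overlapping p x p tiles, one row per tile."""
    h, w = x.shape
    return (x.reshape(h // p, p, w // p, p)
             .transpose(0, 2, 1, 3)
             .reshape(-1, p * p))

# ViT: every 2x2 patch becomes one token; attention later spans ALL tokens.
vit_tokens = to_patches(img, 2)   # 16 tokens, each of length 4
# Swin: pixels are grouped into larger local windows; attention stays inside each.
windows = to_patches(img, 4)      # 4 windows of 16 pixels each

print(vit_tokens.shape, windows.shape)  # (16, 4) (4, 16)
```

The "bird vs. detective" intuition falls out of the shapes: ViT's attention mixes all 16 tokens at once, while Swin only mixes within each of the 4 windows (and shifts the window grid between layers so information can still cross boundaries).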

2. The "To-Forget" List (The Data)

They didn't just test on one type of homework. They used four different datasets:

  • CIFAR-10 & CIFAR-100: Simple picture sets (like a child's drawing book vs. a slightly harder one).
  • SVHN: A dataset of house numbers (very clear, easy to read).
  • ImageNet: A massive library of millions of complex, real-world photos.

3. The Teaching Methods (The Algorithms)

They tried three main ways to make the student forget:

  • Fine-Tuning (The "Re-read" Method): Just tell the student, "Ignore that one book, and re-read the rest of the library." It's simple and often works well for easy tasks.
  • NegGrad+ (The "Anti-Force" Method): This is like a coach who says, "Push your brain away from that specific memory while keeping your other skills sharp." It's a more aggressive, mathematical way to erase the memory.
  • SalUn (The "Highlighter" Method): This method tries to find exactly which neurons (brain cells) are holding that bad memory and only tweaks those. It's like using a highlighter to find the specific paragraph to erase.
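At their core, the three methods differ mainly in which loss they optimize. Here is a minimal NumPy sketch of the NegGrad+ objective: descend on the retain-set loss while ascending on the forget-set loss, with a balance weight (`beta` is a hypothetical parameter name, not necessarily the paper's notation). Fine-tuning is the special case `beta=1.0` (retain term only); SalUn additionally restricts updates to the weights judged most "salient" for the forget data.

```python
import numpy as np

def cross_entropy(logits, label):
    """Per-example cross-entropy from raw logits (numerically stable)."""
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

def neggradplus_loss(retain_logits, retain_label,
                     forget_logits, forget_label, beta=0.999):
    """NegGrad+ objective sketch: minimize loss on retained data while
    MAXIMIZING loss on the forget example (hence the minus sign).
    beta trades off keeping skills vs. erasing the target memory."""
    return (beta * cross_entropy(retain_logits, retain_label)
            - (1.0 - beta) * cross_entropy(forget_logits, forget_label))

retain = np.array([2.0, 0.0])  # toy logits for a retained example, true class 0
forget = np.array([0.0, 2.0])  # toy logits for a to-forget example, true class 1
print(neggradplus_loss(retain, 0, forget, 1))
```

In a real pipeline this scalar would be averaged over batches and minimized with gradient descent; the "anti-force" in the analogy is exactly the subtracted forget term, which pushes the model's parameters away from fitting the forget set.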

4. The "Memory Detectives" (Proxies)

To know what to forget, the teachers need to know which parts of the student's brain are "stuck" on the bad data. They used "proxies" (clues) to guess this:

  • Confidence: If the student is too sure about an answer, they probably memorized it.
  • Holdout Retraining: Train a comparison model on data that never included the bad examples, then check how much the original student's behavior differs — a big gap suggests the original memorized them.
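The confidence proxy is simple enough to sketch directly. The idea: if a model assigns near-certain probability to a training example, it more likely memorized that example than generalized. The 0.95 threshold below is a made-up illustrative value, not one from the paper.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax over class logits."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Toy logits for 4 training examples (rows) over 3 classes.
logits = np.array([[9.0, 0.1, 0.2],   # near-certain -> likely memorized
                   [1.2, 1.0, 0.9],   # uncertain    -> likely generalized
                   [8.5, 0.3, 0.1],
                   [0.5, 0.6, 0.4]])

confidence = softmax(logits).max(axis=1)
# Flag examples whose top-class confidence exceeds a (hypothetical) threshold:
likely_memorized = confidence > 0.95
print(likely_memorized)  # [ True False  True False]
```

The flagged examples are the ones a forgetting method should target first — and, after unlearning, their confidence should fall back toward the uncertain rows.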

The Surprising Results

Here is what the report card revealed, translated into plain English:

1. The "Old" Tricks Still Work (But with a Twist)
The methods that worked for the old CNN students also worked for the new Vision Transformers. In fact, the new students were sometimes better at forgetting than the old ones!

  • The Winner: NegGrad+ was the star player. It was the most consistent and robust method, especially for the complex ImageNet dataset. It's like the coach who knows exactly how to push the memory out without breaking the student's confidence.
  • The Runner-Up: Fine-Tuning was surprisingly effective, especially on simpler tasks. Sometimes, just telling the student to focus on the good stuff is enough.
  • The Loser: SalUn (the highlighter method) was shaky. It kept the student's grades up, but it wasn't very good at actually hiding the secret: it failed the "Membership Inference Attack" test, meaning an attacker could still tell that the bad data had once been in the brain.
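The "Membership Inference Attack" test mentioned above can be sketched as a simple loss-threshold attack (a common baseline; the paper's evaluation may use a stronger variant). Training examples tend to have lower loss than unseen examples, so an attacker guesses "this was in the training set" whenever the loss is small. If unlearning worked, forgotten examples should look like unseen ones and the attack should drop to coin-flip accuracy.

```python
import numpy as np

def mia_accuracy(member_losses, nonmember_losses, threshold):
    """Loss-threshold membership inference: guess 'member' when the
    per-example loss is below the threshold. Returns attack accuracy;
    ~0.5 means the attacker can't tell, i.e. unlearning looks successful."""
    hits_m = member_losses < threshold        # correct if True for members
    hits_n = nonmember_losses >= threshold    # correct if True for non-members
    total = len(member_losses) + len(nonmember_losses)
    return (hits_m.sum() + hits_n.sum()) / total

member = np.array([0.05, 0.1, 0.2])     # toy losses on (supposedly) forgotten data
nonmember = np.array([0.9, 1.1, 0.7])   # toy losses on never-seen data
print(mia_accuracy(member, nonmember, threshold=0.5))  # 1.0: attacker wins
```

SalUn's weakness, in these terms, is that forget-set losses stayed separable from non-member losses, so this kind of attack still succeeded well above 0.5.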

2. The "Big Picture" vs. The "Detective"

  • ViT (The Bird): This student preferred the Fine-Tuning method. Because they look at the whole picture at once, their memories are spread out everywhere. Trying to surgically remove one memory (like SalUn) is hard. It's easier to just re-focus their attention.
  • Swin-T (The Detective): This student loved NegGrad+. Because they look at the picture in organized chunks, their memories are more localized. The "Anti-Force" method worked perfectly to push those specific chunks away.

3. The "Pre-Training" Advantage
The Vision Transformers were pre-trained on a massive library (ImageNet) before the experiment started. This gave them a superpower: They didn't need to memorize the bad data as much to begin with. Because they already understood the world so well, removing a few bad examples didn't hurt their overall knowledge as much as it did for the older students.

4. The "Continual" Challenge
In the real world, you might need to forget data repeatedly (e.g., every week). The researchers tested if the student would get "brain fog" after forgetting things 5 or 10 times in a row.

  • Good News: The student remained stable! The performance didn't drop significantly. They could keep forgetting new things without losing their mind.
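The continual protocol described above is essentially a loop: the same model receives a stream of forget requests and must stay accurate on retained data throughout. A schematic sketch, where `unlearn` and `evaluate` are hypothetical stand-ins for any of the methods and metrics discussed:

```python
def continual_unlearning(model, forget_requests, retain_set, unlearn, evaluate):
    """Apply unlearning requests one after another, tracking retain-set
    performance after each round to detect accumulating 'brain fog'."""
    history = []
    for forget_set in forget_requests:   # e.g. 5 or 10 requests in a row
        model = unlearn(model, forget_set, retain_set)
        history.append(evaluate(model, retain_set))
    return model, history

# Dummy usage: a "model" that is just its retain accuracy, an unlearn step
# that leaves it untouched, and an evaluate that reads the number back.
model, hist = continual_unlearning(
    0.9, [set()] * 5, None,
    unlearn=lambda m, f, r: m,
    evaluate=lambda m, r: m,
)
print(hist)
```

The paper's stability finding corresponds to `history` staying roughly flat across rounds rather than decaying as requests pile up.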

The Takeaway for Everyone

This paper is a huge step forward because it gives us a rulebook for making AI safe and private.

  • If you have a simple AI: Just use the "Re-read" method (Fine-Tuning). It's cheap and works.
  • If you have a complex AI (like a self-driving car or medical scanner): Use the "Anti-Force" method (NegGrad+) combined with the "Detective" style architecture (Swin).
  • Don't trust the Highlighter: Be careful with methods that try to surgically remove memories; they might leave traces behind.

In short: We now know that the new, super-smart Vision Transformers can be taught to forget, and we have the right tools to do it safely, fairly, and effectively. This ensures that as AI becomes more powerful, it can also respect our privacy and remove its mistakes when asked.
