Go Beyond Your Means: Unlearning with Per-Sample Gradient Orthogonalization

This paper introduces OrthoGrad, a novel machine unlearning method that removes the influence of specific data by projecting unlearn gradients onto the subspace orthogonal to the retain gradients. This mitigates interference between forgetting and remembering, and outperforms existing approaches even when only a small portion of the training data is available.

Aviv Shamsian, Eitan Shaar, Aviv Navon, Gal Chechik, Ethan Fetaya

Published Tue, 10 Ma

🧠 The Big Problem: The "Forgetful" AI

Imagine you have a brilliant student (an AI model) who has read the entire internet. They are incredibly smart, but they've also memorized some things they shouldn't have:

  • Private photos of people who asked to be forgotten.
  • Copyrighted code snippets they stole from GitHub.
  • A specific person's voice that they shouldn't be able to recognize.

You want the student to forget these specific things. But here's the catch: You can't just erase their brain. If you try to scrub out the bad memories, you might accidentally wipe out their ability to do math, write poetry, or recognize other voices.

This is the challenge of Machine Unlearning: How do you make an AI forget specific data without ruining its general smarts?

🚫 The Old Way: The "Tug-of-War"

Most previous methods tried to fix this by playing a game of Tug-of-War.

  • Team Forget: They pull the model in one direction to make it forget the bad data (Gradient Ascent).
  • Team Remember: They pull the model in the opposite direction to keep it good at everything else (Gradient Descent).

The Flaw: This only works if you have a huge team of "Rememberers" (a massive dataset of the original training data) to balance out the "Forgetting."

  • The Reality: Often, the company that trained the AI doesn't have the original data anymore (maybe it was deleted, or it's too big to store). They only have a tiny scrap of data (a small "retain set") to help the model remember.
  • The Result: With a tiny team of Rememberers, the Tug-of-War fails. The model either forgets everything (including the good stuff) or remembers the bad stuff.
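The tug-of-war can be sketched as a single combined update direction: descend on the retain loss while ascending on the forget loss. This is a toy illustration, not the exact formulation from any particular paper; the gradient values and the balance weight `lam` are made up.

```python
import numpy as np

# Hypothetical gradients on a tiny 3-parameter model (made-up numbers).
g_forget = np.array([1.0, 0.0, 0.5])   # gradient of the loss on the forget set
g_retain = np.array([0.8, 0.2, 0.4])   # gradient of the loss on the retain set

lam = 1.0  # weight balancing "remember" against "forget"

# Tug-of-war update direction: gradient descent on the retain loss
# plus gradient ascent (negated descent) on the forget loss.
update = lam * g_retain - g_forget
```

With a large retain set, `g_retain` is a reliable estimate and the balance can work; with a tiny retain set, the two terms simply fight and one side loses.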

💡 The New Solution: OrthoGrad (The "Sideways Step")

The authors propose a new method called OrthoGrad. Instead of fighting the "Forget" force with a "Remember" force, they change the geometry of the problem.

The Analogy: The Dance Floor

Imagine the AI's knowledge is a giant dance floor.

  • The Bad Data (the thing to forget) is a group of dancers trying to pull the main dancer (the AI) toward the North.
  • The Good Data (the tiny scrap of retained data) is a small group trying to keep the dancer from moving too far East.

Old Method (Tug-of-War): The small group tries to pull East while the big group pulls North. The dancer gets stuck in the middle, or the small group gets dragged away.

OrthoGrad Method (The Sideways Step):
Instead of pulling East, the small group tells the dancer: "Don't worry about pulling us back. Just take a step North, but make sure you don't step East or West at all."

Mathematically, they project the "Forget" movement onto a path that is perfectly perpendicular (orthogonal) to the "Remember" movement.

  • They look at the tiny scrap of good data.
  • They calculate the exact direction that data cares about.
  • They force the "Forget" update to go in a direction that is 90 degrees to that.

Why this is magic: Because the update is perpendicular, it cannot (at least to a first-order approximation) accidentally mess up the good data. It's like walking down a hallway: you can walk forward (forgetting the bad thing) without bumping into the walls on your left and right (the good things).
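For a single retain gradient, the "sideways step" is just classical vector projection: subtract from the forget gradient its component along the retain gradient, and what remains is orthogonal to it. A minimal NumPy sketch with made-up gradient values:

```python
import numpy as np

# Hypothetical gradients (made-up numbers for illustration).
g_forget = np.array([1.0, 2.0, 0.0])   # direction that erases the bad data
g_retain = np.array([0.0, 1.0, 0.0])   # direction the good data cares about

# Remove the component of the forget gradient that lies along the
# retain gradient; the remainder is perpendicular to it.
coef = (g_forget @ g_retain) / (g_retain @ g_retain)
g_orth = g_forget - coef * g_retain

# The projected update no longer moves the model along g_retain at all.
assert abs(g_orth @ g_retain) < 1e-12
```

Because `g_orth` has zero dot product with `g_retain`, stepping along it leaves the retain loss unchanged to first order.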

🛠️ How It Works (The "Per-Sample" Secret Sauce)

The paper has a second clever trick.

  • Old methods looked at the "Average" of the good data. It's like asking a crowd, "What do you think?" and taking the middle answer. If the crowd is small, the average is shaky and unreliable.
  • OrthoGrad looks at every single person in that tiny crowd individually. It builds a "safety net" based on every single sample, not just the average.

The Analogy:
Imagine you are trying to walk through a forest without stepping on any flowers.

  • Average Method: You look at the forest from a helicopter, see a "general area" of flowers, and try to avoid that area. You might still step on a flower because the map was blurry.
  • OrthoGrad: You look at every single flower on the ground. You calculate a path that goes between every single one of them. Even if you only have 5 flowers to avoid, you can weave a perfect path through them without touching a petal.
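The per-sample idea can be sketched by projecting onto the orthogonal complement of the span of *all* individual retain gradients, not just their average. Here is an illustrative implementation using a QR decomposition; the dimensions and random gradients are made up, and real models would use per-sample gradients from the network:

```python
import numpy as np

rng = np.random.default_rng(0)
n_params, n_retain = 8, 3

# Per-sample retain gradients stacked as columns (toy random stand-ins).
G_retain = rng.normal(size=(n_params, n_retain))
g_forget = rng.normal(size=n_params)

# Orthonormal basis Q for the span of all per-sample retain gradients.
Q, _ = np.linalg.qr(G_retain)

# Project the forget gradient onto the orthogonal complement of that span:
# subtract its component inside the span.
g_orth = g_forget - Q @ (Q.T @ g_forget)

# g_orth is orthogonal to every individual retain-sample gradient,
# not merely to their average.
assert np.allclose(G_retain.T @ g_orth, 0.0, atol=1e-10)
```

Averaging first would only guarantee orthogonality to the mean gradient; the per-sample projection weaves between every "flower" individually.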

🎤 Real-World Tests

The authors tested this on two very different things:

  1. Speech Recognition (Whisper): They made the AI forget a specific person's voice. Even with very little data to "remember" the rest of the world, OrthoGrad made the AI stop recognizing that one person while still understanding everyone else perfectly.
  2. Image Classification (ImageNet): They made the AI forget a whole category of images (like "dogs") or random pictures. OrthoGrad did a better job than all other methods at forgetting the target while keeping the rest of the brain sharp.

🏆 The Bottom Line

OrthoGrad is a new way to teach an AI to forget.

  • The Problem: You can't always retrain an AI from scratch, and you often don't have the original data to help it remember.
  • The Solution: Instead of fighting to keep the old knowledge, OrthoGrad takes a "sideways step." It updates the model in a direction that is mathematically guaranteed, to a first-order approximation, not to touch the knowledge you want to keep.
  • The Benefit: It works even when you have very little data to work with, making it perfect for real-world situations where privacy laws or data loss make the original training sets unavailable.

In short: It's the art of forgetting the bad stuff by walking a path that simply doesn't exist for the good stuff.