Sharpness-Aware Machine Unlearning

This paper characterizes how Sharpness-Aware Minimization (SAM) alters generalization during machine unlearning: when fitting forget signals, SAM abandons its usual denoising behavior. Building on this finding, the authors propose "Sharp MinMax," a novel method that splits the model in two, learning retain signals via SAM while unlearning forget signals via sharpness maximization. The result is superior unlearning performance, reduced feature entanglement, and enhanced privacy.

Haoran Tang, Rajiv Khanna

Published Tue, 10 Ma

Imagine you have a very smart, over-achieving student named DeepNet. DeepNet has read a massive library of books (the training data) and memorized almost everything. But now, a few of those books contain false information, or perhaps the author of one book wants to be completely erased from history due to privacy laws.

The problem? DeepNet is so good at memorizing that if you just tell him, "Forget that one book," he gets confused. He tries to unlearn it, but in doing so he accidentally starts forgetting the good books too, or becomes so muddled that he stops learning correctly.

This paper is about a new, smarter way to help DeepNet "unlearn" specific information without ruining his overall intelligence.

Here is the breakdown using simple analogies:

1. The Problem: The "Confused Student"

Usually, when we want a model to forget something, we try to push it in the opposite direction.

  • The Old Way (SGD): Imagine trying to erase a drawing on a whiteboard by scrubbing it with a sponge. If you scrub too hard, you might wipe away the nice drawing next to it. If you scrub too gently, the bad drawing stays. It's a delicate, messy balance.
  • The Conflict: The model is receiving two signals at once: "Remember this!" (Retain) and "Forget that!" (Forget). These signals fight each other, like a tug-of-war, often canceling each other out.
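
The tug-of-war can be sketched numerically. In this deliberately contrived toy (my own illustration, not the paper's setup), the retain gradient and the forget gradient happen to point the same way, so combining gradient descent on one with gradient ascent on the other cancels out and the model never moves:

```python
# Toy "tug-of-war": descend on the retain loss while ascending on the
# forget loss in a single combined update. Both losses here (hypothetical,
# chosen for illustration) share the same optimum, so the signals cancel.

def retain_loss_grad(w):
    return 2 * (w - 1.0)   # pulls w toward 1.0 ("remember this!")

def forget_loss_grad(w):
    return 2 * (w - 1.0)   # forget data near the same optimum -> conflict

w, lr, lam = 0.0, 0.1, 1.0
for _ in range(100):
    # With identical gradients and lam = 1, the combined update is
    # (1 - lam) * gradient = 0 at every step: a perfect stalemate.
    w -= lr * (retain_loss_grad(w) - lam * forget_loss_grad(w))
print(round(w, 4))  # prints 0.0 -- the model never moved
```

In real networks the cancellation is rarely this exact, but whenever retain and forget gradients overlap, part of each update is wasted fighting the other signal.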

2. The Hero: SAM (The "Flat-Land Explorer")

The paper introduces a technique called Sharpness-Aware Minimization (SAM).

  • The Analogy: Imagine the model's knowledge is a landscape of hills and valleys.
    • Sharp peaks are dangerous: If the model sits on a sharp peak, a tiny breeze (a small change in data) knocks it off, and it forgets everything. This is "overfitting" or memorizing noise.
    • Flat valleys are safe: If the model sits in a wide, flat valley, it can wobble a bit without falling off. It generalizes well.
  • SAM's Superpower: Normally, SAM is great at finding these flat valleys. It helps the model ignore random noise (like a typo in a book) so it learns the real story.

3. The Big Discovery: "The Double-Edged Sword"

The authors found something surprising. When they asked SAM to unlearn specific data (the "Forget" set), SAM's behavior changed.

  • The Twist: To forget the "bad" data, SAM had to stop being so careful. It actually started overfitting to the data it was supposed to forget, just like the old method (SGD) did.
  • Why is this good? It sounds bad, but think of it this way: To erase a specific stain from a shirt, you sometimes need to scrub really hard right at that spot. SAM realized that to truly forget a specific sample, it needs to "overfit" to the act of forgetting it.

4. The New Strategy: "Sharp MinMax" (The Two-Brain Approach)

Since the authors realized that "overfitting" is actually helpful when you want to erase something specific, they invented a new algorithm called Sharp MinMax.

  • The Metaphor: Imagine DeepNet splits into two personalities:
    1. The Wise Librarian (Retain Model): This part uses SAM to stay in the "flat valley." It carefully preserves all the good knowledge, ensuring the model stays smart and accurate on the remaining data.
    2. The Eraser (Forget Model): This part is told to do the exact opposite. It climbs the "sharp peaks." It aggressively overfits to the data it needs to forget, essentially memorizing the "forget" command so hard that the original data is completely wiped out.

By splitting the model, they stop the signals from fighting each other. The Librarian keeps the house tidy, while the Eraser smashes the specific vase they need to get rid of, without breaking the furniture.
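
A hedged toy of the two-brain idea (my simplification on linear models, not the authors' exact algorithm): two weight copies train independently, the retain copy with a flatness-seeking SAM descent step and the forget copy with a sharpness-maximizing ascent step, so the two signals never share one update.

```python
import numpy as np

def grad(w, x, y):
    return (w @ x - y) * x  # gradient of 0.5 * (w @ x - y)^2

def perturb(w, g, rho):
    # move toward the locally worst-case direction, SAM-style
    return w + rho * g / (np.linalg.norm(g) + 1e-12)

x_r, y_r = np.array([1.0, 2.0]), 3.0   # retain sample
x_f, y_f = np.array([2.0, 1.0]), 1.0   # forget sample

w_retain, w_forget = np.zeros(2), np.zeros(2)
lr, rho = 0.05, 0.05
for _ in range(200):
    # Librarian branch: SAM descent, minimizing loss at the perturbed point
    g = grad(w_retain, x_r, y_r)
    w_retain -= lr * grad(perturb(w_retain, g, rho), x_r, y_r)
    # Eraser branch: climb the forget loss from the perturbed point,
    # driving the weights up a "sharp peak" for that sample
    g = grad(w_forget, x_f, y_f)
    w_forget += lr * grad(perturb(w_forget, g, rho), x_f, y_f)

retain_err = abs(float(w_retain @ x_r - y_r))
forget_err = abs(float(w_forget @ x_f - y_f))
print(retain_err < 0.1, forget_err > 1.0)  # prints True True
```

The retain branch still fits its sample almost perfectly, while the forget branch's error explodes: neither update ever had to compromise with the other.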

5. The Results: A Cleaner, Safer Model

The experiments showed that this new approach is a game-changer:

  • Better Privacy: It's much harder for hackers to guess if a specific person's data was in the training set (a "Membership Inference Attack"). The data is truly gone.
  • Less Confusion: The "forget" data and "remember" data are less tangled together in the model's brain.
  • Efficiency: They can forget difficult data (data the model really memorized) much faster and more effectively than before.
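
The privacy point can be made concrete with a loss-thresholding membership inference attack, a common baseline (not necessarily the attack used in the paper): the attacker flags a sample as a training member whenever the model's loss on it is suspiciously low. After ideal unlearning, forgotten samples score like non-members and the attack drops to chance. The loss distributions below are assumed purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated per-sample losses (hypothetical distributions for illustration)
member_losses = rng.normal(0.1, 0.05, 1000)      # memorized -> low loss
nonmember_losses = rng.normal(1.0, 0.3, 1000)    # unseen -> higher loss
unlearned_losses = rng.normal(1.0, 0.3, 1000)    # ideal unlearning

def attack_accuracy(pos, neg, threshold=0.5):
    # guess "member" when the loss falls below the threshold
    tp = np.mean(pos < threshold)     # members correctly flagged
    tn = np.mean(neg >= threshold)    # non-members correctly passed
    return (tp + tn) / 2              # balanced accuracy

print(attack_accuracy(member_losses, nonmember_losses) > 0.95)              # attack succeeds
print(abs(attack_accuracy(unlearned_losses, nonmember_losses) - 0.5) < 0.05)  # ~coin flip
```

Both lines print True: against the original model the attacker wins easily, but once the forgotten data's losses match the non-member distribution, membership is no longer detectable.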

Summary

Think of this paper as teaching a student selective amnesia. Instead of gently nudging the student to forget, the authors realized that sometimes you need to split the student's brain: one half stays calm and wise to preserve the good memories, while the other half goes into a frenzy to aggressively destroy the specific bad memories. The result is a smarter, safer, and more reliable AI.