The Big Problem: The "Shortcut" Student
Imagine you are teaching a student (an AI model) to recognize Waterbirds.
- The Hard Way: You show them pictures of birds, explaining their beaks, wings, and feathers. This takes time and effort, but they learn the real concept of a bird.
- The Shortcut Way: You accidentally show them 995 pictures of waterbirds sitting on water, and only 5 pictures of waterbirds on land. The student gets lazy. They realize, "Hey, if there's water, it's a waterbird!" They stop looking at the bird and just look at the water.
In the AI world, this is called learning a shortcut. The model learns a "spurious correlation" (Water = Bird) instead of the real truth.
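The 995-to-5 imbalance makes this shortcut almost free to learn. Here is a tiny sketch of why it backfires, reusing the hypothetical counts from the example above (the "classifier" and the balanced test split are invented purely for illustration):

```python
# Toy illustration of a spurious correlation. The 995/5 counts mirror
# the waterbird example above; everything else here is made up.

# Each sample: (has_water_background, is_waterbird)
train = [(1, 1)] * 995 + [(0, 1)] * 5    # waterbirds: mostly on water
train += [(0, 0)] * 995 + [(1, 0)] * 5   # landbirds: mostly on land

def shortcut_model(has_water):
    """A 'lazy' classifier that only looks at the background."""
    return has_water  # predicts "waterbird" iff water is present

train_acc = sum(shortcut_model(w) == y for w, y in train) / len(train)
print(f"shortcut accuracy on biased training data: {train_acc:.1%}")   # 99.5%

# On a balanced test set (every background/bird combination equally
# likely), the shortcut collapses to coin-flip accuracy:
test = [(1, 1), (0, 1), (1, 0), (0, 0)] * 250
test_acc = sum(shortcut_model(w) == y for w, y in test) / len(test)
print(f"shortcut accuracy on balanced test data: {test_acc:.1%}")      # 50.0%
```

The shortcut looks nearly perfect on the biased data, which is exactly why the model is tempted to rely on it.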
The New Problem: "Shortcut Unlearning"
Now, imagine you need to teach this student to forget the concept of "Waterbird" entirely (maybe due to privacy laws or because the data was biased). You tell the model: "Please forget everything about Waterbirds."
Here is the surprising twist the paper discovered:
- What you expect: The model forgets the bird, and you are left with a model that doesn't know what a waterbird is.
- What actually happens: The model forgets the water, but it remembers the bird.
Because the model learned the "Water = Bird" shortcut so easily, it is very good at that specific trick. When you ask it to forget, it takes the path of least resistance and drops the easiest thing it learned (the water background) to satisfy the request. But because it still knows the bird features, it can still guess "Waterbird" correctly, just by looking at the bird itself!
The authors call this "Shortcut Unlearning." It's like asking someone to forget how to drive a car, and they respond by forgetting only a minor habit (say, how to adjust the mirrors) while keeping the core skills of steering and working the pedals. They can still drive, just slightly worse.
The Solution: CUPID (The Surgical AI Surgeon)
To fix this, the authors created a new method called CUPID. Think of CUPID as a surgical team that performs a very precise operation on the AI's brain, rather than just smashing it with a hammer.
CUPID works in three steps:
1. The "Pain Scale" Check (Sharpness-Aware Partitioning)
First, CUPID asks the AI: "How hard was it for you to learn this specific picture?"
- Easy Pictures (Flat): These are the "shortcut" pictures (Bird on water). The AI learned them instantly. In math terms, these sit in a "flat" valley of the learning landscape.
- Hard Pictures (Sharp): These are the tricky pictures (Bird on land). The AI struggled to learn these. They sit in a "sharp," narrow valley, where even a small nudge to the model's weights causes a big jump in error.
CUPID separates the pictures into two piles based on this "pain scale."
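This "pain scale" can be pictured with a toy one-weight model: nudge the weights slightly and see how much each sample's loss rises. (The model, numbers, and 0.1 threshold below are invented for illustration; CUPID's actual sharpness criterion is more involved than this one-dimensional sketch.)

```python
# Toy sketch of sharpness-aware partitioning (illustrative only).
# Model: f(x) = w * x with squared loss; w is assumed already trained.

w = 2.0    # trained weight: fits the "easy" majority exactly
eps = 0.1  # small weight perturbation

# (x, y) pairs: 8 easy "shortcut" samples and 2 hard minority samples
samples = [(1.0, 2.0)] * 8 + [(1.0, 0.0)] * 2

def loss(weight, x, y):
    return (weight * x - y) ** 2

def sharpness(x, y):
    """How much this sample's loss rises under a small weight nudge."""
    return max(loss(w + eps, x, y) - loss(w, x, y),
               loss(w - eps, x, y) - loss(w, x, y))

flat  = [s for s in samples if sharpness(*s) < 0.1]   # easy / shortcut pile
sharp = [s for s in samples if sharpness(*s) >= 0.1]  # hard / minority pile
print(len(flat), len(sharp))  # 8 2
```

The easy samples barely notice the nudge (flat valley); the hard samples' loss spikes (sharp region), so the two piles separate cleanly.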
2. Mapping the Brain (Causal Pathway Identification)
Next, CUPID looks inside the AI's brain to see which neurons are responsible for which pile.
- It finds the "Shortcut Pathway": The neurons that only care about the water background.
- It finds the "Causal Pathway": The neurons that actually care about the bird's shape and feathers.
Usually, these pathways are tangled together like a mess of headphones. CUPID untangles them.
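One crude way to picture this untangling: compare how strongly each neuron fires on the two piles, and assign neurons to whichever pile excites them more. (The activation values and the 0.3 threshold below are made up for illustration; the paper's causal analysis is more sophisticated than a mean-activation comparison.)

```python
# Toy sketch of attributing neurons to pathways by comparing their
# mean activations on shortcut vs. causal samples (illustrative only).

# Hypothetical activations of 4 neurons; each row is one sample.
shortcut_acts = [[0.9, 0.1, 0.8, 0.2],
                 [0.8, 0.2, 0.9, 0.1]]
causal_acts   = [[0.1, 0.9, 0.2, 0.8],
                 [0.2, 0.8, 0.1, 0.9]]

def mean_per_neuron(acts):
    return [sum(col) / len(col) for col in zip(*acts)]

m_short = mean_per_neuron(shortcut_acts)
m_cause = mean_per_neuron(causal_acts)

# Neurons that fire much harder on shortcut samples form the shortcut
# pathway; neurons that prefer the causal samples form the causal one.
shortcut_pathway = [i for i in range(4) if m_short[i] - m_cause[i] > 0.3]
causal_pathway   = [i for i in range(4) if m_cause[i] - m_short[i] > 0.3]
print(shortcut_pathway, causal_pathway)  # [0, 2] [1, 3]
```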
3. The Surgery (Targeted Pathway Update)
Finally, CUPID performs the surgery.
- It tells the Shortcut Pathway: "You can relax, we don't need you."
- It tells the Causal Pathway: "You need to forget the bird completely. We are erasing your memory of the bird shape."
By targeting the real memory (the bird) and ignoring the easy memory (the water), CUPID ensures the model truly forgets the class, rather than just changing its strategy.
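The surgery can be pictured as a masked update: the "forget" gradient is applied only to the causal pathway's weights, while the shortcut pathway is left alone. (The weights, gradient values, and learning rate below are hypothetical; this masked update is a simplification of the paper's actual rule.)

```python
# Toy sketch of a targeted pathway update (illustrative only).

weights = [1.0, 1.0, 1.0, 1.0]      # one weight per neuron
causal_pathway = [1, 3]             # neurons identified in step 2
forget_grad = [0.5, 0.5, 0.5, 0.5]  # hypothetical unlearning gradient
lr = 2.0

# Erase only the causal pathway; the shortcut pathway is untouched.
updated = [w - lr * g if i in causal_pathway else w
           for i, (w, g) in enumerate(zip(weights, forget_grad))]
print(updated)  # [1.0, 0.0, 1.0, 0.0]
```

Only the weights carrying the real "bird" memory are driven to zero, so the model cannot fall back on them after unlearning.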
Why This Matters
In the real world, AI models are often trained on messy, biased data (like the waterbird example). If we try to make them "forget" bad data using old methods, they might just learn a new, sneaky way to guess the answer, leaving the bias intact.
CUPID is the first method that realizes: "You can't just tell the AI to forget; you have to tell it what to forget."
The Results
The paper tested CUPID on three different "biased" datasets.
- Old Methods: The AI forgot the background but kept the object. It was still "remembering" the class it was supposed to forget.
- CUPID: The AI successfully erased the class. It could no longer guess the answer, whether by looking at the background or at the bird itself. It achieved the best results, showing that you can surgically remove bad information without breaking the rest of the model.
In short: If you want an AI to truly forget something, you have to stop it from taking the easy way out. CUPID forces the AI to face the hard truth and delete the real memory, not just the shortcut.