Erase at the Core: Representation Unlearning for Machine Unlearning

The Problem: The "Superficial Amnesia"

Imagine you hire a chef (the AI model) to cook a massive banquet using recipes from 1,000 different cultures. One day, a customer says, "I want you to forget the Italian recipes. Please remove all knowledge of pasta, pizza, and lasagna from your mind."

Most current methods for "unlearning" are like Superficial Amnesia.

What they do: They tell the chef, "If someone asks for pasta, just say 'I don't know' or give them a random soup."
The Result: The chef looks like they forgot. If you ask them directly, they fail the test.
The Catch: If you peek inside the chef's brain (the internal features), they still have the Italian recipes written on sticky notes in every drawer. They haven't actually deleted the knowledge; they just learned to hide it. If you give them a new prompt or a slightly different question, they can easily pull those Italian recipes back out.

The authors call this "Superficial Forgetting." The model passes the test, but the information is still lurking in the background, waiting to be recovered.

The Solution: "Erase at the Core" (EC)

The authors propose a new method called Erase at the Core (EC). Instead of just telling the chef to hide the answer, they want to burn the recipes from the inside out.

Here is how EC works, using a few analogies:

1. The Multi-Layer Scrubbing (The "Deep Clean")

Think of the AI model as a multi-story building.

The Old Way: Most unlearning methods only clean the lobby (the final output layer). They wipe the sign that says "Italian Food" off the door. But the kitchens on the 2nd, 3rd, and 4th floors are still full of Italian ingredients.
The EC Way: EC sends a cleaning crew to every single floor of the building. They go to the basement, the middle floors, and the top floor. They scrub the walls, wash the floors, and throw out the ingredients at every level. This ensures that the "Italian" concept is erased from the foundation up to the roof.

2. The "Confusion" Strategy (Contrastive Unlearning)

How do they actually erase the memory?

Imagine the "Italian" recipes are stored in a specific, neat row of filing cabinets.
EC takes those files and smashes them up, then scatters the pieces into the cabinets containing "French" and "Mexican" recipes.
It mixes the "forget" data so thoroughly with the "keep" data that the AI can no longer tell where the Italian recipes end and the others begin. The distinct shape of the Italian knowledge is dissolved.

3. The "Guardian" (Deep Supervision)

You might worry: "If I mix everything up, won't the chef forget how to cook anything?"

EC has a safety net. While it is smashing the Italian files, it has a Guardian watching the "French" and "Mexican" recipes.
The Guardian ensures that while the Italian files are being destroyed, the French and Mexican files remain perfectly organized and easy to find. This ensures the chef stays good at cooking the dishes they are supposed to keep.

Why This Matters

The paper shows that previous methods were like gagging the chef: the chef couldn't say the Italian words, but could still think them.

Erase at the Core goes after the thoughts themselves rather than just the words.

Better Privacy: It makes it much harder for an attacker to recover the "forbidden" data from the model's internal features, for example by freezing the model and retraining only its final classifier (a technique called a "linear probing attack").
Closer to True Compliance: It moves toward the intent of "Right to be Forgotten" laws (like GDPR) by targeting the stored information itself, not just the model's answers. The authors note that formal guarantees of erasure are still an open problem.
Plug-and-Play: The cool part is that EC isn't a whole new kitchen; it's a plug-in module. You can take an existing unlearning method and attach EC to it, making it better at removing information from the internal features.

The Bottom Line

If you want to truly forget something, you can't just stop talking about it. You have to rewire your brain so the memory doesn't exist anymore. Erase at the Core aims to do that for AI, scrubbing the memory at every layer from the bottom up, so that deleted data is far harder to recover from what the model still holds inside.

1. Problem Statement: Superficial Forgetting

The paper addresses a critical limitation in current Machine Unlearning (MU) methods, termed "superficial forgetting."

The Issue: Most existing approximate unlearning methods successfully suppress the model's output logits for the "forget set" (achieving near-zero accuracy on forget classes). However, they fail to remove the underlying information from the model's internal feature representations.
The Consequence: Despite low output accuracy, intermediate layers of these models retain high similarity to the original model. This allows attackers to recover forget-set information via linear probing (freezing the backbone and retraining only the final classifier). The same residual information shows up in representation-based metrics such as Centered Kernel Alignment (CKA) and the Information Difference Index (IDI).
The Gap: Current methods primarily operate on the final classifier or logits, leaving the deep feature hierarchy vulnerable to information leakage.
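
To make the CKA metric mentioned above concrete, here is a minimal sketch of linear CKA between two sets of features extracted from the same inputs. It assumes the commonly used linear-kernel form of CKA; the summary does not say which kernel or scaling the paper uses, and the function name `linear_cka` and the synthetic features are illustrative only.

```python
import numpy as np

def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between two feature matrices of shape (n_samples, dim)."""
    # Center each feature dimension so the score ignores constant offsets.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))

rng = np.random.default_rng(0)
original = rng.normal(size=(200, 32))

# Same information in a rotated basis: CKA should stay at (numerically) 1.
rotation, _ = np.linalg.qr(rng.normal(size=(32, 32)))
rotated = original @ rotation

# Features with no relationship to the originals: CKA should be much lower.
unrelated = rng.normal(size=(200, 32))

cka_same = linear_cka(original, rotated)
cka_diff = linear_cka(original, unrelated)
print(f"rotated copy: {cka_same:.3f}, unrelated: {cka_diff:.3f}")
```

In this framing, a superficially unlearned model behaves like the rotated copy (its forget-set features still line up with the original model's), while a model that has erased the information should score closer to the unrelated case.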

2. Methodology: Erase at the Core (EC)

The authors propose Erase at the Core (EC), a framework designed to enforce forgetting throughout the entire network hierarchy, not just at the output layer.

Core Architecture

Deep Supervision: EC attaches auxiliary modules (EC Modules) to intermediate layers of the backbone network (e.g., after specific stages in ResNet-50 or Swin-Tiny).
Module Design: Each EC module consists of a sequence of Convolutional blocks followed by a classifier. These modules are initialized using Supervised Contrastive Learning (SupCon) on the full dataset before unlearning begins.
Multi-Layer Supervision: Unlike standard unlearning, which applies its objective only at the final output, EC applies supervision at $L$ points throughout the network.
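
The architecture above can be sketched with a toy NumPy forward pass. This is an illustration under simplifying assumptions, not the paper's implementation: dense layers stand in for the ResNet-50 or Swin-Tiny stages, a single projection stands in for each module's convolutional blocks, and all dimensions and names (`stages`, `modules`, `forward_with_taps`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

# Hypothetical toy backbone: four dense "stages" standing in for network stages.
stage_dims = [16, 32, 32, 64, 64]  # input dim, then each stage's output dim
stages = [rng.normal(scale=0.1, size=(stage_dims[i], stage_dims[i + 1]))
          for i in range(4)]

# One auxiliary module per stage: a projection (stand-in for the conv blocks)
# followed by a linear classifier.
embed_dim, num_classes = 8, 10
modules = [
    {"proj": rng.normal(scale=0.1, size=(d, embed_dim)),
     "clf": rng.normal(scale=0.1, size=(embed_dim, num_classes))}
    for d in stage_dims[1:]
]

def forward_with_taps(x):
    """Run the backbone and return (embedding, logits) from every tapped stage."""
    taps = []
    h = x
    for w, m in zip(stages, modules):
        h = relu(h @ w)              # backbone features h^l
        a = relu(h @ m["proj"])      # auxiliary module features a^l
        # Unit-norm embedding z^l, the input to a contrastive loss.
        z = a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-12)
        logits = a @ m["clf"]        # per-layer classifier output g^l
        taps.append((z, logits))
    return taps

batch = rng.normal(size=(5, 16))
taps = forward_with_taps(batch)
for layer, (z, logits) in enumerate(taps, start=1):
    print(f"layer {layer}: embedding {z.shape}, logits {logits.shape}")
```

Every tapped stage yields its own embedding and logits, so a loss can be attached at each depth instead of only at the final output.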

Loss Functions

At each supervision layer $l$ , EC optimizes a total loss composed of two complementary objectives:

Contrastive Unlearning Loss ( $L^{CU}_l$ ) on the Forget Set:
- This loss pushes the embeddings of forget samples ( $z^l_f$ ) away from their original class structure and diffuses them into the manifold of the retain set samples ( $z^l_r$ ).
- It maximizes the similarity between forget embeddings and retain embeddings, effectively erasing class-specific information at the feature level.
- Formula: $L^{CU}_l = -\frac{1}{|D_f||D_r|} \sum_{i \in D_f} \sum_{j \in D_r} \log \text{sim}(z^l_i, z^l_j; \tau)$ .
Cross-Entropy Loss ( $L^{CE}_l$ ) on the Retain Set:
- Applied to retain samples to ensure the model maintains classification utility and does not degrade performance on the data it should keep.
- Formula: $L^{CE}_l = \text{CE}(g^l_\phi(a^l_\psi(h^l_\theta(x))), y)$ for $(x,y) \in D_r$ .
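
A minimal NumPy sketch of the two per-layer losses follows. The summary does not define $\text{sim}$, so the code assumes $\text{sim}(a, b; \tau) = \exp(\cos(a, b) / \tau)$, which makes $\log \text{sim}$ equal to the cosine similarity divided by $\tau$; the paper's exact form may differ (for example, it may include a normalizing denominator). The function names and the value $\tau = 0.1$ are illustrative.

```python
import numpy as np

def l2_normalize(z):
    return z / (np.linalg.norm(z, axis=1, keepdims=True) + 1e-12)

def contrastive_unlearning_loss(z_forget, z_retain, tau=0.1):
    """-(1 / (|D_f||D_r|)) * sum over pairs of log sim(z_i, z_j; tau).

    ASSUMPTION: sim(a, b; tau) = exp(cos(a, b) / tau), so log sim = cos / tau.
    """
    zf, zr = l2_normalize(z_forget), l2_normalize(z_retain)
    log_sim = (zf @ zr.T) / tau      # shape (|D_f|, |D_r|)
    return float(-log_sim.mean())    # mean over all forget/retain pairs

def cross_entropy_loss(logits, labels):
    """Mean cross-entropy of integer labels under softmax(logits)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

rng = np.random.default_rng(0)
direction = np.zeros(8)
direction[0] = 1.0
z_retain = 3.0 * direction + 0.1 * rng.normal(size=(20, 8))

# Forget embeddings in their own region vs. blended into the retain region.
z_forget_separate = -3.0 * direction + 0.1 * rng.normal(size=(10, 8))
z_forget_mixed = 3.0 * direction + 0.1 * rng.normal(size=(10, 8))

loss_separate = contrastive_unlearning_loss(z_forget_separate, z_retain)
loss_mixed = contrastive_unlearning_loss(z_forget_mixed, z_retain)
ce_confident = cross_entropy_loss(np.array([[10.0, 0.0], [0.0, 10.0]]),
                                  np.array([0, 1]))
print(loss_separate > loss_mixed)
```

Minimizing the first loss moves forget embeddings toward the retain embeddings, which is the "diffusion" described above, while the cross-entropy term keeps retain samples correctly classified.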

Weighted Aggregation

The total loss is a weighted sum across all layers:
$L_{total} = \sum_{l=1}^{L} w_l (\lambda_{CU} L^{CU}_l + \lambda_{CE} L^{CE}_l)$

Layer-wise Weights ( $w_l$ ): The authors assign progressively larger weights to deeper layers (e.g., $0.2, 0.4, 0.8, 1.0$). This is based on the intuition that deeper layers encode high-level, class-discriminative features, making them the most critical targets for erasure.
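
As a worked instance of the weighted sum, the sketch below combines per-layer losses using the example weights $0.2, 0.4, 0.8, 1.0$. The per-layer loss values and the defaults $\lambda_{CU} = \lambda_{CE} = 1$ are made up for illustration; the summary does not give the paper's settings.

```python
def total_loss(cu_losses, ce_losses, layer_weights, lambda_cu=1.0, lambda_ce=1.0):
    """Sum over layers of w_l * (lambda_cu * L_CU_l + lambda_ce * L_CE_l)."""
    if not (len(cu_losses) == len(ce_losses) == len(layer_weights)):
        raise ValueError("need one CU loss, one CE loss, and one weight per layer")
    return sum(
        w * (lambda_cu * cu + lambda_ce * ce)
        for w, cu, ce in zip(layer_weights, cu_losses, ce_losses)
    )

layer_weights = [0.2, 0.4, 0.8, 1.0]  # example weights from the text
cu = [1.0, 1.0, 1.0, 1.0]             # hypothetical per-layer CU losses
ce = [0.5, 0.5, 0.5, 0.5]             # hypothetical per-layer CE losses

loss = total_loss(cu, ce, layer_weights)
deepest_share = layer_weights[-1] * (cu[-1] + ce[-1]) / loss
print(f"total = {loss:.2f}, share from deepest layer = {deepest_share:.0%}")
```

With equal per-layer losses, the deepest layer contributes $1.0 / 2.4 \approx 42\%$ of the total, five times the share of the shallowest layer.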

3. Key Contributions

Framework Proposal: Introduction of EC, the first framework to explicitly enforce representation-level forgetting across multiple intermediate layers using deep supervision and contrastive objectives.
Comprehensive Evaluation: A rigorous re-evaluation of existing unlearning baselines using both logit-based metrics (Forget Accuracy) and representation-based metrics (CKA, IDI, and k-NN downstream transfer). This exposes the "superficial forgetting" in state-of-the-art methods.
Plug-in Capability: Demonstration that EC is model-agnostic and can be integrated as a plug-in module into other unlearning algorithms (e.g., DUCK, COLA) to significantly boost their representation-level forgetting strength without sacrificing utility.
Robustness: Extensive experiments across diverse datasets (ImageNet-1K, CIFAR-100), architectures (ResNet-50, Swin-Tiny), and forgetting scenarios (random classes, top-similarity classes).

4. Experimental Results

The paper evaluates EC against strong baselines (PL, DUCK, SCAR, SCRUB, SalUn, DELETE, COLA, CU) on ImageNet-1K (100 classes forgotten) and CIFAR-100.

Representation Divergence: EC achieves the lowest CKA (Centered Kernel Alignment) and lowest |IDI| (Information Difference Index) among utility-preserving methods.
- On ImageNet-1K, EC reduces CKA to 38.68 (vs. 69.52 for the next best, CU), indicating a massive divergence from the original model's internal features.
- EC achieves an IDI of 0.051, approaching the "gold standard" retrained model (0.000), whereas other methods often remain above 0.4.
Utility Preservation: EC maintains high Retain Accuracy (RA) and Test Retain Accuracy (TRA), comparable to other strong baselines, proving that deep forgetting does not require sacrificing performance on retained data.
Linear Probing Resistance: Visualizations (t-SNE) and k-NN retrieval tests show that EC disrupts the linear separability of forget classes in intermediate layers, making it difficult to recover forget-set accuracy by retraining a classifier.
Plug-in Effectiveness: When applied to DUCK and COLA, the EC-augmented versions (DUCK+EC, COLA+EC) show significant improvements in CKA and IDI, confirming EC's ability to enhance existing methods.
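
The linear probing test can be illustrated on synthetic features. The sketch below is not the paper's protocol: it fits a ridge-regularized least-squares classifier on frozen features as a simple stand-in for retraining the final classifier, and the "leaky" and "erased" features are generated artificially rather than extracted from a real model.

```python
import numpy as np

def linear_probe_accuracy(train_x, train_y, test_x, test_y, num_classes, ridge=1e-3):
    """Fit a ridge-regularized linear classifier on frozen features; return test accuracy."""
    def add_bias(x):
        return np.hstack([x, np.ones((len(x), 1))])
    a = add_bias(train_x)
    targets = np.eye(num_classes)[train_y]  # one-hot regression targets
    w = np.linalg.solve(a.T @ a + ridge * np.eye(a.shape[1]), a.T @ targets)
    preds = (add_bias(test_x) @ w).argmax(axis=1)
    return float((preds == test_y).mean())

rng = np.random.default_rng(0)
num_classes, dim, per_class = 4, 16, 100
labels = np.repeat(np.arange(num_classes), per_class)
centers = rng.normal(scale=3.0, size=(num_classes, dim))

# "Superficially forgotten" features: class clusters are still intact.
leaky = centers[labels] + rng.normal(size=(len(labels), dim))
# "Erased" features: no class-dependent structure remains.
erased = rng.normal(size=(len(labels), dim))

perm = rng.permutation(len(labels))
train, test = perm[:300], perm[300:]
acc_leaky = linear_probe_accuracy(leaky[train], labels[train],
                                  leaky[test], labels[test], num_classes)
acc_erased = linear_probe_accuracy(erased[train], labels[train],
                                   erased[test], labels[test], num_classes)
print(f"probe on leaky features: {acc_leaky:.2f}, on erased features: {acc_erased:.2f}")
```

A probe that scores far above chance on forget classes signals that the features still carry class information, whatever the model's own output says.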

5. Significance and Conclusion

Paradigm Shift: The paper argues that true machine unlearning must move beyond logit-level suppression to representation-level erasure. "Superficial forgetting" is insufficient for regulatory compliance (e.g., GDPR) as internal knowledge persists.
Practical Impact: EC provides a practical, model-agnostic solution that can be retrofitted into existing pipelines to ensure deeper, more robust forgetting.
Future Directions: While EC provides strong empirical evidence of erasure, the authors note that formal mathematical guarantees for erasure remain an open challenge. Future work may involve extending EC to more diverse architectures and integrating it with other unlearning paradigms.

In summary, Erase at the Core argues that effective unlearning requires modifying the deep feature hierarchy. By applying contrastive unlearning and deep supervision across layers, EC sharply reduces the residual information in intermediate representations, as measured by CKA and IDI, setting a stronger baseline for robust machine unlearning.