Unlearning Evaluation through Subset Statistical Independence

This paper proposes a novel, standalone evaluation framework for machine unlearning that utilizes the Hilbert-Schmidt Independence Criterion to assess statistical independence in model outputs, thereby eliminating the need for retraining reference models or auxiliary classifiers while effectively distinguishing between in-training and out-of-training subsets.

Chenhao Zhang, Muxing Li, Feng Liu, Weitong Chen, Miao Xu

Published 2026-03-03

The Big Problem: The "Eraser" Test

Imagine you have a student who has memorized a textbook. One day, they are asked to "unlearn" a specific chapter (maybe because that chapter contained a mistake or the author wants their work removed).

The student claims, "I have successfully erased that chapter from my mind."

How do you test if they really did?

  • The Old Way (Retraining): You make the student start over from scratch, but this time, you don't give them the chapter they were supposed to forget. Then, you compare their new answers to their old answers. If they match, the "unlearning" worked.
    • The Flaw: This is like asking the student to re-take the whole course just to prove they forgot one page. It's expensive, slow, and defeats the purpose of having a quick "eraser."
  • The New Way (The Paper's Idea): You don't need to retrain the student. You just need to look at how they answer questions about that specific chapter right now.

The Core Idea: The "Group Hug" vs. The "Stranger"

The authors propose a clever trick based on how human brains (and AI brains) work.

1. The "Group Hug" (In-Training Data)
When a model (or student) learns a set of data together, the data points don't just sit there; they influence each other. They form a "group hug."

  • Analogy: Imagine a group of friends who went on a road trip together. They share inside jokes, they know how the others think, and their memories are intertwined. If you ask two random friends from that trip about the journey, their answers will be statistically linked because they experienced the same events together.
  • In AI terms: If a group of images was used to train the model, the model's internal "thoughts" (activations) about those images are dependent on each other. They are statistically connected.

2. The "Stranger" (Out-of-Training Data)
Now, imagine a group of people who never went on that road trip.

  • Analogy: If you ask two random strangers about a trip they never took, their answers will be completely independent. There is no shared history, no inside jokes, and no statistical link between their responses.
  • In AI terms: If a group of images was never seen by the model, the model's "thoughts" about them are independent. They are just random guesses based on general knowledge.
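The road-trip analogy can be simulated numerically. The snippet below is an illustrative sketch (not from the paper): "friends" share a common underlying signal (the trip), so their answers correlate, while "strangers" are drawn with no common cause.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000

# Friends: both answers are driven by the same shared trip memories,
# plus individual noise, so they end up statistically linked.
trip = rng.normal(size=n)
friend_a = trip + 0.5 * rng.normal(size=n)
friend_b = trip + 0.5 * rng.normal(size=n)

# Strangers: answers drawn independently, with no shared history.
stranger_a = rng.normal(size=n)
stranger_b = rng.normal(size=n)

print(np.corrcoef(friend_a, friend_b)[0, 1])      # ~0.8 (theoretical 1/1.25)
print(np.corrcoef(stranger_a, stranger_b)[0, 1])  # near zero
```

Plain correlation only captures linear dependence, which is why the paper reaches for HSIC, a tool that can detect far more general statistical links.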

The Solution: The "Split-Half" Test (SDE)

The paper introduces a method called Split-half Dependence Evaluation (SDE). Here is how it works, step-by-step:

  1. Pick a Suspect Group: You have a group of data (a subset) that the model is supposed to have forgotten.
  2. Split the Group: You cut this group in half, like splitting a deck of cards into two piles (Pile A and Pile B).
  3. The "HSIC" Test: You use a mathematical tool called HSIC (Hilbert-Schmidt Independence Criterion). Think of HSIC as a statistical lie detector.
    • It asks: "How much do the answers from Pile A depend on the answers from Pile B?"
  4. The Verdict:
    • If the model still remembers the data: Pile A and Pile B will still be "hugging" each other. They will show a strong statistical connection. The lie detector says: "Dependent! This data was in the training set."
    • If the model successfully forgot the data: Pile A and Pile B will act like strangers. There will be no connection. The lie detector says: "Independent! This data was never seen."
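The steps above can be sketched in code using the standard biased HSIC estimator, trace(KHLH)/n². This is a minimal illustration, not the paper's implementation: the pairing between Pile A and Pile B samples and the median-heuristic kernel bandwidth are assumptions made here for the demo.

```python
import numpy as np

def rbf_kernel(X):
    """Gaussian (RBF) kernel matrix with a median-heuristic bandwidth."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.maximum(d2, 0.0, out=d2)                 # clip tiny negatives
    bandwidth = np.median(d2[d2 > 0])           # median heuristic (assumed)
    return np.exp(-d2 / bandwidth)

def hsic(X, Y):
    """Biased HSIC estimator: trace(K H L H) / n^2 (H centers the kernels)."""
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    K, L = rbf_kernel(X), rbf_kernel(Y)
    return np.trace(K @ H @ L @ H) / n**2

rng = np.random.default_rng(0)

# Stand-ins for the model's outputs on the two piles (paired per row).
pile_a = rng.normal(size=(200, 5))
still_remembered = pile_a + 0.1 * rng.normal(size=(200, 5))  # still "hugging"
truly_forgotten = rng.normal(size=(200, 5))                  # strangers

print(hsic(pile_a, still_remembered))  # clearly larger: "Dependent!"
print(hsic(pile_a, truly_forgotten))   # close to zero: "Independent!"
```

A large HSIC score is the "lie detector" flagging dependence between the piles; a score near zero is consistent with the data never having been in the training set.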

Why is this better?

The paper argues that previous methods were like trying to catch a thief by asking them to reenact the crime scene perfectly (retraining) or by hiring a private investigator to guess if they were there (Membership Inference Attacks).

The new method is like checking the thief's fingerprint.

  • No Retraining Needed: You don't need to rebuild the model.
  • No Extra Classifiers: You don't need to train a second "attacker" model to catch the first one.
  • Group Focus: Instead of checking one single photo (which is hard to prove), you check a whole group. If the group acts like strangers, the whole group has been successfully forgotten.

The Results: Catching the Liars

The authors tested this on several "unlearning" algorithms (different ways to try to erase data).

  • The "Unroll" Method: This method claimed to be very good at unlearning, and it looked perfect on traditional tests (like accuracy).
  • The SDE Verdict: The SDE test looked at the "Group Hug" and said, "Wait a minute! These data points are still hugging each other! You didn't actually forget them!"
  • The Result: The SDE test revealed that some popular unlearning methods were actually failing, even though they looked successful on paper.

Summary

Think of this paper as a new forensic tool for AI privacy.

  • Old way: "Prove you forgot by re-learning everything without that info." (Hard and slow).
  • New way: "Show me the data you forgot. If the model's reaction to that data looks like it's reacting to strangers (independent), then you successfully forgot it. If it looks like it's reacting to old friends (dependent), you're still remembering."

This allows companies and regulators to verify if AI models are truly respecting the "Right to be Forgotten" without needing to rebuild the entire system from scratch.
