Certifying the Right to Be Forgotten: Primal-Dual Optimization for Sample and Label Unlearning in Vertical Federated Learning

This paper proposes FedORA, a primal-dual optimization framework for efficient, theoretically certified sample and label unlearning in Vertical Federated Learning. It introduces a novel uncertainty-promoting loss function and adaptive strategies that minimize computational overhead while preserving model utility.

Yu Jiang, Xindi Tong, Ziyao Liu, Xiaoxi Zhang, Kwok-Yan Lam, Chee Wei Tan

Published Tue, 10 Ma

Imagine you are part of a massive, collaborative cooking club. Everyone brings a different ingredient to the table to create a giant, delicious stew (the AI model).

  • Party A brings the spices.
  • Party B brings the vegetables.
  • Party C (the chef) brings the meat and the final seasoning.

Together, you make a stew that tastes amazing. But then, someone says, "Hey, I don't want my specific batch of carrots in that stew anymore. Please remove them." Or maybe, "I never want any carrots in this recipe again."

This is the "Right to be Forgotten." In the world of Artificial Intelligence, if you want your data removed, the model shouldn't just ignore your data; it should act as if it never knew you existed.

The Problem: The "Re-cook" Dilemma

In a normal kitchen, if you want to remove an ingredient, you have two bad options:

  1. The "Start Over" Method: Throw away the whole pot and start cooking from scratch with the remaining ingredients. This is perfect (the stew tastes exactly right without the carrots), but it takes forever and wastes a ton of energy.
  2. The "Scrape It Out" Method: Try to fish the carrots out of the finished stew. This is fast, but you might accidentally rip out the meat or leave chunks of carrot behind. The stew might taste weird or be unsafe.

In Vertical Federated Learning (VFL), this is even harder because the ingredients are split up. You can't just reach into the pot; you have to coordinate with everyone holding a different part of the recipe. Existing methods often try to "scrape" the data out by aggressively pushing the model to forget, which frequently breaks the model (making it forget too much or become unstable).

The Solution: FedORA (The Smart Chef)

The authors propose a new method called FedORA. Think of FedORA not as a chef trying to fish out carrots, but as a smart, mathematical recipe adjustment system.

Here is how FedORA works, using simple analogies:

1. The "Confusion" Strategy (Primal-Dual Optimization)

Most old methods try to make the model hate the data it needs to forget. They tell the model, "If you see a carrot, scream 'WRONG!'" This is like trying to unlearn a song by playing it backwards at full volume. It often causes the model to get confused about everything else, ruining the stew.

FedORA's approach is different: It tells the model, "If you see a carrot, just be confused."

  • Instead of forcing a wrong answer, FedORA encourages the model to say, "I have no idea what this is."
  • The Analogy: Imagine a student taking a test. Instead of forcing them to write the wrong answer for a specific question (which might mess up their confidence on other questions), you tell them to leave it blank or guess randomly. The goal is uncertainty, not error. This removes the "memory" of the carrot without shaking the foundation of the whole recipe.

2. The "Tension" Meter (Dual Variables)

FedORA uses a mathematical "tension meter."

  • If the model is still remembering the carrot too well, the tension meter goes up, and the system applies more pressure to make it forget.
  • If the model is already confused enough, the tension meter goes down, and the system stops pushing.
  • The Analogy: It's like a thermostat. If the room is too hot (remembering too much), the AC turns on. If it's cool enough, the AC turns off. This ensures the model forgets just enough without freezing the whole house (ruining the model's performance on the other ingredients).
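The thermostat behavior maps onto a standard dual-variable (Lagrange multiplier) update. The sketch below is illustrative: the variable names, the constraint signal, and the exact update rule are assumptions, not taken from the paper.

```python
def dual_update(dual_var, forgetting_gap, step_size=0.1):
    """One projected gradient-ascent step on a dual variable.

    forgetting_gap > 0: the model is still too confident on the
      forget set, so the multiplier (the "pressure") rises.
    forgetting_gap <= 0: the constraint is satisfied, so the
      pressure decays. The projection max(0, ...) keeps the
      multiplier non-negative, like an AC that can only cool.
    Names and the gap definition are illustrative assumptions.
    """
    return max(0.0, dual_var + step_size * forgetting_gap)

# Simulate a few rounds: pressure rises while the gap is positive,
# then relaxes once the model has forgotten enough.
pressure = 0.0
for gap in [0.8, 0.5, 0.1, -0.2, -0.2]:
    pressure = dual_update(pressure, gap)
```

Because the pressure adapts each round, the system never applies a fixed, hand-tuned "forgetting strength"; it pushes exactly as hard as the current constraint violation demands.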

3. The "Smart Batch" Cooking (Asymmetric Design)

When you are cooking a huge pot of stew, you don't need to taste-test every single drop of the remaining soup to make sure it still tastes good. You only need to taste a few spoonfuls.

  • FedORA's Trick: It processes the "to-be-removed" carrots (the bad data) with full attention, checking every single one. But for the rest of the good ingredients, it only samples a small portion to check the flavor.
  • The Result: This saves a massive amount of time and energy (computing power) because the system isn't wasting resources re-tasting the whole pot.
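The asymmetric design above can be sketched as a batching scheme: the full forget set is processed every round, while only a random fraction of the retained data is re-checked. The function name and the `retain_fraction` parameter are illustrative assumptions, not the paper's API.

```python
import random

def make_batches(forget_set, retain_set, retain_fraction=0.1, seed=0):
    """Asymmetric batching sketch.

    Every to-be-removed sample is visited (full attention), but
    only a small random "taste test" of the retained data is used
    to keep model utility in check. All names here are illustrative.
    """
    rng = random.Random(seed)
    k = max(1, int(retain_fraction * len(retain_set)))
    retain_sample = rng.sample(retain_set, k)
    return forget_set, retain_sample

# 5 samples to forget, 100 to retain: the forget set is covered in
# full, but only 10 retained samples are re-checked this round.
forget_batch, retain_batch = make_batches(
    list(range(5)), list(range(100)), retain_fraction=0.1
)
```

Per-round cost now scales with the (small) forget set plus a fixed fraction of the retained data, rather than with the full dataset.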

Why This Matters

  • It's Fast: It doesn't require starting over from scratch.
  • It's Safe: It doesn't break the model's ability to recognize other things (like potatoes or beef).
  • It's Certified: The math proves that the result is almost as good as if you had started from scratch, but much faster.

The Bottom Line

FedORA is like a magical, efficient kitchen assistant that can remove specific ingredients from a complex, collaborative recipe without ruining the taste of the final dish. It does this by teaching the model to be politely confused about the unwanted data, rather than aggressively fighting it, and by only checking the "good" parts of the recipe just enough to keep things running smoothly.

This ensures that in our digital world, when you ask to be forgotten, the AI actually forgets you, without forgetting how to be helpful to everyone else.