MPU: Towards Secure and Privacy-Preserving Knowledge Unlearning for Large Language Models

Imagine you have a very smart, well-read librarian (the Large Language Model) who works for a private library (the Server). You (the Client) are a patron who once gave the librarian a diary to read. Now, you want the librarian to "unlearn" everything from that diary because it contains your private secrets.

Here is the problem:

You can't give the diary back: You don't want to hand over your physical diary to the librarian because it's too private.
The librarian can't show you the books: The librarian's entire collection of knowledge is their trade secret. They can't let you see their exact brain or their specific notes to verify they deleted your info.

This creates a standoff. How do you make the librarian forget your secrets without you showing them the secrets, and without them showing you their brain?

Enter MPU (Multiple Perturbed Copies Unlearning). Think of it as a clever magic trick involving clones, noise, and a special eraser.

The Magic Trick: How MPU Works

The paper proposes a three-step dance between you and the librarian to solve this privacy standoff.

Step 1: The "Blurred Photocopy" (Pre-Process)

Instead of the librarian showing you their exact brain, they create multiple copies of their mind. But before giving them to you, they do two things:

They add static: Imagine putting a layer of fuzzy, random static noise over the copies. This hides the librarian's exact secrets from you.
They rearrange the furniture: They shuffle the internal organization of the copies (like swapping the order of books on a shelf) in a way that doesn't change what the librarian knows, just how it's stored. This is called reparameterization.

Now, you have a few slightly "blurred" and "rearranged" versions of the librarian's mind. You can't see the original, and you can't reverse-engineer the original from these copies.

Step 2: The "Local Eraser" (Client-Side Unlearning)

You take these blurred copies and use your own private diary (the Forget Set) to "teach" them to forget.

You tell the copies: "Hey, erase everything that looks like my diary!"
The copies try to forget. Because they are slightly different from each other (due to the noise and rearranging), they might forget in slightly different ways.
You calculate the changes (the "updates") needed to make them forget and send those changes back to the librarian. You do not send your diary.

Step 3: The "Harmonic Magic Eraser" (Post-Process)

The librarian receives your changes. But wait! The changes are based on the "blurred" copies, not the real librarian. If the librarian just applied your changes, they might accidentally learn the wrong things or get confused by the noise.

Here is the genius part: The Librarian cancels out the noise.

Because the librarian created the copies with a specific mathematical pattern (the noise was designed to cancel itself out), they can mix all your different updates together.
Imagine you have three people trying to push a heavy box in slightly different directions. If you push them all together with the right math, the "wobbly" parts cancel out, and the box moves in the perfect, straight line you wanted.
The librarian uses a Harmonic Aggregation method to mix your updates. The "static noise" disappears, and the "rearranging" is reversed.
The result? The librarian updates their real brain as if you had given them the exact changes needed to forget your diary, but without ever seeing your diary or showing you their brain.

Why is this a big deal?

Privacy for Everyone: You keep your secrets; the company keeps their trade secrets.
It Actually Works: Usually, adding noise to protect privacy makes the AI stupid or bad at forgetting. But because MPU uses multiple copies and cancels the noise mathematically, the AI forgets just as well as if it had seen your data directly.
It's Flexible: It works with almost any method of "unlearning" the AI uses.

The Analogy Summary

Think of it like a group of artists trying to paint over a specific part of a masterpiece without ruining the rest.

The Gallery Owner (Server) gives the Artist (Client) three slightly different, blurry sketches of the painting.
The Artist paints over the unwanted part on all three sketches using their own secret reference photo (which they keep hidden).
The Artist sends the changes back to the Gallery Owner.
The Gallery Owner mixes the three changes together. Because the blurriness was added in a specific way, the blurriness cancels out when mixed, leaving a perfect, clean edit on the original masterpiece.

MPU is the mathematical recipe that makes this "cancellation" possible, ensuring that privacy doesn't come at the cost of performance.

1. Problem Statement

Large Language Models (LLMs) often memorize sensitive training data, necessitating Machine Unlearning to remove specific knowledge upon request. However, a critical privacy dilemma exists in real-world server-client deployments:

Client Constraint: The client holds the "forget set" (sensitive data to be removed) locally and cannot share raw data or fine-grained statistics with the server due to privacy regulations.
Server Constraint: The server owns the proprietary model and cannot reveal its exact current parameters to the client to prevent model theft or reverse engineering.

Existing unlearning methods fail in this dual non-disclosure setting. Training-time methods (like SISA) require specific data structures, while post-hoc methods typically assume the unlearning agent has full access to both the model and the data. Federated learning approaches often require reconstructing models from history or exposing parameters, failing to meet the strict confidentiality requirements of both parties.

2. Methodology: The MPU Framework

The authors propose MPU (Multiple Perturbed Copies Unlearning), an algorithm-agnostic framework designed to enable unlearning without direct data sharing or parameter exposure. The framework operates over $R$ communication rounds and consists of three stages:

A. Pre-Process: Perturbed Copies Generation (Server-Side)

Instead of sending the exact model $\theta$ , the server generates $m \ge 2$ perturbed copies to send to the client. This involves two mechanisms:

Structured Noise Injection: The server adds Gaussian noise to the model parameters. Crucially, the noise is generated with a zero-sum constraint across the $m$ copies. Specifically, for each parameter block, the sum of the noise vectors across all copies is zero ( $\sum \epsilon_k = 0$ ).
Invertible Reparameterization: To prevent the client from reconstructing the original parameters even with multiple noisy copies, the server applies a function-preserving reparameterization $T_k$ $T_{k}$ .
- This transformation is based on parameter symmetries in Transformer architectures (e.g., permuting hidden channels in Feed-Forward Networks or rotating basis vectors in Attention heads).
- It is invertible and function-preserving, meaning $f_{T(\theta)}(x) = f_\theta(x)$ . The model's output remains identical, but the parameter space is obscured.

B. Client-Side Local Unlearning

The client receives $m$ perturbed, reparameterized models. Using its private forget set, the client executes a standard unlearning algorithm (e.g., Gradient Ascent, NPO, DPO) on each copy locally. The client returns the update deltas ( $\Delta_k$ ) to the server. No raw data or original parameters are exchanged.

C. Post-Process: Update Aggregation (Server-Side)

The server reverses the reparameterization to map updates back to the original coordinate system. It then aggregates the $m$ updates using Harmonic Aggregation:
$\bar{\Delta} = \frac{\sum_{k=1}^m \alpha_k^{-1} \hat{\Delta}_k}{\sum_{k=1}^m \alpha_k^{-1}}$
where $\hat{\Delta}_k$ is the inverted update and $\alpha_k$ are scaling factors.

Theoretical Guarantee: Due to the zero-sum noise structure and the specific harmonic weights, the first-order error introduced by the noise is mathematically canceled out during aggregation.
The resulting update $\bar{\Delta}$ is consistent with the ideal noise-free unlearning step, effectively "denoising" the process while keeping the client's data and server's exact parameters hidden.

D. Memory Efficiency

MPU is designed to be memory-efficient. The server does not need to store all $m$ models or updates simultaneously. It can process copies in a streaming manner, accumulating sufficient statistics ( $S_0$ and $S_1$ ) to compute the final update, requiring only $O(d)$ memory rather than $O(md)$.

3. Key Contributions

Dual Non-Disclosure Framework: MPU is the first solution to enable unlearning where the client never shares data and the server never reveals exact parameters, without relying on surrogate data or auxiliary statistics.
Function-Preserving Reparameterization for LLMs: The authors generalize invertible reparameterizations to modern Transformer architectures, including RoPE-based models (e.g., Llama), ensuring functional equivalence while obfuscating parameters.
Theoretical Noise Cancellation: They provide proofs that structured noise injection combined with harmonic aggregation eliminates first-order noise errors, ensuring the aggregated update matches the noise-free baseline.
Empirical Validation: Extensive experiments demonstrate that MPU achieves performance comparable to (and sometimes better than) noise-free baselines across various unlearning algorithms and model sizes.

4. Experimental Results

The framework was evaluated on Llama-3.2 (1B, 3B) and Qwen2.5 (1.5B, 3B) models using the TOFU benchmark (Split99).

Performance vs. Baselines:
- Forget Quality (FQ): MPU consistently outperforms single-copy noisy baselines. For many algorithms (e.g., GradAscent, DPO), MPU significantly improves FQ compared to the noisy baseline, often matching the noise-free "Clean" baseline.
- Model Utility (MU): MPU preserves model utility on retained data almost perfectly, with degradation well below 1% in most cases, even under 10% noise levels.
- Privacy Leakage: Privacy leakage metrics remain low and comparable to noise-free baselines, indicating that the perturbation effectively hides the forget set without compromising the unlearning goal.
Robustness:
- Noise Level ( $\kappa$ ): MPU is robust to varying noise levels. Moderate noise can even act as a stabilizer for unstable unlearning algorithms (like GradAscent), improving performance.
- Copy Number ( $m$ ): While increasing $m$ can improve stability, $m=2$ is often sufficient. Increasing $m$ beyond 2 does not always yield better results and can sometimes destabilize specific algorithms.
- Model Scaling: The denoising effect becomes stronger as model size increases (e.g., 3B models show better performance than 1B models under MPU).

5. Significance

MPU addresses a fundamental bottleneck in the deployment of LLMs: the conflict between regulatory compliance (Right to be Forgotten) and intellectual property/privacy protection.

Practical Deployment: It enables a secure server-client architecture where sensitive user data never leaves the client device, and proprietary model weights are never exposed, facilitating the adoption of unlearning in commercial and regulated environments.
Theoretical Insight: The work bridges the gap between differential privacy concepts (noise injection) and model unlearning, demonstrating that structured noise can be neutralized to recover exact gradients, a novel contribution to the field of secure machine learning.
Generalizability: Being algorithm-agnostic, MPU can be integrated with any existing unlearning technique, making it a versatile tool for the broader LLM ecosystem.

In conclusion, MPU provides a rigorous, theoretically grounded, and empirically validated solution for secure knowledge unlearning, resolving the dual non-disclosure constraint that has previously hindered the practical application of machine unlearning in LLMs.

MPU: Towards Secure and Privacy-Preserving Knowledge Unlearning for Large Language Models

The Magic Trick: How MPU Works

Step 1: The "Blurred Photocopy" (Pre-Process)

Step 2: The "Local Eraser" (Client-Side Unlearning)

Step 3: The "Harmonic Magic Eraser" (Post-Process)

Why is this a big deal?

The Analogy Summary

1. Problem Statement

2. Methodology: The MPU Framework

A. Pre-Process: Perturbed Copies Generation (Server-Side)

B. Client-Side Local Unlearning

C. Post-Process: Update Aggregation (Server-Side)

D. Memory Efficiency

3. Key Contributions

4. Experimental Results

5. Significance

More like this

Complexity of Classical Acceleration for ℓ1\ell_1ℓ1​-Regularized PageRank

MapTab: Are MLLMs Ready for Multi-Criteria Route Planning in Heterogeneous Graphs?

Language Guided Adversarial Purification

Graph-based Active Learning for Entity Cluster Repair

Neural Green's Operators for Parametric Partial Differential Equations

Complexity of Classical Acceleration for $\ell_1$ -Regularized PageRank