Parameter-Efficient Token Embedding Editing for Clinical Class-Level Unlearning

Imagine you have a very smart, highly trained medical assistant named Dr. AI. Dr. AI has read millions of patient notes and is excellent at diagnosing diseases like pneumonia, diabetes, and heart attacks.

However, a problem arises. The hospital decides that Dr. AI must stop diagnosing "Heart Attacks." Perhaps the data Dr. AI learned that diagnosis from was messy or unreliable, or perhaps the hospital simply wants to stop offering that service. Either way, Dr. AI needs to completely forget how to diagnose heart attacks while staying just as good at diagnosing everything else.

The Old Way: The "Total Reset"

Traditionally, if you wanted to remove a specific skill from a smart AI, you'd have to wipe its memory and start over. You'd take all the training data, throw out the bad examples, and retrain the whole system from scratch.

The Analogy: It's like firing a brilliant chef, throwing away their entire cookbook, and hiring a new chef to learn the whole menu again from zero. It takes forever, costs a fortune, and you lose all the progress you made.

The New Way: STEU (The "Surgical Edit")

The paper introduces a new method called STEU (Sparse Token Embedding Unlearning). Instead of firing the chef and rewriting the whole book, STEU performs a tiny, precise surgery.

Here is how it works, broken down into simple steps:

1. Finding the "Trigger Words"

Dr. AI doesn't just "know" heart attacks; it knows them because it recognizes specific words and phrases in patient notes, like "chest pain," "cardiac," or "heart attack."

The Analogy: Imagine Dr. AI's brain is a giant library. The "Heart Attack" knowledge isn't stored in one big room; it's scattered across thousands of books. But there are a few key index cards that point directly to those books.
The STEU Move: STEU uses a statistical score called PMI (Pointwise Mutual Information) to find which index cards (specific words) are most strongly tied to the "Heart Attack" diagnosis. Words like "patient" or "hospital" score low because they appear in notes about every disease, not just heart attacks.

2. The "Freeze" and the "Edit"

Once STEU finds those key index cards, it does two things:

Freezes the Library: It puts a "Do Not Touch" sign on about 99.8% of the library, including the deep layers of the AI that understand complex grammar and context. This keeps the AI from losing its general language understanding or its ability to diagnose pneumonia.
Erases the Cards: It only rewrites the few specific index cards it found earlier. It changes them so that when the AI sees "chest pain," it no longer thinks, "Oh, that's a heart attack!" Instead, it thinks, "I don't know what that means anymore."

3. The "Head" Adjustment

Sometimes, just changing the cards isn't enough. The AI might still try to guess based on the context. So, STEU also tweaks the very last step of the decision-making process (the "classifier head").

The Analogy: Think of the AI as a judge. Even if the evidence (the index cards) is changed, the judge might still have a habit of ruling "Guilty." STEU gently reminds the judge, "Hey, for this specific case, please rule 'Not Guilty' and ignore the old habit."

Why is this a Big Deal?

The paper tested this on real hospital data (MIMIC-IV, MIMIC-III, eICU) and found strong results:

Total Amnesia for the Target: Dr. AI almost completely stopped diagnosing heart attacks (the Forget F1 score dropped to nearly 0).
No Collateral Damage: Dr. AI stayed about as good at diagnosing pneumonia and diabetes as before; the "Retain" score barely changed.
Tiny Effort: The older encoder-level methods changed 19.6% of the AI's brain (about 21.3 million parameters). STEU changed only 0.19% (about 201,000 parameters).

The Bottom Line

STEU is like using a scalpel instead of a sledgehammer.

If you need to remove a specific memory or skill from a massive AI model, you don't need to rebuild the whole thing. You just need to find the few specific "words" or "signals" that trigger that memory, erase those signals, and gently adjust the final decision. It's fast, cheap, and leaves the rest of the AI's abilities largely intact.

This is crucial for hospitals and companies that need to follow privacy laws or change their policies without having to spend months retraining their expensive AI systems.

1. Problem Statement

The paper addresses the challenge of Machine Unlearning in clinical Natural Language Processing (NLP). Large Language Models (LLMs) and transformer-based classifiers are increasingly used to analyze Electronic Health Records (EHRs) for tasks like disease prediction. However, privacy regulations and institutional policies often require the removal of specific sensitive information (e.g., data from a specific hospital, specific patient cohorts, or unreliable diagnostic labels) from deployed models.

The Core Conflict: The standard solution, retraining the model from scratch without the target data, is computationally prohibitive and often infeasible when data is distributed or inaccessible.
The Specific Goal: The authors focus on behavioral class-level unlearning. Instead of removing individual data points (sample-level), the goal is to suppress the model's ability to predict a specific disease category (e.g., "Myocardial Infarction") while maintaining high performance on all other disease categories.
Constraints: The solution must minimize parameter modifications to ensure auditability and avoid violating deployment constraints, while preserving the utility of the model on retained tasks.

2. Methodology: Sparse Token Embedding Unlearning (STEU)

The authors propose STEU, a parameter-efficient method that targets the embedding layer rather than the deeper encoder layers. The core hypothesis is that clinical concepts are often driven by a sparse set of lexical signals; therefore, modifying the embeddings of these specific tokens is sufficient to alter downstream predictions without retraining the entire encoder.

Key Components:

Token Selection via Pointwise Mutual Information (PMI):
- STEU identifies a small subset of tokens strongly associated with the "forget class" (the disease to be removed).
- It calculates a score for each token $t$:
  $\text{score}(t) = \log_2 \left( \frac{P(t | \text{forget})}{P(t | \text{all})} \right) \times \log(1 + n_f(t))$
- The first term (PMI) measures class specificity. The second term is a frequency weight based on $n_f(t)$, the token's count in forget-class data; it down-weights rare tokens so that a token seen only a handful of times cannot rank highly on PMI alone. The top-$k$ tokens form the set $S$.
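The scoring rule can be sketched in a few lines of standard-library Python. This is a hypothetical helper, not the authors' code: it assumes each note is already a list of token strings and estimates $P(t \mid \cdot)$ from raw token-occurrence counts, details the summary does not pin down. The toy data also shows why a generic word like "patient" is ignored: its PMI is negative because it is no more common in forget-class notes than elsewhere.

```python
import math
from collections import Counter

def pmi_scores(forget_docs, retain_docs):
    """Frequency-weighted PMI of each forget-class token (hypothetical helper).

    ASSUMPTIONS: docs are lists of token strings, and P(t | .) is estimated
    from raw token-occurrence counts; the summary does not fix these details.
    """
    forget_counts = Counter(tok for doc in forget_docs for tok in doc)
    all_counts = forget_counts + Counter(tok for doc in retain_docs for tok in doc)
    n_forget, n_all = sum(forget_counts.values()), sum(all_counts.values())
    scores = {}
    for tok, n_f in forget_counts.items():
        pmi = math.log2((n_f / n_forget) / (all_counts[tok] / n_all))  # class specificity
        scores[tok] = pmi * math.log(1 + n_f)                          # frequency weight
    return scores

def select_top_k(scores, k):
    """Return the k highest-scoring tokens, i.e. the edit set S."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

forget = [["chest", "pain", "troponin", "patient"], ["troponin", "chest", "patient"]]
retain = [["cough", "fever", "patient"], ["glucose", "patient", "insulin"]]
scores = pmi_scores(forget, retain)
print(select_top_k(scores, 2))  # ['chest', 'troponin']
print(scores["patient"] < 0)    # True: "patient" is common in every class
```

Note that "pain" has the same PMI as "chest" here but appears only once, so the frequency weight ranks it lower.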
Constrained Update Surface:
- Frozen Encoder: All transformer encoder layers (attention and feed-forward blocks) and position/segment embeddings remain completely frozen.
- Trainable Parameters: Only two components are updated:
  1. Selected Token Embeddings: Only the rows in the embedding matrix $E$ corresponding to the selected token IDs in $S$ are updated.
  2. Classifier Head: The final linear layer ($W_c$) and its bias are updated to remap the perturbed representations to a suppressed output for the forget class.
Training Objective:
The loss function combines two terms, and a gradient mask restricts where the resulting updates land:
- Forget Loss ($L_{forget}$): Direct suppression of the target class by forcing its label to zero (binary cross-entropy with zero labels).
- Utility Loss ($L_{utility}$): Anchoring the model to the baseline on "retain" data. It minimizes the divergence between the updated model's predictions and the original baseline's predictions for all non-target classes.
- Gradient Masking: During backpropagation, a mask zeroes out the gradients of all embedding rows not in $S$.
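To make the objective concrete, here is a minimal NumPy sketch on a toy model. It is not the authors' implementation, and several pieces are assumptions: mean-pooling stands in for the frozen transformer encoder, the utility divergence is taken to be a per-label Bernoulli KL against the baseline's sigmoid outputs, the two losses are summed with equal weight, and plain gradient descent replaces whatever optimizer the paper used. The function `steu_toy` and its parameters are hypothetical names.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(E, W, b, docs):
    # ASSUMPTION: mean-pooling stands in for the frozen transformer encoder.
    H = np.stack([E[ids].mean(axis=0) for ids in docs])  # (n_docs, d)
    return H, H @ W + b                                   # hidden states, logits (n_docs, C)

def steu_toy(E, W, b, forget_docs, retain_docs, selected_ids, target, lr=0.1, steps=300):
    """Hypothetical sketch: update only E[selected_ids], W and b."""
    E, W, b = E.copy(), W.copy(), b.copy()
    q = sigmoid(forward(E, W, b, retain_docs)[1])  # frozen baseline predictions on retain data
    keep = np.arange(W.shape[1]) != target           # non-target classes
    row_mask = np.zeros((E.shape[0], 1))
    row_mask[list(selected_ids)] = 1.0               # 1 for rows in S, 0 elsewhere
    for _ in range(steps):
        grad_E, grad_W, grad_b = np.zeros_like(E), np.zeros_like(W), np.zeros_like(b)
        for docs, is_forget in ((forget_docs, True), (retain_docs, False)):
            H, z = forward(E, W, b, docs)
            p = sigmoid(z)
            dz = np.zeros_like(z)  # gradient of the mean loss w.r.t. the logits
            if is_forget:
                # L_forget: BCE with label 0 on the target class, -log(1 - p); d/dz = p.
                dz[:, target] = p[:, target] / len(docs)
            else:
                # L_utility (ASSUMED Bernoulli KL(q || p) on non-target classes); d/dz = p - q.
                dz[:, keep] = (p[:, keep] - q[:, keep]) / len(docs)
            grad_W += H.T @ dz
            grad_b += dz.sum(axis=0)
            dH = dz @ W.T
            for i, ids in enumerate(docs):
                np.add.at(grad_E, ids, dH[i] / len(ids))  # backward through the mean-pool
        E -= lr * grad_E * row_mask  # gradient masking: rows outside S never move
        W -= lr * grad_W
        b -= lr * grad_b
    return E, W, b

rng = np.random.default_rng(0)
E, W, b = rng.normal(size=(10, 4)), rng.normal(size=(4, 3)), np.zeros(3)
forget_docs = [[0, 1, 5], [1, 2, 5]]  # token IDs in forget-class notes
retain_docs = [[2, 6, 7], [8, 9, 5]]  # token 2 is shared with the forget class
E2, W2, b2 = steu_toy(E, W, b, forget_docs, retain_docs, selected_ids=[0, 1, 2], target=0)
before = sigmoid(forward(E, W, b, forget_docs)[1])[:, 0]
after = sigmoid(forward(E2, W2, b2, forget_docs)[1])[:, 0]
print(before.round(3), after.round(3))  # target-class probability on forget notes drops
```

Because a retain note shares token 2 with the forget class, the utility term has something to push back against. Rows 3 through 9 of `E` stay bit-for-bit identical because the mask multiplies their gradients by zero.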

Why Embeddings + Head?

The authors argue that updating only the classifier head is insufficient because the encoder still generates strong hidden representations for the target concept. Conversely, updating only embeddings is limited because the frozen head cannot fully remap the new representations. Combining both allows for input-level signal weakening and output-level decision boundary adjustment with minimal parameters.

3. Key Contributions

Embedding-Layer Unlearning Framework: STEU introduces a novel intervention locus (the embedding layer) for unlearning, shifting away from the standard encoder-level updates.
Parameter Efficiency: The method modifies an extremely small fraction of the model (approx. 0.19%), making it highly auditable and suitable for constrained deployment environments.
Systematic Evaluation: The approach is evaluated across three clinical datasets (MIMIC-IV, MIMIC-III, eICU) and three transformer architectures (BioClinicalBERT, BERT-base, DistilBERT).

4. Experimental Results

The experiments demonstrate that STEU achieves near-complete forgetting with negligible impact on retained utility.

Performance Metrics (MIMIC-IV with BioClinicalBERT):
- Forget F1: Reduced from a baseline of ~0.47 to 0.0004 (near-zero predictive behavior).
- Retain F1: Maintained at 0.4766, showing minimal degradation compared to the baseline.
- Parameter Budget: Only 0.19% of model parameters (201K out of 108M) were updated.
Comparison with Baselines:
- Traditional encoder-level methods (Gradient Ascent, Direct Suppression) achieved similar forgetting (F1 $\approx$ 0.0000) but required updating 19.6% of the model parameters (21.3M parameters).
- STEU achieves comparable forgetting with two orders of magnitude fewer parameter updates.
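The reported budget is easy to sanity-check. Under STEU, only $k$ embedding rows of width $d$ plus the classifier head ($d \times C$ weights and $C$ biases) are trainable. The values below are illustrative assumptions, not figures stated in the paper: $k = 256$ (the budget discussed in the ablation), $d = 768$ (the hidden width of BERT-base-sized models), and $C = 6$ output labels. With these values the count lands near the reported 201K.

```python
def trainable_params(k, hidden, num_classes):
    """Parameters touched by an STEU-style update: k embedding rows plus the head."""
    embedding_rows = k * hidden
    head = hidden * num_classes + num_classes  # weight matrix W_c plus bias
    return embedding_rows + head

# ASSUMED configuration: k = 256, hidden = 768, num_classes = 6 (illustrative only).
steu = trainable_params(256, 768, 6)
total = 108_000_000          # ~108M parameters, as reported
encoder_level = 21_300_000   # reported encoder-level baseline budget

print(steu)                            # 201222
print(f"{steu / total:.2%}")           # 0.19%
print(f"{encoder_level / steu:.0f}x")  # 106x, i.e. about two orders of magnitude
```

The embedding rows dominate the budget, so the count scales almost linearly with $k$.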
Ablation Studies:
- Token Budget: Increasing the token count beyond $k=256$ yielded diminishing returns, suggesting that the highest-PMI tokens already capture most of the forget-class signal.
- Head Update: Removing the classifier head update left a Forget F1 of 0.2763, indicating that the head update is critical for complete suppression.
Robustness: STEU performed consistently across datasets (MIMIC-III, eICU) and architectures (including the smaller DistilBERT), typically losing less than 0.01 in retained utility.

5. Significance and Conclusion

Paradigm Shift: The paper challenges the assumption that unlearning requires deep encoder modifications. It demonstrates that for clinical text classification, the predictive signal for specific diseases is largely localized to a sparse set of token embeddings.
Practical Deployment: By limiting edits to <0.2% of parameters, STEU offers a practical solution for hospitals and institutions needing to comply with "Right to be Forgotten" regulations or correct labeling errors without the cost of full retraining.
Limitations & Future Work: The authors note that STEU provides behavioral unlearning, not certified sample-level removal (mathematical guarantees of zero influence). Future work aims to extend this to generative models, incorporate multi-seed experiments, and explore stronger privacy guarantees.

In summary, STEU shows that targeted, sparse edits to token embeddings, combined with a lightweight classifier-head update, offer an efficient and effective mechanism for class-level unlearning in clinical AI systems.