Imagine you hire a brilliant but slightly biased chef to cook a meal for a diverse group of guests. This chef trained in a kitchen where, 99% of the time, the ingredient "Tomato" was served with "Basil," and "Potato" was served with "Pepper."
Because of this, the chef learned a shortcut: "If I see a Tomato, I'll automatically add Basil. If I see a Potato, I'll add Pepper." They didn't actually learn why these flavors go together; they just memorized the pattern.
Now, imagine you ask this chef to cook for a new group of guests where the rules are different: sometimes "Tomato" goes with "Pepper," and "Potato" goes with "Basil." The chef, relying on their old shortcuts, will mess up the meal. They are biased by their training data.
This is exactly what happens in AI. Deep learning models often latch onto "shortcuts" (spurious correlations in the training data, i.e., biases) instead of the real logic of the task.
The Problem with Current Solutions
Usually, to fix a biased chef, you have two expensive options:
- Re-train the chef: Send them back to school with a perfectly balanced menu of ingredients. This takes years and costs a fortune.
- Rewrite the recipe book: Manually go through their thousands of notes and try to erase the bad habits. This is incredibly difficult and often breaks the good parts of their cooking.
The Paper's Big Idea: "BISE" (The Surgical Scalpel)
The authors of this paper propose Bias-Invariant Subnetwork Extraction (BISE), and they start from a fascinating question: "Is it possible that inside this biased chef's brain, there is already a tiny, perfect, unbiased version of themselves waiting to be found?"
Their answer is yes.
They propose that you don't need to retrain the chef or rewrite the whole recipe. Instead, you just need to prune (cut away) the specific parts of the chef's brain that are obsessed with the shortcuts.
The Analogy: The Noisy Radio
Think of the trained AI model as a radio station broadcasting two signals at once:
- The Good Signal: The actual truth about the world (e.g., "This is a cat because of its ears and whiskers").
- The Bad Signal: The bias (e.g., "This is a cat because it's sitting on a red carpet, which is where cats usually sit in our training photos").
Right now, the radio is blasting both signals loudly. The "Bad Signal" is so loud that it drowns out the "Good Signal."
BISE is like a skilled sound engineer.
Instead of trying to record a new radio station from scratch (retraining), the engineer takes the existing radio, turns down the volume on the "Bad Signal" channel, and completely mutes the speakers that are only playing the noise.
What's left? A smaller, cleaner radio that only plays the "Good Signal."
How It Works (The Magic Trick)
- Freeze the Chef: They don't touch the original weights (the chef's memory). They leave the chef exactly as they are.
- Add a "Bias Detector": They attach a small, temporary assistant to the chef. This assistant's only job is to try to guess the bias (e.g., "Is this a red carpet?").
- The Game of Hide and Seek:
- The main goal is to keep the chef good at identifying cats.
- The secondary goal is to make it impossible for the "Bias Detector" to guess the bias using the chef's brain.
- To achieve this, the system starts "pruning" (turning off) neurons. It asks: "If we turn off this specific neuron, does the chef still know it's a cat? If yes, and the Bias Detector can no longer guess the carpet color, we cut that neuron out!"
- The Result: They end up with a tiny, streamlined version of the original model. It's smaller, faster, and—most importantly—it ignores the shortcuts and focuses on the real features.
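The pruning loop above can be sketched in a toy experiment. This is a minimal illustration, not the paper's actual method: the "frozen network" is just a matrix of six hidden units where, by construction, units 0-2 carry the task signal ("cat vs. dog") and units 3-5 carry the spurious signal ("red carpet vs. not"). The probe, the 0.9 accuracy threshold, and the greedy one-pass ablation are all invented for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a frozen network: 6 hidden units over 400 examples.
# Units 0-2 encode the task label, units 3-5 encode the bias attribute.
# (This split is contrived for illustration only.)
n = 400
task = rng.integers(0, 2, n)   # "is it a cat?"
bias = rng.integers(0, 2, n)   # "is it on a red carpet?"
feats = np.empty((n, 6))
feats[:, :3] = task[:, None] + 0.1 * rng.standard_normal((n, 3))
feats[:, 3:] = bias[:, None] + 0.1 * rng.standard_normal((n, 3))

def probe_acc(x, y):
    """Accuracy of a crude probe: threshold the mean activation."""
    score = x.mean(axis=1)
    pred = (score > score.mean()).astype(int)
    return max((pred == y).mean(), (pred != y).mean())

mask = np.ones(6, dtype=bool)
print("before: task acc", probe_acc(feats, task),
      "| bias acc", probe_acc(feats, bias))

# Greedy ablation: turn each unit off; if the chef still recognizes
# the cat (task accuracy stays high), cut the unit out. The units
# that survive are exactly the ones the task genuinely needs,
# so the bias detector loses its signal as a side effect.
for j in range(6):
    trial = mask.copy()
    trial[j] = False
    if probe_acc(feats[:, trial], task) >= 0.9:
        mask = trial

print("kept units:", np.where(mask)[0])
print("after: task acc", probe_acc(feats[:, mask], task),
      "| bias acc", probe_acc(feats[:, mask], bias))
```

In this toy run the bias-carrying units get pruned while the task-carrying ones survive: task accuracy stays high on the smaller subnetwork, while the bias probe falls to roughly chance, mirroring the "hide and seek" objective described above.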
Why This Is a Game-Changer
- No New Data Needed: You don't need a perfect, balanced dataset to fix the model. You can fix a biased model using the same biased data it was trained on.
- It's Cheap (Computationally): You aren't retraining the whole thing from scratch; you are just "trimming the fat." The pruned model also runs faster and uses less energy.
- It Works: In their experiments, this "pruned" model actually performed better on fair, unbiased tests than the original giant model, and sometimes even better than other complex methods that required expensive retraining.
The Bottom Line
The paper shows that bias isn't always a permanent stain on the model. Sometimes, the "fair" version of the AI is hiding inside the "biased" version, just waiting for someone to cut away the noise.
Instead of throwing out the whole model and starting over, BISE acts like a surgeon, removing the specific "bad habits" to reveal the brilliant, unbiased logic that was there all along.