Imagine you are a doctor trying to diagnose skin conditions using an AI assistant. This AI is incredibly smart, but it's also a "glutton" for computer memory and battery power. To make it run on a small, portable device (like a tablet in a rural clinic), we need to shrink it down. This process is called quantization—basically, compressing the AI's brain by reducing the precision of its numbers, much like turning a high-definition photo into a smaller, lower-resolution JPEG to save space.
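To make "reducing the precision of its numbers" concrete, here is a minimal NumPy sketch of uniform quantization (a toy illustration, not the paper's implementation): every weight is snapped to one of a handful of evenly spaced values, so a 4-bit code leaves at most 15 usable levels.

```python
import numpy as np

def quantize_uniform(w, bits=4):
    """Uniformly quantize a weight tensor to a given bit-width."""
    qmax = 2 ** (bits - 1) - 1           # e.g. 7 for signed 4-bit
    scale = np.max(np.abs(w)) / qmax     # map the largest weight to qmax
    codes = np.clip(np.round(w / scale), -qmax - 1, qmax)  # integer codes
    return codes * scale                 # dequantize back to floats

w = np.random.randn(256).astype(np.float32)   # toy "brain" weights
w4 = quantize_uniform(w, bits=4)
# At most 15 distinct levels survive; rounding error is bounded by scale/2
print(len(np.unique(w4)), "distinct values left out of", len(np.unique(w)))
```

The compression is real (4 bits instead of 32 per weight), and so is the information loss: every weight now carries a small rounding error, which is exactly what accumulates into the group-specific failures described below.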
However, there's a catch. When we compress these models too much (like turning them into very low-resolution images), they often start making mistakes. And here's the scary part: they don't make mistakes equally. They might work great for light-skinned patients but fail miserably for dark-skinned patients. In the medical world, this isn't just a bug; it's a safety hazard.
The paper introduces FairQuant, a new method to shrink these AI models without leaving certain groups of people behind. Here is how it works, explained through simple analogies:
1. The Problem: The "One-Size-Fits-All" Suit
Imagine you have a tailor making suits for a diverse group of people.
- Standard Compression (Uniform Quantization): The tailor decides, "I'll make everyone's suit out of thin, cheap fabric to save money."
- Result: People with the most common build (the majority group) might still fit okay, but everyone else (minority groups) gets suits that pinch, tear, or fall apart, because the cheap fabric isn't cut for their specific needs.
- The Goal: We want to save money (reduce model size) but ensure the suit fits everyone comfortably, especially the most vulnerable.
2. The Solution: FairQuant (The Smart Tailor)
FairQuant is like a super-smart tailor who doesn't just cut the same fabric for everyone. Instead, they use a two-step strategy:
Step A: The "Sensitivity Map" (Listening to the Groups)
Before cutting the fabric, the tailor asks the group: "Who needs the strongest fabric?"
- In the AI world, the system runs a quick test to see which parts of the AI's brain are critical for diagnosing dark skin versus light skin.
- It creates a heat map. Some parts of the AI are "unimportant" (like the lining of a sleeve) and can be made of cheap, thin fabric (low precision). Other parts are "critical" (like the shoulder seam for a specific group) and need thick, strong fabric (high precision).
- The Innovation: Unlike old methods that just looked at the "average" patient, FairQuant specifically listens to the "worst-served" groups to ensure they get the protection they need.
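The worst-group idea in Step A can be sketched in a few lines of Python (a toy illustration: the layer names, probe numbers, and the simple max-based scoring are all made up for clarity, not taken from the paper):

```python
def worst_group_sensitivity(layer_losses):
    """Score each layer by its WORST-served group, not the average.

    layer_losses[layer][group] = how much that group's loss increases
    when that layer alone is compressed to low precision.
    """
    return {layer: max(group_losses.values())     # worst group, not mean
            for layer, group_losses in layer_losses.items()}

# Hypothetical probe results: loss increase per demographic group
probe = {
    "conv1":  {"light": 0.02, "dark": 0.31},   # critical for dark skin
    "block3": {"light": 0.01, "dark": 0.02},   # safe to compress hard
}
scores = worst_group_sensitivity(probe)
# Sensitive layers keep high precision; robust layers get squeezed
bits = {layer: 8 if s > 0.1 else 4 for layer, s in scores.items()}
print(bits)   # → {'conv1': 8, 'block3': 4}
```

An average-based score would have rated "conv1" as unimportant (the light-skin loss barely moves), which is precisely the failure mode the worst-group criterion avoids.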
Step B: The "Learnable Bit-Width" (The Flexible Suit)
Once the tailor knows where to put the strong fabric, they don't just stick to a rigid plan. They use a special technique called BAQ (Bit-Aware Quantization).
- Imagine the tailor is sewing a suit whose fabric thickness can change while they work.
- As the AI learns, it constantly adjusts: "Hmm, this specific part of the brain is struggling with dark skin; let's make this section slightly thicker (more bits). That other part is easy; let's make it thinner."
- The AI optimizes itself to be as small as possible (saving money) while keeping the "worst-case" group just as safe as the "best-case" group.
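Step B boils down to an optimization: minimize the worst group's loss plus a penalty on model size. Here is a toy sketch of that trade-off (assumptions: two made-up layers, an invented 1/bits loss model, and exhaustive search standing in for the paper's gradient-based bit-width learning):

```python
import itertools

# Toy per-group sensitivity: the "dark" group relies heavily on layer "a"
SENS = {"light": {"a": 0.1, "b": 0.1},
        "dark":  {"a": 0.8, "b": 0.1}}

def group_loss(bits, group):
    """Toy model: a group's loss grows as its sensitive layers lose bits."""
    return sum(SENS[group][layer] / b for layer, b in bits.items())

def objective(bits, lam=0.03):
    """Worst-group loss plus a penalty on the average bit-width."""
    worst = max(group_loss(bits, g) for g in SENS)
    size = sum(bits.values()) / len(bits)
    return worst + lam * size

# Search every bit assignment; gradients would do this at scale
best = min((dict(zip("ab", combo))
            for combo in itertools.product([2, 4, 8], repeat=2)),
           key=objective)
print(best)   # layer "a" keeps high precision to protect the worst group
```

The optimizer lands on 8 bits for the sensitive layer and 2 for the robust one: the model shrinks only as far as its worst-served group can tolerate, which is the whole point of Step B.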
3. The Results: Saving the Day
The researchers tested this on real medical datasets (photos of skin conditions) using different types of AI brains (ResNet and Transformers).
- The Old Way: When they compressed the AI to 4 bits (very small), the model crashed for some groups. For example, on one model, accuracy for certain groups dropped from 50% to a pathetic 3%. It was useless.
- The FairQuant Way: Using their method, they achieved a similar small size (around 4 bits) but kept the accuracy high (around 45-50%) for everyone.
- The Analogy: It's like shrinking a heavy backpack to fit in a small pocket. The old way made the backpack so small it fell apart and lost your keys. FairQuant shrinks the backpack but reinforces the pockets where you keep your keys, so nothing gets lost, and it still fits in your pocket.
Why This Matters
In the real world, medical AI is often deployed in places with limited computing power (like a doctor's bag in a developing country). We need these models to be small and fast. But if a small model is biased against a specific group of people, it's dangerous.
FairQuant proves that you don't have to choose between efficiency (small size) and fairness (working for everyone). By being smart about where you cut corners, you can have a tiny, fast AI that treats every patient with the same level of care.
In short: FairQuant is the "smart compression" that ensures the AI doesn't forget the people who need it the most, even when it's running on a tiny, low-power device.