Distribution-Aware Federated Learning for Diabetes Prediction Using Tabular Clinical Data Under Non-IID and Class-Imbalanced Settings

This paper proposes Distribution-Aware Federated Learning (DA-FL), a framework that combines client-specific minority-class amplification factors with class-weighted loss to mitigate the non-IID and class-imbalance challenges in diabetes prediction. On the CDC BRFSS 2021 dataset, DA-FL significantly outperforms conventional methods such as FedAvg in both accuracy and stability.

Amin, R., Rana, M. M. H., Aktar, S.

Published 2026-03-08

This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

Imagine you are trying to teach a group of five different doctors how to spot early signs of diabetes. You want them to work together to create one "Super Doctor" who is better than any of them working alone. However, there are two big problems:

  1. The Privacy Wall: Doctors can't share their actual patient files because of strict privacy laws. They can only share their "lessons learned" (the math inside their brains), not the patient names or records.
  2. The Messy Data: Each doctor sees a different type of patient.
    • Doctor A works in a wealthy area where almost everyone is healthy. They rarely see diabetes.
    • Doctor B works in a high-risk area and sees diabetes in almost every patient.
    • Doctor C has a huge list of patients, but 99% are healthy.
    • Doctor D has a small list, but half are diabetic.
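The four doctors above are a picture of non-IID, class-imbalanced clients. As a minimal sketch (not from the paper; the function name, rates, and sizes are illustrative), here is how such a skewed client split can be simulated:

```python
import random

def make_clients(positive_rates, sizes, seed=0):
    """Simulate non-IID, class-imbalanced clients.

    Each client gets sizes[i] synthetic labels, where a label of 1
    (diabetic) appears with probability positive_rates[i] -- mirroring
    the four doctors above. Hypothetical helper, not from the paper.
    """
    rng = random.Random(seed)
    clients = []
    for rate, n in zip(positive_rates, sizes):
        labels = [1 if rng.random() < rate else 0 for _ in range(n)]
        clients.append(labels)
    return clients

clients = make_clients(
    positive_rates=[0.02, 0.90, 0.01, 0.50],  # Doctors A, B, C, D
    sizes=[1000, 200, 10000, 100],
)
for i, labels in enumerate(clients):
    print(f"Doctor {chr(65 + i)}: n={len(labels)}, diabetic={sum(labels)}")
```

Note how Doctor C dwarfs everyone in total patients while holding almost no diabetic cases; that mismatch is exactly what breaks the old method described next.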

The Old Way: "The Loud Majority" (FedAvg)

In the traditional method (called FedAvg), the central computer acts like a strict teacher who listens to everyone. But this teacher makes a huge mistake: they listen to the loudest voice, not the most important one.

Since Doctor C has the biggest list of patients, the teacher thinks, "Doctor C must be the expert!" and lets Doctor C's lessons dominate the "Super Doctor."

  • The Result: The Super Doctor becomes obsessed with saying "No one has diabetes!" because that's what the biggest group of patients looks like.
  • The Danger: The Super Doctor misses the actual diabetic patients (the minority) because the teacher ignored the smaller doctors who actually saw the disease. In medicine, missing a diabetic patient is dangerous; it's like a smoke detector that only beeps when the house is on fire, but stays silent when there's just a small spark.
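The "loud majority" behavior comes directly from how FedAvg weights each client's model update by its total sample count. A minimal sketch (the parameter values and client sizes are made up for illustration):

```python
def fedavg(client_weights, client_sizes):
    """Classic FedAvg aggregation: average each model parameter,
    weighted by the client's TOTAL sample count (the 'loudest
    voice' rule)."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    agg = [0.0] * n_params
    for w, n in zip(client_weights, client_sizes):
        for j in range(n_params):
            agg[j] += (n / total) * w[j]
    return agg

# One toy parameter per client; Doctor C (10,000 patients,
# almost no diabetes) drags the global value toward its own.
global_w = fedavg(
    client_weights=[[0.1], [0.9], [0.05], [0.8]],
    client_sizes=[1000, 200, 10000, 100],
)
print(round(global_w[0], 3))  # prints 0.076
```

Even though Doctors B and D learned strong "diabetes signals" (0.9 and 0.8), the aggregate lands near Doctor C's 0.05, because size alone decides the vote.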

The New Way: "The Balanced Classroom" (DA-FL)

The authors of this paper created a new method called Distribution-Aware Federated Learning (DA-FL). Think of this as a smarter teacher who understands that quantity doesn't equal quality when it comes to rare diseases.

Here is how DA-FL works, using a simple analogy:

1. The "Rare Disease" Bonus (Local Training)

Before the doctors even send their lessons to the central computer, the teacher gives them a special rule: "If you see a diabetic patient, pay extra attention to them!"

  • Normally, a doctor might ignore a diabetic patient because they are rare in their office.
  • With this rule, the doctor is forced to study that specific patient very hard. This ensures every doctor, even the one with mostly healthy patients, learns to spot the disease.
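The "pay extra attention" rule is a class-weighted loss: mistakes on the rare diabetic class cost more. A minimal sketch using weighted binary cross-entropy (the example labels, predictions, and the negatives-over-positives weight are illustrative choices, not the paper's exact settings):

```python
import math

def weighted_bce(y_true, y_pred, w_pos):
    """Binary cross-entropy where errors on the minority (diabetic)
    class are up-weighted by w_pos. A common heuristic is
    w_pos = n_negative / n_positive."""
    loss = 0.0
    for y, p in zip(y_true, y_pred):
        if y == 1:
            loss += -w_pos * math.log(p)      # costly to miss a diabetic
        else:
            loss += -math.log(1 - p)          # normal cost for healthy
    return loss / len(y_true)

y_true = [0, 0, 0, 0, 1]            # 1 diabetic among 5 patients
y_pred = [0.1, 0.2, 0.1, 0.1, 0.3]  # model barely notices the case
w_pos = 4 / 1                       # negatives / positives
print(round(weighted_bce(y_true, y_pred, w_pos), 3))  # prints 1.071
```

With w_pos = 4, the single missed diabetic dominates the loss, so gradient descent is forced to fix that prediction first, exactly the "study that patient very hard" effect.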

2. The "Weighted Vote" (Global Aggregation)

Now, the doctors send their lessons to the central computer to build the "Super Doctor."

  • The Old Teacher said: "Doctor C, you have 10,000 patients, so your vote counts 10,000 times."
  • The New Teacher (DA-FL) says: "Doctor C, you have 10,000 patients, but only 10 are diabetic. Your vote on diabetes is weak. Doctor B, you only have 100 patients, but 50 are diabetic. Your vote counts 50 times more!"

The new teacher calculates a "Minority Amplification Factor." It's like a volume knob.

  • If a doctor has very few diabetic patients, the teacher turns their volume down (so they don't drown out the others with "No diabetes" noise).
  • If a doctor has many diabetic patients, the teacher turns their volume up (so their experience with the disease is heard clearly).
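The "volume knob" can be sketched by replacing FedAvg's total-count weights with weights based on each client's minority-class count. This is an illustrative simplification, not the paper's exact amplification formula; all numbers are made up:

```python
def dafl_aggregate(client_weights, minority_counts):
    """Distribution-aware aggregation sketch: weight each client's
    update by its minority-class (diabetic) sample count instead of
    its total size. Illustrates the 'volume knob' idea; the paper's
    exact Minority Amplification Factor may differ."""
    total_minority = sum(minority_counts)
    n_params = len(client_weights[0])
    agg = [0.0] * n_params
    for w, m in zip(client_weights, minority_counts):
        for j in range(n_params):
            agg[j] += (m / total_minority) * w[j]
    return agg

# Doctor C: 10,000 patients but only 10 diabetic -> quiet voice.
# Doctor B: 100 patients, 50 diabetic -> loud voice.
global_w = dafl_aggregate(
    client_weights=[[0.1], [0.9], [0.05], [0.8]],
    minority_counts=[20, 50, 10, 50],
)
print(round(global_w[0], 3))  # prints 0.673
```

With the same toy parameters as before, the aggregate now lands near the diabetes-experienced doctors (0.9 and 0.8) instead of near Doctor C's 0.05.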

Why This Matters

The paper tested this on a massive dataset of over 230,000 health records. Here is what happened:

  • Stability: The old method was like a rollercoaster. In one training round the Super Doctor was great; in the next it was useless. DA-FL was like a smooth train ride, consistent and reliable, and by the paper's measure it was 31 times more stable than the old method.
  • Saving Lives: Most importantly, the new method was much better at catching the "minority" (the diabetic patients). It didn't just guess "healthy" to inflate its score; it actually found the sick people.

The Bottom Line

This paper solves a problem where big data usually wins over rare data. In a world where we can't share private medical records, DA-FL ensures that the "Super Doctor" doesn't just become an expert on the majority. Instead, it becomes an expert on everyone, especially the vulnerable minority who need the most help.

It's like changing a classroom rule from "The student with the most homework gets the most credit" to "The student who solves the hardest problems gets the most credit." This way, the rare and difficult cases (like diabetes) get the attention they deserve.
