Federated Active Learning Under Extreme Non-IID and Global Class Imbalance

This paper introduces FairFAL, an adaptive federated active learning framework that uses lightweight prediction discrepancy and prototype-guided pseudo-labeling to dynamically select between global and local query models. By doing so, it addresses the challenges of extreme non-IID data and global class imbalance and achieves superior performance over state-of-the-art methods.

Chen-Chen Zong, Sheng-Jun Huang

Published 2026-03-12

Imagine you are trying to teach a group of 100 different students (the clients) to recognize animals, but you can't put them all in one classroom. Instead, they are all in their own homes, and you can't see their homework or their personal photos because of strict privacy rules. This is Federated Learning.

Now, imagine you have a very limited budget for hiring a teacher to grade their work. You can't grade every single photo they have. You need to be smart about which photos you ask them to label. This is Active Learning.

When you combine these two ideas, you get Federated Active Learning (FAL). The goal is to pick the most helpful photos to label so the group learns fast, without ever sharing the actual photos.

The Problem: The "Unfair" Classroom

The paper points out a huge problem with how this usually works in the real world:

  1. The "Long-Tail" Problem (Global Imbalance): Imagine that across all 100 students, there are 1,000 photos of dogs, but only 5 photos of rare animals like a "Platypus" or a "Sloth." Most students have mostly dogs. The rare animals are scattered sparsely. If you just ask for the "hardest" photos to label, you'll end up labeling 900 more dogs and maybe one sloth. The AI will become an expert at dogs but will never learn what a sloth looks like.
  2. The "Different Worlds" Problem (Non-IID): Student A might only have photos of farm animals. Student B might only have zoo animals. Their data is totally different (heterogeneous).

Existing methods often get confused here. They either ask the "Group Teacher" (Global Model) or the "Individual Student" (Local Model) to pick the photos.

  • If the Group Teacher picks, they might miss the rare animals because they are too focused on the common ones.
  • If the Individual Student picks, they might only pick photos from their own small, weird collection, which doesn't help the whole group.

The Big Discovery

The authors ran a massive set of experiments and found a simple rule: the model that picks a more class-balanced mix of photos (including the rare ones) consistently wins.

They also found a "Goldilocks" rule for who should do the picking:

  • When the whole group is super unbalanced (lots of dogs, few sloths) but everyone's data is similar, the Group Teacher is better. They can see the big picture and say, "Hey, we need more sloths!"
  • When everyone's data is totally different (some have farms, some have zoos), the Individual Student is better. They know their own backyard best and can find the unique things hiding there.

The Solution: "FairFAL" (The Smart Librarian)

The authors built a new system called FairFAL. Think of it as a super-smart Librarian who manages the study group. Here is how it works in three simple steps:

1. The "Thermostat" (Adaptive Model Selection)

Like a thermostat, the Librarian keeps checking the "temperature" of the data before each round of labeling.

  • If the data runs "hot" with common classes (high global imbalance) while the students' collections look alike, the Librarian asks the Group Teacher to pick the next photos.
  • If the students' collections differ sharply from one another (high heterogeneity), the Librarian asks the Individual Students to pick.
  • Why? This ensures the right person is making the decision based on the current situation.
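The thermostat idea above can be sketched in a few lines. This is an illustrative toy, not the paper's actual rule: the function name, the total-variation heterogeneity measure, and the 0.3 threshold are all assumptions made for the example.

```python
import numpy as np

def choose_query_model(global_counts, client_dists, hetero_thresh=0.3):
    """Decide whether the Group Teacher (global model) or the Individual
    Students (local models) should pick the next batch to label.

    global_counts : per-class label counts pooled over all clients
    client_dists  : one row per client, that client's class proportions
    Names and the 0.3 threshold are illustrative, not from the paper.
    """
    counts = np.asarray(global_counts, dtype=float)
    # How lopsided the whole classroom is (dogs vs. sloths).
    imbalance = counts.max() / max(counts.min(), 1.0)

    # How far each student's mix sits from the group's mix
    # (mean total-variation distance: 0 = identical, 1 = disjoint).
    global_dist = counts / counts.sum()
    rows = np.asarray(client_dists, dtype=float)
    rows = rows / rows.sum(axis=1, keepdims=True)
    heterogeneity = np.abs(rows - global_dist).sum(axis=1).mean() / 2.0

    # Goldilocks rule: different worlds -> ask the students;
    # similar worlds (even if imbalanced) -> ask the teacher.
    decision = "local" if heterogeneity >= hetero_thresh else "global"
    return decision, imbalance, heterogeneity
```

For example, 1000 dogs vs. 5 sloths spread over similar clients yields a huge imbalance but low heterogeneity, so the global model queries; two clients holding disjoint classes flips the decision to local.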

2. The "Prototype" (The Cheat Sheet)

Usually, AI gets biased toward common things. To fix this, the Librarian creates a "Cheat Sheet" (Prototypes) for every animal type.

  • Instead of just asking "Is this a dog?", the system looks at the "average dog" and the "average sloth" in the feature space.
  • It then forces the system to look for photos that match the rare cheat sheets (the sloths) just as hard as the common ones. This ensures the AI doesn't ignore the rare animals.
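The "cheat sheet" step can be sketched as computing one average feature vector per class and assigning unlabeled photos to the nearest one. This is a generic prototype sketch under assumed names, not the paper's exact pseudo-labeling procedure:

```python
import numpy as np

def class_prototypes(features, labels, num_classes):
    """The 'cheat sheet': one average feature vector per class."""
    protos = np.zeros((num_classes, features.shape[1]))
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            protos[c] = features[mask].mean(axis=0)
    return protos

def prototype_pseudo_labels(unlabeled_feats, protos):
    """Assign each unlabeled photo to its nearest 'average animal'.

    Every class contributes exactly one prototype, so a rare class
    (sloth) gets the same vote as a common one (dog): the feature
    distances decide, not the class frequencies.
    """
    f = unlabeled_feats / np.clip(
        np.linalg.norm(unlabeled_feats, axis=1, keepdims=True), 1e-12, None)
    p = protos / np.clip(
        np.linalg.norm(protos, axis=1, keepdims=True), 1e-12, None)
    sims = f @ p.T          # cosine similarity to each cheat sheet
    return sims.argmax(axis=1)
```

The key design point is that the comparison space has one entry per class regardless of how many labeled examples each class has, which is what keeps the rare classes from being drowned out.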

3. The "Two-Stage Hunt" (Uncertainty + Diversity)

Finally, the Librarian doesn't just grab the first 10 photos that look confusing.

  • Stage 1: They find all the photos that are confusing (Uncertainty).
  • Stage 2: From those confusing photos, they pick ones that are different from each other (Diversity).
  • Analogy: If you need to learn about "cars," you don't want 10 photos of the exact same red Ford. You want one red Ford, one blue Toyota, one old truck, and one sports car. This ensures the AI learns the whole concept, not just one tiny slice of it.
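The two-stage hunt can be sketched with a standard pairing: an entropy shortlist for Stage 1 and farthest-point sampling for Stage 2. These are common stand-ins for "uncertainty" and "diversity", assumed here for illustration rather than taken from the paper:

```python
import numpy as np

def two_stage_select(probs, features, candidate_frac=0.3, budget=10):
    """Stage 1: shortlist the most confusing photos (high entropy).
    Stage 2: pick a spread-out subset of them (farthest-point sampling).
    A generic sketch of the uncertainty-then-diversity idea.
    """
    probs = np.asarray(probs, dtype=float)
    feats = np.asarray(features, dtype=float)

    # Stage 1: uncertainty -- keep the top fraction by predictive entropy.
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    k = max(budget, int(len(probs) * candidate_frac))
    shortlist = np.argsort(-entropy)[:k]

    # Stage 2: diversity -- greedily add whichever candidate is
    # farthest from everything already chosen (no ten red Fords).
    sub = feats[shortlist]
    chosen = [0]                      # seed with the most uncertain candidate
    dists = np.linalg.norm(sub - sub[0], axis=1)
    while len(chosen) < min(budget, len(shortlist)):
        nxt = int(dists.argmax())
        chosen.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(sub - sub[nxt], axis=1))
    return shortlist[chosen]
```

With a budget of 3 over ten samples where five are maximally uncertain, the picks all come from the uncertain five while being spread apart in feature space.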

The Result

When they tested this "Fair Librarian" on five different datasets (including medical images like eye scans and skin lesions), it consistently beat all other methods.

In short:
Existing methods were like a student who only studied the most common questions and failed the exam because they didn't know the rare ones. FairFAL is like a tutor who realizes, "Wait, we have too many questions about dogs and not enough about sloths," and forces the study group to focus on the sloths, so the whole group passes the test, rare topics included.

It works better because it adapts to the situation, forces fairness, and picks a diverse set of examples to learn from.