Federated Active Learning Under Extreme Non-IID and Global Class Imbalance

This paper introduces FairFAL, an adaptive federated active learning framework that uses lightweight prediction discrepancy and prototype-guided pseudo-labeling to dynamically select between global and local query models. By doing so, it addresses the challenges of extreme non-IID data and global class imbalance and achieves superior performance over state-of-the-art methods.

Chen-Chen Zong, Sheng-Jun Huang

Published 2026-03-12

Imagine you are trying to teach a group of 100 different students (the clients) to recognize animals, but you can't put them all in one classroom. Instead, they are all in their own homes, and you can't see their homework or their personal photos because of strict privacy rules. This is Federated Learning.

Now, imagine you have a very limited budget for hiring a teacher to grade their work. You can't grade every single photo they have. You need to be smart about which photos you ask them to label. This is Active Learning.

When you combine these two ideas, you get Federated Active Learning (FAL). The goal is to pick the most helpful photos to label so the group learns fast, without ever sharing the actual photos.

The Problem: The "Unfair" Classroom

The paper points out a huge problem with how this usually works in the real world:

  1. The "Long-Tail" Problem (Global Imbalance): Imagine that across all 100 students, there are 1,000 photos of dogs, but only 5 photos of rare animals like a "Platypus" or a "Sloth." Most students have mostly dogs. The rare animals are scattered sparsely. If you just ask for the "hardest" photos to label, you'll end up labeling 900 more dogs and maybe one sloth. The AI will become an expert at dogs but will never learn what a sloth looks like.
  2. The "Different Worlds" Problem (Non-IID): Student A might only have photos of farm animals. Student B might only have zoo animals. Their data is totally different (heterogeneous).

Existing methods often get confused here. They either ask the "Group Teacher" (Global Model) or the "Individual Student" (Local Model) to pick the photos.

  • If the Group Teacher picks, they might miss the rare animals because they are too focused on the common ones.
  • If the Individual Student picks, they might only pick photos from their own small, weird collection, which doesn't help the whole group.

The Big Discovery

The authors ran a massive set of experiments and found a simple rule: the model that picks a more class-balanced mix of photos (including the rare ones) consistently wins.

They also found a "Goldilocks" rule for who should do the picking:

  • When the whole group is super unbalanced (lots of dogs, few sloths) but everyone's data is similar, the Group Teacher is better. They can see the big picture and say, "Hey, we need more sloths!"
  • When everyone's data is totally different (some have farms, some have zoos), the Individual Student is better. They know their own backyard best and can find the unique things hiding there.

The Solution: "FairFAL" (The Smart Librarian)

The authors built a new system called FairFAL. Think of it as a super-smart Librarian who manages the study group. Here is how it works in three simple steps:

1. The "Thermostat" (Adaptive Model Selection)

Like a thermostat, the Librarian keeps checking the "temperature" of the data before each round of labeling.

  • If the data runs "hot" with common classes (high global imbalance) while the students' collections look alike, the Librarian asks the Group Teacher to pick the next photos.
  • If the students' collections differ sharply from one another (high heterogeneity), the Librarian asks the Individual Students to pick.
  • Why? This ensures the right person is making the decision based on the current situation.
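The thermostat idea above can be sketched in a few lines. This is an illustrative toy, not the paper's actual rule: the function name, the total-variation heterogeneity measure, and the 0.3 threshold are all assumptions made for the example.

```python
import numpy as np

def choose_query_model(global_counts, client_dists, hetero_thresh=0.3):
    """Decide whether the Group Teacher (global model) or the Individual
    Students (local models) should pick the next batch to label.

    global_counts : per-class label counts pooled over all clients
    client_dists  : one row per client, that client's class proportions
    Names and the 0.3 threshold are illustrative, not from the paper.
    """
    counts = np.asarray(global_counts, dtype=float)
    # How lopsided the whole classroom is (dogs vs. sloths).
    imbalance = counts.max() / max(counts.min(), 1.0)

    # How far each student's mix sits from the group's mix
    # (mean total-variation distance: 0 = identical, 1 = disjoint).
    global_dist = counts / counts.sum()
    rows = np.asarray(client_dists, dtype=float)
    rows = rows / rows.sum(axis=1, keepdims=True)
    heterogeneity = np.abs(rows - global_dist).sum(axis=1).mean() / 2.0

    # Goldilocks rule: different worlds -> ask the students;
    # similar worlds (even if imbalanced) -> ask the teacher.
    decision = "local" if heterogeneity >= hetero_thresh else "global"
    return decision, imbalance, heterogeneity
```

For example, 1000 dogs vs. 5 sloths spread over similar clients yields a huge imbalance but low heterogeneity, so the global model queries; two clients holding disjoint classes flips the decision to local.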

2. The "Prototype" (The Cheat Sheet)

Usually, AI gets biased toward common things. To fix this, the Librarian creates a "Cheat Sheet" (Prototypes) for every animal type.

  • Instead of just asking "Is this a dog?", the system looks at the "average dog" and the "average sloth" in the feature space.
  • It then forces the system to look for photos that match the rare cheat sheets (the sloths) just as hard as the common ones. This ensures the AI doesn't ignore the rare animals.
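The "cheat sheet" step can be sketched as computing one average feature vector per class and assigning unlabeled photos to the nearest one. This is a generic prototype sketch under assumed names, not the paper's exact pseudo-labeling procedure:

```python
import numpy as np

def class_prototypes(features, labels, num_classes):
    """The 'cheat sheet': one average feature vector per class."""
    protos = np.zeros((num_classes, features.shape[1]))
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            protos[c] = features[mask].mean(axis=0)
    return protos

def prototype_pseudo_labels(unlabeled_feats, protos):
    """Assign each unlabeled photo to its nearest 'average animal'.

    Every class contributes exactly one prototype, so a rare class
    (sloth) gets the same vote as a common one (dog): the feature
    distances decide, not the class frequencies.
    """
    f = unlabeled_feats / np.clip(
        np.linalg.norm(unlabeled_feats, axis=1, keepdims=True), 1e-12, None)
    p = protos / np.clip(
        np.linalg.norm(protos, axis=1, keepdims=True), 1e-12, None)
    sims = f @ p.T          # cosine similarity to each cheat sheet
    return sims.argmax(axis=1)
```

The key design point is that the comparison space has one entry per class regardless of how many labeled examples each class has, which is what keeps the rare classes from being drowned out.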

3. The "Two-Stage Hunt" (Uncertainty + Diversity)

Finally, the Librarian doesn't just grab the first 10 photos that look confusing.

  • Stage 1: They find all the photos that are confusing (Uncertainty).
  • Stage 2: From those confusing photos, they pick ones that are different from each other (Diversity).
  • Analogy: If you need to learn about "cars," you don't want 10 photos of the exact same red Ford. You want one red Ford, one blue Toyota, one old truck, and one sports car. This ensures the AI learns the whole concept, not just one tiny slice of it.
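The two-stage hunt can be sketched with a standard pairing: an entropy shortlist for Stage 1 and farthest-point sampling for Stage 2. These are common stand-ins for "uncertainty" and "diversity", assumed here for illustration rather than taken from the paper:

```python
import numpy as np

def two_stage_select(probs, features, candidate_frac=0.3, budget=10):
    """Stage 1: shortlist the most confusing photos (high entropy).
    Stage 2: pick a spread-out subset of them (farthest-point sampling).
    A generic sketch of the uncertainty-then-diversity idea.
    """
    probs = np.asarray(probs, dtype=float)
    feats = np.asarray(features, dtype=float)

    # Stage 1: uncertainty -- keep the top fraction by predictive entropy.
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    k = max(budget, int(len(probs) * candidate_frac))
    shortlist = np.argsort(-entropy)[:k]

    # Stage 2: diversity -- greedily add whichever candidate is
    # farthest from everything already chosen (no ten red Fords).
    sub = feats[shortlist]
    chosen = [0]                      # seed with the most uncertain candidate
    dists = np.linalg.norm(sub - sub[0], axis=1)
    while len(chosen) < min(budget, len(shortlist)):
        nxt = int(dists.argmax())
        chosen.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(sub - sub[nxt], axis=1))
    return shortlist[chosen]
```

With a budget of 3 over ten samples where five are maximally uncertain, the picks all come from the uncertain five while being spread apart in feature space.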

The Result

When they tested this "Fair Librarian" on five different datasets (including medical images like eye scans and skin lesions), it consistently beat all other methods.

In short:
Existing methods were like a student who only studied the most common questions and failed the exam because they didn't know the rare ones. FairFAL is like a tutor who realizes, "Wait, we have too many questions about dogs and not enough about sloths," and forces the study group to focus on the sloths, so the whole group passes the test, rare topics included.

It works better because it adapts to the situation, forces fairness, and picks a diverse set of examples to learn from.