ProxyFL: A Proxy-Guided Framework for Federated Semi-Supervised Learning

This paper proposes ProxyFL, a unified proxy-guided framework for Federated Semi-Supervised Learning that tackles external and internal data heterogeneity at once: it optimizes global class proxies to resist outlier clients, and reclaims otherwise-discarded low-confidence samples through a positive-negative proxy pool.

Duowen Chen, Yan Wang

Published 2026-02-25

Imagine a group of detectives (the clients) trying to solve a massive mystery together, but they can't share their secret notebooks (the data) because of privacy rules. They have to send only their theories and conclusions (the model updates) to a central headquarters (the server).

This is Federated Learning. But here's the twist: most detectives only have a few confirmed clues (labeled data) and a huge pile of scribbled notes with no answers (unlabeled data). This is Federated Semi-Supervised Learning (FSSL).

The problem? The detectives are all working in different neighborhoods with different types of crimes (External Heterogeneity), and within their own offices, the confirmed clues don't match the messy scribbles (Internal Heterogeneity).

The paper introduces a new method called ProxyFL to fix this. Here is how it works, using simple analogies:

1. The Old Way: "The Average Vote" (And Why It Fails)

Usually, the headquarters tries to create a "Master Detective" by simply averaging everyone's theories.

  • The Problem: If one detective is an outlier (maybe they only see cat burglaries while everyone else sees bank robberies), their weird theory drags the average off course.
  • The Result: The Master Detective becomes confused and bad at solving crimes.
  • The "Low-Confidence" Issue: To avoid mistakes, detectives usually throw away their messy, uncertain notes. This means they are ignoring a huge amount of potential evidence, making the team smaller and weaker.

2. The New Way: ProxyFL (The "Mental Map" Strategy)

ProxyFL introduces a clever trick: instead of just averaging theories, they create a shared "Mental Map" of categories (called Proxies). Think of these Proxies as the "ideal definition" of a Cat, a Dog, or a Bank Robbery.
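Concretely, a proxy can be thought of as a per-class prototype vector. The sketch below (an illustration, not the paper's exact construction) builds each class's proxy as the mean feature embedding of that class's labeled samples:

```python
import numpy as np

def class_proxies(embeddings, labels, num_classes):
    """A proxy as the 'ideal definition' of a class: here, the mean
    feature embedding over that class's labeled samples."""
    proxies = np.zeros((num_classes, embeddings.shape[1]))
    for c in range(num_classes):
        proxies[c] = embeddings[labels == c].mean(axis=0)
    return proxies

# Two classes, two labeled samples each.
emb = np.array([[1., 0.], [3., 0.], [0., 2.], [0., 4.]])
lab = np.array([0, 0, 1, 1])
proxies = class_proxies(emb, lab, num_classes=2)
```

Because a proxy lives in feature space rather than data space, clients can exchange proxies without exchanging raw samples.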

Part A: Fixing the "Different Neighborhoods" (External Heterogeneity)

The Old Way: Just take the average of everyone's definition of a "Cat." If one detective thinks a "Cat" looks like a "Hamster," the average becomes a weird "Hamster-Cat."
The ProxyFL Way: The headquarters creates a Global Mental Map. Instead of blindly averaging, it actively adjusts this map to fit the reality of all the detectives, ignoring the weird outliers.

  • Analogy: Imagine a teacher drawing a map of "What a Cat looks like." If one student draws a cat with wings, the teacher doesn't just average the drawings. The teacher looks at all the drawings, sees the wings are an outlier, and draws a perfect, balanced cat that represents the group's true understanding. This map is sent back to everyone to align their thinking.
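The teacher's "spot the winged cat" move amounts to outlier-aware aggregation. The paper optimizes its global proxies with its own objective; as a stand-in, here is one simple heuristic with the same flavor: drop the client proxies farthest from the coordinate-wise median, then average the rest. The function name and the `trim` parameter are illustrative assumptions.

```python
import numpy as np

def robust_global_proxy(client_proxies, trim=1):
    """Aggregate one class's proxy across clients while ignoring
    outliers: discard the `trim` proxies farthest from the
    coordinate-wise median, then average the remainder."""
    median = np.median(client_proxies, axis=0)
    dists = np.linalg.norm(client_proxies - median, axis=1)
    keep = np.argsort(dists)[:len(client_proxies) - trim]
    return client_proxies[keep].mean(axis=0)

# Three clients agree on "Cat"; one drew it with wings.
cat_proxies = np.array([[1.0, 1.0], [1.2, 0.8], [0.8, 1.2], [5.0, -3.0]])
global_cat = robust_global_proxy(cat_proxies)
```

Plain averaging would land at roughly `[2.0, 0.0]`; the trimmed version recovers the consensus prototype.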

Part B: Fixing the "Messy Notes" (Internal Heterogeneity)

The Old Way: If a detective isn't 100% sure if a note says "Hamster" or "Mouse," they throw the note in the trash to be safe.
The ProxyFL Way: ProxyFL says, "Don't throw it away! Let's keep it, but be smart about it."

  • The "Indecisive Categories" Trick: Instead of forcing a guess like "It's a Mouse," the system says, "Okay, this note is confusing. It could be a Mouse OR a Hamster."
  • The "Positive-Negative Pool": The system creates a special training zone.
    • Positive: It tells the model, "This note is close to being a Mouse or a Hamster."
    • Negative: It tells the model, "This note is definitely not a Dog or a Car."
  • Analogy: Imagine a student studying for a test. Instead of skipping the hard questions (low-confidence samples), they write down, "This answer is likely A or B, but definitely not C or D." This helps them learn the boundaries of the answers without getting scared of making a mistake.
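The "likely A or B, definitely not C or D" idea maps naturally onto a contrastive-style loss over the proxy pool. The sketch below is a hypothetical rendering, not the paper's exact loss: the top-k predicted classes form the positive pool, every other class the negative pool, and the sample's embedding is pulled toward the positives and pushed from the negatives. `k` and the temperature `tau` are assumed hyperparameters.

```python
import numpy as np

def proxy_pool_loss(z, proxies, probs, k=2, tau=0.5):
    """Positive-negative proxy pool for one low-confidence sample:
    top-k candidate classes are positives ("Mouse OR Hamster"),
    the rest are negatives ("definitely not Dog")."""
    sims = proxies @ z / tau                           # similarity to each proxy
    pos = np.argsort(probs)[-k:]                       # positive pool
    neg = np.setdiff1d(np.arange(len(proxies)), pos)   # negative pool
    e = np.exp(sims)
    # Maximize similarity mass on the positive pool vs. everything.
    return -np.log(e[pos].sum() / (e[pos].sum() + e[neg].sum()))

proxies = np.array([[1., 0.], [0., 1.], [-1., 0.]])    # Mouse, Hamster, Dog
probs = np.array([0.50, 0.45, 0.05])                   # "Mouse OR Hamster"
loss_near = proxy_pool_loss(np.array([1., 0.]), proxies, probs)
loss_far  = proxy_pool_loss(np.array([-1., 0.]), proxies, probs)
```

An embedding sitting near a positive proxy incurs a small loss; one sitting on the negative ("Dog") proxy incurs a large one, so the uncertain sample still shapes the decision boundaries instead of being thrown away.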

Why is this a Big Deal?

  1. No Privacy Leaks: The "Mental Map" (Proxies) is just a tiny part of the model's brain, not the actual data. It's like sharing a summary of your thoughts rather than your diary.
  2. More Data, Less Waste: By keeping the "messy notes" and treating them as "maybe this, maybe that," the team uses all the available evidence, not just the easy stuff.
  3. Faster Learning: Because the "Master Detective" (Global Model) has a clearer map and more practice data, the whole team learns much faster and solves the mystery more accurately.

The Bottom Line

ProxyFL is like a smart team leader who:

  1. Refines the group's shared dictionary so everyone agrees on what things look like, even if they come from different places.
  2. Encourages the team to use their uncertain notes by treating them as "possible options" rather than discarding them.

The result? A super-smart, privacy-safe AI that learns faster and better, even when the data is messy and scattered across many different devices.
