Imagine you are trying to teach a group of doctors how to diagnose a rare disease. The problem is that these doctors work in different hospitals, and they are not allowed to share their patients' files due to strict privacy laws.
Here is the catch: Hospital A only sees patients with Type 1 of the disease. Hospital B only sees Type 2. Hospital C only sees Type 3. None of them has seen a mix of all three.
The Old Way: The "Blindfolded Committee"
In traditional methods (a technique called Federated Learning), the doctors try to learn together by sending their "notes" (mathematical model updates, such as gradients or weight changes) to a central server.
- The Problem: Since Hospital A has never seen Type 2 or 3, their notes say, "Type 2 doesn't exist!" Hospital B says, "Type 1 is a fake!"
- The Result: When the server tries to combine these conflicting notes, it gets confused. It's like a committee where everyone is shouting a different truth. The final result is a confused, broken model that can't diagnose anything. In the paper, this caused the system to fail completely, dropping accuracy from 90% down to 11%.
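The failure mode above can be seen in a toy simulation. This is an illustrative sketch, not the paper's experiment: three simulated hospitals each hold exactly one class and train a tiny softmax classifier locally. Each local model aces its own shard but has, in effect, decided the other diseases do not exist.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 3 disease types as 2D feature clusters, one type per hospital.
centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
shards = [(centers[c] + rng.normal(size=(200, 2)), np.full(200, c))
          for c in range(3)]

def train_local(X, y, lr=0.5, steps=200):
    """Softmax regression trained only on one hospital's private shard."""
    W, b = np.zeros((2, 3)), np.zeros(3)
    onehot = np.eye(3)[y]
    for _ in range(steps):
        logits = X @ W + b
        p = np.exp(logits - logits.max(1, keepdims=True))
        p /= p.sum(1, keepdims=True)
        g = p - onehot                     # cross-entropy gradient
        W -= lr * X.T @ g / len(X)
        b -= lr * g.mean(0)
    return W, b

X_all = np.vstack([X for X, _ in shards])
for c, (X, y) in enumerate(shards):
    W, b = train_local(X, y)
    share = ((X_all @ W + b).argmax(1) == c).mean()
    print(f"Hospital {c}'s local model calls {share:.0%} of ALL patients type {c}")
```

Averaging such mutually contradictory models is what collapses accuracy in the heterogeneous setting the paper studies.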
The New Solution: FederatedFactory
The FederatedFactory paper proposes a brilliant twist. Instead of sending "notes on how to diagnose," each hospital sends the "blueprint for a machine that can make fake patients."
Here is how it works, step-by-step:
1. The "Generative Factory" (The Local Baker)
Instead of sending their diagnosis rules, each hospital builds a small, private "Factory" (a type of AI called a Diffusion Model).
- Hospital A trains its factory only on Type 1 patients. It learns exactly what Type 1 looks like.
- Hospital B trains its factory only on Type 2.
- Hospital C trains its factory only on Type 3.
Crucially, no real patient data ever leaves the hospital. They only send the "blueprint" (the mathematical weights) of their factory.
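Step 1 can be sketched as follows. One loud caveat: the paper's factory is a diffusion model; to keep the sketch short and runnable, a simple Gaussian density model stands in for it, and all site names and numbers are made up. The point that survives the simplification is that only fitted parameters, the "blueprint," ever leave a hospital.

```python
import numpy as np

rng = np.random.default_rng(1)

# Each hospital's private shard: one disease type, as a 2D feature cluster.
private_shards = {
    "hospital_A": rng.normal(loc=[0.0, 0.0], size=(500, 2)),  # Type 1 only
    "hospital_B": rng.normal(loc=[4.0, 0.0], size=(500, 2)),  # Type 2 only
    "hospital_C": rng.normal(loc=[0.0, 4.0], size=(500, 2)),  # Type 3 only
}

def train_factory(X):
    """Fit a tiny generative model to local data (stand-in for a diffusion
    model). Returns only parameters: the shareable 'blueprint'."""
    return {"mean": X.mean(axis=0), "cov": np.cov(X, rowvar=False)}

# What actually travels over the network: blueprints, never patient rows.
blueprints = {site: train_factory(X) for site, X in private_shards.items()}
print({site: bp["mean"].round(1).tolist() for site, bp in blueprints.items()})
```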
2. The "Ex Nihilo" Synthesis (Creating from Nothing)
Once the central server (or the network of hospitals) has all the blueprints, it does something magical: It creates a brand new, perfect dataset from thin air.
- The server takes the blueprint from Hospital A and generates 1,000 fake Type 1 patients.
- It takes the blueprint from Hospital B and generates 1,000 fake Type 2 patients.
- It does the same for Hospital C.
Now, the server has a perfectly balanced dataset with 1,000 examples of every type of disease, even though no single hospital ever had all of them.
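Continuing the same toy sketch (Gaussian blueprints standing in for diffusion-model weights; labels and counts are illustrative), step 2 looks like this on the server:

```python
import numpy as np

rng = np.random.default_rng(2)

# Blueprints as they would arrive from the three hospitals (toy Gaussian
# stand-ins for diffusion-model weights); labels 0, 1, 2 are the types.
blueprints = {
    0: {"mean": np.array([0.0, 0.0]), "cov": np.eye(2)},
    1: {"mean": np.array([4.0, 0.0]), "cov": np.eye(2)},
    2: {"mean": np.array([0.0, 4.0]), "cov": np.eye(2)},
}

def synthesize(blueprints, per_class=1000):
    """Server-side 'ex nihilo' generation: sample per_class fakes per type."""
    X, y = [], []
    for label, bp in blueprints.items():
        X.append(rng.multivariate_normal(bp["mean"], bp["cov"], size=per_class))
        y.append(np.full(per_class, label))
    return np.vstack(X), np.concatenate(y)

X_syn, y_syn = synthesize(blueprints)
# A perfectly balanced dataset: 1000 synthetic examples of every type.
print(np.bincount(y_syn))  # prints [1000 1000 1000]
```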
3. The Final Teacher
The server uses this newly created, balanced dataset to train the final "Master Doctor" (the global AI model). Because the Master Doctor has seen examples of all types (even though they were synthetic), it learns the correct boundaries between diseases.
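The whole pipeline can be run end to end in the same hedged toy (Gaussian factories instead of diffusion models, softmax regression instead of a deep network): the global model is trained purely on synthetic samples, then evaluated on pooled real data it never saw.

```python
import numpy as np

rng = np.random.default_rng(3)

centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
real = [(centers[c] + rng.normal(size=(300, 2)), np.full(300, c))
        for c in range(3)]

# Step 1: each site fits its factory (Gaussian stand-in) locally.
blueprints = [{"mean": X.mean(0), "cov": np.cov(X.T)} for X, _ in real]

# Step 2: the server generates a balanced synthetic set from the blueprints.
X_syn = np.vstack([rng.multivariate_normal(bp["mean"], bp["cov"], size=1000)
                   for bp in blueprints])
y_syn = np.repeat(np.arange(3), 1000)

# Step 3: train the "Master Doctor" on synthetic data only.
W, b = np.zeros((2, 3)), np.zeros(3)
onehot = np.eye(3)[y_syn]
for _ in range(300):
    logits = X_syn @ W + b
    p = np.exp(logits - logits.max(1, keepdims=True))
    p /= p.sum(1, keepdims=True)
    g = p - onehot
    W -= 0.5 * X_syn.T @ g / len(X_syn)
    b -= 0.5 * g.mean(0)

# Evaluate on the pooled real data, which the server never touched.
X_real = np.vstack([X for X, _ in real])
y_real = np.concatenate([y for _, y in real])
acc = ((X_real @ W + b).argmax(1) == y_real).mean()
print(f"Global model accuracy on real data: {acc:.0%}")
```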
Why is this a Big Deal?
1. No More "Blind Spots"
In the old way, the AI was blind to diseases it hadn't seen. In this new way, the AI gets to see everything because the factories can generate infinite examples of the missing types.
- Analogy: It's like trying to learn to cook a full banquet. In the old way, you only had a chef who knew how to make soup, another who only made steak, and a third who only made salad. They argued about the menu. In the new way, you ask each chef to write down their secret recipe, then you hire a new chef who uses those recipes to cook the entire banquet perfectly.
2. One-Shot Efficiency
Usually, these systems require hundreds of rounds of back-and-forth communication (like a long email chain). FederatedFactory does it in a single round.
- Analogy: Instead of a long, tedious negotiation, everyone sends their recipe in one envelope, and the party starts immediately. This saves massive amounts of time and internet bandwidth.
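A quick back-of-the-envelope comparison makes the bandwidth claim concrete. All sizes and round counts below are hypothetical, chosen only to illustrate the shape of the trade, not taken from the paper:

```python
# Hypothetical sizes, for illustration only.
model_update_mb = 40        # one classifier update per client per round
rounds = 500                # a typical many-round federated schedule
clients = 3

classic_traffic = model_update_mb * rounds * clients * 2  # upload + download
print(f"Classic federated learning: {classic_traffic / 1024:.0f} GB total")

factory_mb = 150            # one generator blueprint per client, sent once
one_shot_traffic = factory_mb * clients
print(f"One-shot blueprint upload: {one_shot_traffic / 1024:.2f} GB total")
```

Even if a single blueprint is bigger than a single update, sending it once beats sending updates hundreds of times.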
3. The "Right to be Forgotten" (Modular Unlearning)
What if Hospital A wants to leave the group and have its data erased?
- In old systems, you'd have to retrain the whole AI from scratch.
- In FederatedFactory, you delete Hospital A's blueprint, regenerate the synthetic dataset from the remaining blueprints, and retrain the Master Doctor on it, all on the server, with no new rounds of communication. It's like removing one recipe from the recipe book and cooking the banquet again from what's left: quick, cheap, and no hospital has to be contacted.
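In the same toy setting (a nearest-class-mean classifier and Gaussian blueprints standing in for the paper's diffusion models), modular unlearning looks like this:

```python
import numpy as np

rng = np.random.default_rng(4)

blueprints = {
    0: np.array([0.0, 0.0]),   # hospital A's blueprint (toy: just a mean)
    1: np.array([4.0, 0.0]),
    2: np.array([0.0, 4.0]),
}

def retrain_global(blueprints, per_class=500):
    """Rebuild the synthetic set from surviving blueprints and refit."""
    X = np.vstack([rng.normal(mu, 1.0, size=(per_class, 2))
                   for mu in blueprints.values()])
    y = np.repeat(list(blueprints.keys()), per_class)
    # Toy classifier: nearest synthetic class mean.
    return {c: X[y == c].mean(0) for c in blueprints}

def predict(means, x):
    return min(means, key=lambda c: np.linalg.norm(x - means[c]))

model = retrain_global(blueprints)
print(predict(model, np.array([0.2, -0.1])))  # prints 0: Type 1 is diagnosable

# Hospital A invokes its right to be forgotten: delete the blueprint...
del blueprints[0]
# ...and rebuild entirely on the server; no hospital is re-contacted.
model = retrain_global(blueprints)
print(predict(model, np.array([0.2, -0.1])))  # prints 1: Type 1 is gone
```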
The Trade-off
The paper admits there is a cost. The burden shifts from communication to computation: instead of repeatedly sending small notes, each hospital must spend real computing power up front to train its local factory.
- Analogy: It's like asking everyone to build a small 3D printer in their garage (high local effort) so they can print the parts they need, rather than shipping heavy boxes of raw materials across the country (high shipping cost). For hospitals with powerful computers, this is a fair trade.
The Results
The paper tested this on medical images (like skin cancer and blood cells) and standard image datasets.
- Old Method: Accuracy crashed to 11% (basically guessing).
- FederatedFactory: Accuracy soared to 90%, matching the performance of a system that had access to all the real data combined.
Summary
FederatedFactory solves the problem of "isolated data" by turning the problem inside out. Instead of trying to merge the answers (which conflict), they merge the ability to create examples. By sharing the "blueprints" to generate data rather than the data itself, they create a perfect, balanced training set that allows AI to learn effectively without ever violating patient privacy.