FedSKD: Aggregation-free Model-heterogeneous Federated Learning via Multi-dimensional Similarity Knowledge Distillation for Medical Image Classification

FedSKD is a novel aggregation-free, model-heterogeneous federated learning framework that utilizes multi-dimensional similarity knowledge distillation and round-robin model circulation to enable effective, privacy-preserving collaborative medical image classification without requiring a central server or identical client architectures.

Ziqiao Weng, Weidong Cai, Bo Zhou

Published 2026-03-13

Imagine a group of doctors from different hospitals around the world who all want to build a super-smart AI to diagnose diseases. However, there's a big problem: privacy laws prevent them from sharing their patients' actual medical records (like MRI scans or skin photos) with each other. They need to collaborate without ever seeing each other's private data.

This is where Federated Learning comes in. It's like a study group where everyone keeps their own textbooks but shares their notes to learn from one another.

The Problem: The "One-Size-Fits-All" Trap

Most current methods try to force every hospital to use the exact same "brain" (AI model) for their notes. But in reality, hospitals are different:

  • Big hospitals have powerful computers and can run complex, heavy models.
  • Small clinics have older computers and need simple, lightweight models.

Forcing a small clinic to run a giant model is like asking a child to carry a backpack meant for an adult—it's too heavy and slows them down. Conversely, asking a big hospital to use a tiny model is like giving a race car a bicycle engine; it's underpowered.

Furthermore, existing methods often rely on a central "teacher" (a central server) to collect all the notes and average them out. This creates a bottleneck and can ruin the unique insights of specific hospitals.

The Solution: FedSKD (The "Round-Robin Study Group")

The authors propose a new method called FedSKD. Instead of a central teacher, the hospitals form a peer-to-peer chain.

Think of it like a round-robin relay race:

  1. Hospital A trains its AI on its local data.
  2. It passes its "brain" to Hospital B.
  3. Hospital B doesn't just copy Hospital A; it compares its own brain with Hospital A's brain, learns from it, improves its own, and then passes the improved brain to Hospital C.
  4. This continues until the brain has visited everyone and comes back to the start.
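The relay above can be sketched as a circulation schedule. This is a simplified illustration with hypothetical names (the `circulate` function and client labels are ours, not the paper's); it only traces which visiting model each client hosts at each hop, while the real method also performs bidirectional distillation at every stop:

```python
def circulate(clients):
    """Trace one round-robin round over an ordered ring of clients.

    Returns a list of hops; at hop h, client i hosts the model that
    started at client (i - h) mod n, so every model visits every peer
    exactly once before returning home.
    """
    n = len(clients)
    schedule = []
    for hop in range(1, n):
        schedule.append([(clients[i], clients[(i - hop) % n])
                         for i in range(n)])
    return schedule

# Three hospitals A, B, C:
# hop 1 -> A hosts C's model, B hosts A's, C hosts B's
# hop 2 -> A hosts B's model, B hosts C's, C hosts A's
schedule = circulate(["A", "B", "C"])
```

Because the schedule is a fixed ring, no central server is needed to coordinate who talks to whom: each client only ever sends its model to its next neighbor.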

The Magic Trick: In this system, Hospital A can have a giant, complex brain, and Hospital B can have a tiny, simple brain. They don't need to be the same size to learn from each other!

How Do Different Brains Learn? (The "Multi-Dimensional Similarity")

You might ask: "How can a tiny brain learn from a giant brain if they look so different?"

Usually, they would try to compare their final answers (e.g., "Is this cancer? Yes/No"). But that's like comparing a student's final essay grade without looking at the writing process. It's too shallow.

FedSKD uses a technique called Multi-Dimensional Similarity Knowledge Distillation. Instead of just comparing the final grade, they compare the thinking process at three different levels:

  1. Batch-Level (The "Group Vibe"):
    • Analogy: Imagine a classroom. The AI looks at how the students in a group react to a specific question. Do they all get confused by the same thing? FedSKD ensures the visiting AI and the local AI "vibe" the same way about a batch of patients, even if their internal wiring is different.
  2. Pixel/Voxel-Level (The "Fine Details"):
    • Analogy: This is like looking at the brushstrokes on a painting. The AI compares the tiny details (pixels in a photo, voxels in a 3D scan). It ensures that if Hospital A's AI sees a "fuzzy spot" on a skin lesion, Hospital B's AI also recognizes that specific fuzzy spot, even if their models are built differently.
  3. Region-Level (The "Big Picture"):
    • Analogy: This is like looking at the whole map. In brain scans, different parts of the brain talk to each other. FedSKD checks if the AI understands how the "front of the brain" relates to the "back of the brain." It ensures the relationships between different body parts are understood correctly by both models.
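The batch-level idea can be made concrete with a short sketch. The key trick is that the two models never compare their raw features (which have different sizes); they compare how similar each pair of patients in a batch looks, a B x B matrix that has the same shape for both models. This is a minimal sketch in the spirit of similarity-preserving distillation, not the paper's exact loss, and the function name is ours:

```python
import math

def batch_similarity_loss(feats_a, feats_b):
    """Compare two models' pairwise-similarity structure over one batch.

    feats_a: B rows of local-model features (any width Da)
    feats_b: B rows of visiting-model features (any width Db)
    Only the B x B cosine-similarity matrices are compared, so Da and Db
    may differ -- this is why the architectures need not match.
    """
    def sim_matrix(feats):
        # L2-normalize each feature vector, then take pairwise dot products
        normed = []
        for row in feats:
            norm = math.sqrt(sum(v * v for v in row)) + 1e-8
            normed.append([v / norm for v in row])
        return [[sum(a * b for a, b in zip(r1, r2)) for r2 in normed]
                for r1 in normed]

    sa, sb = sim_matrix(feats_a), sim_matrix(feats_b)
    b = len(sa)
    # mean squared difference between the two similarity matrices
    return sum((sa[i][j] - sb[i][j]) ** 2
               for i in range(b) for j in range(b)) / (b * b)
```

The pixel/voxel-level and region-level terms follow the same pattern, except the similarity matrix is built over spatial positions (or brain regions) within one scan instead of over patients within a batch.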

Why Is This Better?

  • No "Teacher" Needed: It removes the central server, making it faster and more private.
  • No "Forced Uniformity": Every hospital can use the AI model that fits their computer best.
  • Prevents "Amnesia": In older sequential methods, as a model traveled from hospital to hospital it would gradually forget what it learned at its first few stops, a problem known as "model drift." FedSKD uses the similarity-matching techniques above to "lock in" knowledge so the model retains what it learned along the way.

The Results

The researchers tested this on two real-world medical tasks:

  1. Autism Diagnosis: Using brain scans (fMRI) collected from 17 international imaging sites.
  2. Skin Cancer Detection: Using photos of skin lesions.

The Outcome: FedSKD outperformed all other methods. It was better at diagnosing patients locally (personalization) and better at handling data from new, unseen hospitals (generalization). It proved that you don't need a central boss or identical computers to build a world-class medical AI together.

In short: FedSKD is like a traveling library where books of different sizes and shapes can be read, understood, and improved by everyone, without ever needing to be glued together into one giant book.
