FedGIN: Federated Learning with Dynamic Global Intensity Non-linear Augmentation for Organ Segmentation using Multi-modal Images

Imagine you are trying to teach a robot to recognize different organs in the human body, like the liver, kidneys, or pancreas. To do this well, the robot needs to see thousands of examples. But here's the catch: hospitals have a strict rule—they cannot share patient photos (medical scans) with each other because of privacy laws. It's like trying to solve a giant puzzle, but every piece is locked in a different vault.

This is where FedGIN comes in. It's a clever new method that lets hospitals work together to train a super-smart AI without ever actually sharing the patient photos.

Here is how it works, broken down into simple concepts:

1. The Problem: Two Different Languages

Hospitals use two main types of "cameras" to take pictures of the inside of the body:

MRI: Like a high-definition, soft-light photo.
CT Scan: Like a sharp, X-ray style photo.

The problem is that these two cameras see the world very differently. An MRI might make the liver look gray and soft, while a CT scan makes it look bright and sharp. If you train a robot on just MRI photos, it gets confused when it sees a CT scan, and vice versa. Usually, you'd need to gather all the photos in one giant room to teach the robot both languages, but privacy laws forbid that.

2. The Solution: The "Federated" Classroom

Instead of bringing all the photos to one room, Federated Learning is like a classroom where the teacher (the central AI) sends a lesson plan to every student (the hospitals).

Each student studies their own private photos in their own classroom.
They learn what they can and send back only their notes (the math updates), not the photos.
The teacher combines all the notes to create a smarter global lesson plan and sends it back out.

This keeps the photos safe, but there's a new problem: because the students are using different "cameras" (MRI vs. CT), their notes are written in different "dialects." The teacher gets confused trying to combine them.

3. The Secret Sauce: The "Translator" (GIN)

This is where FedGIN shines. The authors added a special tool called GIN (Global Intensity Non-linear augmentation).

Think of GIN as a universal translator or a magic filter that the students use while they study.

When a student looks at a CT scan, the GIN filter gently "warps" the colors and brightness to make it look a little bit like an MRI.
When a student looks at an MRI, the filter tweaks it to look a bit like a CT scan.

By doing this during the learning process, the robot learns that "a liver is a liver," regardless of whether it's seen through a CT camera or an MRI camera. It learns the shape and structure rather than getting hung up on the specific colors or lighting.

4. The Results: A Team That Works Better Than Individuals

The researchers tested this on five different organs: the liver, kidneys, spleen, gallbladder, and pancreas.

The "Hard" Organs: For tricky organs like the pancreas and gallbladder (which are small, hard to see, and look different in every person), FedGIN made the biggest difference. When only limited data was available, adding the GIN translator improved accuracy on MRI scans by 12% to 18% compared to the same federated classroom without it.
The "Easy" Organs: For organs like the liver, which are big and easy to spot, the improvement was smaller because the AI was already pretty good at finding them.

The Big Picture

Imagine you are trying to learn to drive.

Old Way: You only practice in the rain (MRI). When you finally drive on a sunny day (CT), you crash because the conditions are too different.
FedGIN Way: You practice in the rain, but your instructor uses a special simulation (GIN) to show you what the road would look like in the sun. You learn to drive in any weather.

In summary: FedGIN is a privacy-safe way for hospitals to pool their knowledge. It uses a smart "translation" trick to teach AI how to recognize organs no matter what kind of camera took the picture. This means better, more reliable medical AI for everyone, without breaking any privacy rules.

1. Problem Statement

The paper addresses three critical challenges in medical image segmentation:

Data Scarcity and Privacy: High-quality, diverse medical datasets are often siloed across institutions due to strict privacy regulations (e.g., GDPR, HIPAA), preventing the creation of large, centralized training sets.
Domain Shift and Modality Differences: Medical imaging modalities like Computed Tomography (CT) and Magnetic Resonance Imaging (MRI) have vastly different intensity distributions and contrast characteristics. Models trained on one modality often fail to generalize to another.
Unpaired Multimodal Data: In real-world clinical settings, patients rarely have both CT and MRI scans for the same anatomical region. Existing multimodal methods often rely on paired data or complex architectures that are difficult to scale in resource-constrained environments.

The core problem is how to train a unified, robust segmentation model across multiple institutions and modalities (CT and MRI) without sharing raw patient data, specifically handling the distribution mismatch between unpaired modalities.

2. Methodology: FedGIN Framework

The authors propose FedGIN, a Federated Learning (FL) framework that integrates a Global Intensity Non-linear (GIN) augmentation module.

A. Federated Learning Workflow

Setup: A central server coordinates training among multiple clients (hospitals). Each client holds private data, which may be unimodal (CT only, MRI only) or multimodal.
Process:
1. The server broadcasts a global model to all clients.
2. Clients perform local training on their private data.
3. Clients send updated model weights back to the server.
4. The server aggregates updates using FedAvg (Federated Averaging) to refine the global model.
5. This cycle repeats for multiple communication rounds.
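
The five steps above can be sketched as a small simulation. This is a minimal illustration, not the authors' code: weighting each client by its number of training samples is the standard FedAvg formulation and is assumed here, and `local_update`, the parameter names, and the client sizes are hypothetical stand-ins.

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Average per-client parameter dicts, weighted by client sample counts."""
    total = float(sum(client_sizes))
    averaged = {}
    for name in client_weights[0]:
        averaged[name] = sum(
            (n / total) * w[name] for w, n in zip(client_weights, client_sizes)
        )
    return averaged

def local_update(global_weights, rng):
    """Hypothetical stand-in for local training: perturbs the global weights."""
    return {
        name: value + 0.01 * rng.standard_normal(value.shape)
        for name, value in global_weights.items()
    }

rng = np.random.default_rng(0)
# Hypothetical two-parameter "model" and two clients (e.g. one CT, one MRI site).
global_weights = {"conv.weight": np.zeros((2, 2)), "conv.bias": np.zeros(2)}
client_sizes = [120, 80]

for round_idx in range(3):  # communication rounds
    # Steps 1-3: broadcast, train locally, send updated weights back.
    updates = [local_update(global_weights, rng) for _ in client_sizes]
    # Step 4: the server aggregates the updates into a new global model.
    global_weights = fedavg(updates, client_sizes)
```

Only the weight dictionaries cross the network in this sketch; the training data never leaves `local_update`, which mirrors the privacy property described above.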

B. GIN Augmentation Module

To bridge the domain gap between CT and MRI without paired data, FedGIN employs a lightweight GIN augmentation module applied on-the-fly during local training:

Mechanism: GIN uses a shallow convolutional network with weights sampled from a Gaussian distribution, $\mathcal{N}(0, I)$.
Transformation: It applies non-linear intensity transformations (using Leaky ReLU) to the input image.
Blending: The transformed output $g^{Net}_\theta(x)$ is blended with the original input $x$ using a random coefficient $\alpha \sim U(0, 1)$:
$g_\theta(x) = \alpha \cdot g^{Net}_\theta(x) + (1 - \alpha) \cdot x$
Normalization: The result is normalized using the Frobenius norm to preserve structural content while altering intensity and texture.
Goal: This forces the model to learn modality-invariant features, making it robust to intensity distribution shifts between CT and MRI.
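
A rough NumPy sketch of the augmentation described above follows. The summary does not give the layer count, channel width, kernel size, or Leaky ReLU slope, so the values below are assumptions, as is the choice to skip the activation on the last layer. The normalization step is interpreted here as rescaling the blended image so that its Frobenius norm matches that of the input, which is also an assumption.

```python
import numpy as np

def conv2d_same(x, kernels):
    """Zero-padded 'same' convolution. x: (C_in, H, W); kernels: (C_out, C_in, k, k)."""
    c_out, _, k, _ = kernels.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    h, w = x.shape[1:]
    out = np.zeros((c_out, h, w))
    for i in range(k):
        for j in range(k):
            patch = xp[:, i:i + h, j:j + w]
            out += np.einsum("oc,chw->ohw", kernels[:, :, i, j], patch)
    return out

def leaky_relu(x, slope=0.01):
    return np.where(x > 0, x, slope * x)

def gin_augment(x, rng, n_layers=4, hidden=2, k=3):
    """Apply a randomly weighted shallow conv net to x, then blend and rescale.

    x: (C, H, W) image slice. n_layers, hidden, and k are assumed values.
    """
    h = x
    c_in = x.shape[0]
    for layer in range(n_layers):
        last = layer == n_layers - 1
        c_out = x.shape[0] if last else hidden
        # Fresh Gaussian weights on every call: the network is never trained.
        kernels = rng.standard_normal((c_out, c_in, k, k))
        h = conv2d_same(h, kernels)
        if not last:  # assumption: no activation after the final layer
            h = leaky_relu(h)
        c_in = c_out
    alpha = rng.uniform(0.0, 1.0)
    mixed = alpha * h + (1.0 - alpha) * x
    # Assumed normalization: match the Frobenius norm of the input.
    return mixed * (np.linalg.norm(x) / (np.linalg.norm(mixed) + 1e-8))

rng = np.random.default_rng(0)
image = rng.random((1, 16, 16))      # hypothetical single-channel slice
augmented = gin_augment(image, rng)  # same shape, altered intensities
```

Because the weights are resampled on each call, every training batch sees a different intensity mapping of the same anatomy, which is what pushes the segmentation network toward shape rather than brightness cues.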

C. Model Architecture

Base Model: A 2D U-Net architecture.
Details: Uses strided convolutions for downsampling and bilinear interpolation for upsampling.
Loss Function: A combination of Focal Loss and Dice Loss (weighted equally).
Optimization: AdamW optimizer with a learning rate scheduler.
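
As an illustration of the combined loss, here is a minimal binary (single-organ) sketch in NumPy. The summary says only that the two terms are weighted equally; the 0.5/0.5 weights, the focal `gamma` of 2, and the smoothing constant are assumptions, and the actual model is multi-class.

```python
import numpy as np

def dice_loss(probs, target, eps=1e-6):
    """Soft Dice loss for one foreground class. probs and target share a shape."""
    intersection = np.sum(probs * target)
    return 1.0 - (2.0 * intersection + eps) / (np.sum(probs) + np.sum(target) + eps)

def focal_loss(probs, target, gamma=2.0, eps=1e-6):
    """Binary focal loss: cross-entropy down-weighted for well-classified pixels."""
    p_t = np.where(target == 1, probs, 1.0 - probs)
    p_t = np.clip(p_t, eps, 1.0)
    return np.mean(-((1.0 - p_t) ** gamma) * np.log(p_t))

def combined_loss(probs, target):
    """Equal weighting of the two terms (assumed here to be 0.5 each)."""
    return 0.5 * focal_loss(probs, target) + 0.5 * dice_loss(probs, target)

target = np.array([[1, 0], [0, 1]])
probs = np.array([[0.9, 0.2], [0.1, 0.7]])  # hypothetical predicted probabilities
loss = combined_loss(probs, target)
```

The Dice term addresses the foreground/background imbalance typical of small organs, while the focal term concentrates the gradient on pixels the model still gets wrong.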

3. Experimental Setup

Datasets:
- Training/Validation: TotalSegmentator dataset (CT and MRI).
- Testing: AMOS2020 dataset (unpaired CT and MRI).
Targets: Five abdominal organs: Liver, Kidneys, Spleen, Pancreas, and Gallbladder.
Scenarios:
1. Limited Data: Training started with MRI only, progressively adding CT cases to test generalization.
2. Complete Data: Full utilization of available CT and MRI data across clients.
Baselines: Local (MRI-only/CT-only), Centralized (Combined data), FL (without GIN), and FL with GIN (FedGIN).

4. Key Results

The evaluation focused on 3D Dice Similarity Coefficients (DSC).
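
For reference, the Dice Similarity Coefficient for one organ is $2|P \cap T| / (|P| + |T|)$, where $P$ and $T$ are the predicted and ground-truth voxel sets. A minimal sketch of computing it over a whole volume is shown below; the label value and array shapes are hypothetical, and returning NaN when the organ is absent from both volumes is one possible convention rather than the paper's stated one.

```python
import numpy as np

def dice_3d(pred, truth, label):
    """Dice coefficient for one organ label over a full 3D label volume."""
    p = pred == label
    t = truth == label
    denom = p.sum() + t.sum()
    if denom == 0:
        return float("nan")  # organ absent in both prediction and ground truth
    return 2.0 * np.logical_and(p, t).sum() / denom

truth = np.zeros((4, 4, 4), dtype=int)
truth[1:3, 1:3, 1:3] = 1       # 8 ground-truth voxels
pred = np.zeros((4, 4, 4), dtype=int)
pred[1:3, 1:3, 1:4] = 1        # 12 predicted voxels, 8 of them overlapping
score = dice_3d(pred, truth, label=1)  # 2 * 8 / (12 + 8) = 0.8
```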

Performance on Low-Contrast Organs:
- FedGIN showed the most significant improvements for structurally complex, low-contrast organs (Spleen, Pancreas, Gallbladder).
- In the Limited Data scenario, FedGIN achieved a 12–18% improvement in Dice scores on MRI test cases compared to FL without GIN.
- For the Gallbladder, FedGIN improved scores from ~0.08 (MRI-only) and ~0.51 (CT-only) to >0.60 in the multimodal setting.
Comparison with Centralized Training:
- In the Complete Data scenario, FedGIN achieved near-centralized performance.
- It demonstrated a 30% improvement in Dice score over the MRI-only baseline and a 10% improvement over the CT-only baseline.
- For the Pancreas, FedGIN achieved a Dice score of 0.69, on par with the centralized GIN model (0.68).
Failure Cases:
- For the Liver, where MRI baselines were already high and anatomy is homogeneous, the addition of CT data and GIN provided marginal or no benefit. This suggests GIN is most effective where domain shifts are significant and baseline performance is suboptimal.
Stability:
- FL without GIN consistently underperformed and degraded as more CT data was added, confirming that unharmonized modality mixing harms performance. GIN stabilized the cross-domain learning.

5. Key Contributions

Novel FL Framework: Introduction of FedGIN, presented by the authors as the first framework to integrate dynamic GIN augmentation specifically for unpaired multimodal federated learning.
Privacy-Preserving Generalization: Demonstrated that high-performance cross-modality segmentation is achievable without sharing raw data, addressing the "data silo" problem.
Lightweight Solution: Unlike previous multimodal methods requiring complex architectures or paired data, FedGIN uses a lightweight augmentation strategy within a standard U-Net, making it scalable for clinical deployment.
Empirical Validation: Provided extensive evidence that incorporating complementary modalities (CT) via FL significantly boosts segmentation for difficult organs (Pancreas, Spleen) where single-modality models struggle.

6. Significance

This work bridges the gap between theoretical domain generalization and practical clinical deployment. By showing that Federated Learning combined with intensity augmentation can approach centralized performance, the authors offer a viable path for hospitals to collaborate on AI model training while adhering to strict privacy laws. The method is particularly valuable for rare diseases or complex anatomical regions where data scarcity is a major bottleneck, enabling the creation of robust, modality-agnostic diagnostic tools.