Adversarial Robustness of Capsule Networks for Medical Image Classification

This study demonstrates that Capsule Networks exhibit greater intrinsic adversarial robustness and more stable feature representations than CNNs and Vision Transformers across multiple medical imaging datasets, highlighting their potential as reliable alternatives for clinical diagnostic applications.

Srinivasan, A., Sritharan, D. V., Chadha, S., Fu, D., Hossain, J. O., Breuer, G. A., Aneja, S.

Published 2026-03-10

This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

Imagine you are hiring a team of experts to diagnose diseases from medical scans like X-rays and blood tests. You have three types of experts:

  1. The Traditionalists (CNNs): These are the current industry standard. They are like brilliant detectives who have memorized millions of patterns. They are great at their job, but they have a fatal flaw: they are easily tricked by optical illusions.
  2. The Modern Visionaries (ViTs): These are the new, high-tech experts using advanced AI. They are powerful but, surprisingly, they also fall for the same optical illusions as the Traditionalists.
  3. The Architects (Capsule Networks): These are the new kids on the block. Instead of just memorizing patterns, they understand how things fit together in 3D space. They are like a master builder who knows that a roof belongs on top of walls, not on the side.
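To make the "Architect" idea concrete: capsule networks pass little vectors (capsules) upward, and lower capsules vote on what the higher capsules should output; votes that agree get amplified. Below is a minimal sketch of the standard routing-by-agreement from the original CapsNet literature, in plain numpy. This is an illustration of the general mechanism, not the specific routing variant studied in the paper.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Shrink a vector so its length lies in [0, 1) while keeping its direction."""
    norm2 = np.sum(s * s, axis=axis, keepdims=True)
    return (norm2 / (1.0 + norm2)) * s / np.sqrt(norm2 + eps)

def route(u_hat, n_iters=3):
    """Dynamic routing-by-agreement over prediction vectors.

    u_hat: (n_in, n_out, dim) array -- each lower capsule's prediction for each
    higher capsule (e.g. the "wall" capsule votes on where the "house" should be).
    Returns the (n_out, dim) higher-capsule outputs.
    """
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))  # routing logits: start fully uncommitted
    for _ in range(n_iters):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # softmax over outputs
        s = np.einsum('io,iod->od', c, u_hat)                  # weighted sum of votes
        v = squash(s)                                          # bounded output capsule
        b = b + np.einsum('iod,od->io', u_hat, v)              # reward agreeing votes
    return v
```

The key design choice is the agreement update at the end of each iteration: lower capsules whose votes point the same way as the consensus get more routing weight, which is what makes the network care about how parts fit together rather than just which parts are present.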

The Problem: The "Invisible Ink" Attack
In the world of AI, there is a scary concept called an adversarial attack. Imagine someone takes a photo of a healthy lung and adds a tiny, invisible speck of "digital noise" to it. To the human eye, the photo looks exactly the same. But to the Traditionalist and Visionary experts, that tiny speck is a magic spell that makes them scream, "This is cancer!" when it's actually healthy.

This is dangerous in a hospital. If a doctor relies on an AI that can be tricked by invisible ink, patients could get misdiagnosed.

The Experiment: The Stress Test
The authors of this paper decided to put all three types of experts through a "stress test." They took four different medical datasets (pneumonia X-rays, breast ultrasounds, lung nodules, and blood cells) and tried to trick the AI models with these invisible attacks. They used two methods:

  • The "Whisper" (FGSM): a quick, single-step trick that nudges every pixel once in the worst possible direction.
  • The "Sledgehammer" (PGD): a stronger, iterated attack that applies many small nudges, trying every possible angle to break the model.
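The two attacks above are simpler than they sound. Here is a minimal numpy sketch using a toy logistic "classifier" (the weights `w`, `b` and the loss gradient are illustrative assumptions, not the paper's models), showing FGSM as one signed-gradient step and PGD as many small clipped steps:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_wrt_input(x, w, b, y):
    """Gradient of binary cross-entropy loss w.r.t. the input pixels
    for a toy logistic model p = sigmoid(w @ x + b)."""
    p = sigmoid(w @ x + b)
    return (p - y) * w  # dL/dx for this toy model

def fgsm(x, w, b, y, eps):
    """The 'whisper': a single signed-gradient step of size eps."""
    return x + eps * np.sign(grad_wrt_input(x, w, b, y))

def pgd(x, w, b, y, eps, alpha=None, steps=10):
    """The 'sledgehammer': many small FGSM-style steps, each time
    clipping back into the eps-ball so the change stays 'invisible'."""
    alpha = alpha if alpha is not None else eps / 4
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(grad_wrt_input(x_adv, w, b, y))
        x_adv = np.clip(x_adv, x - eps, x + eps)  # never stray more than eps per pixel
    return x_adv
```

Both attacks push the loss uphill while keeping every pixel within a tiny budget `eps`, which is why the altered scan looks identical to a human; PGD simply repeats the push and is therefore the harder stress test.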

The Results: The Architects Win
Here is what happened:

  • The Traditionalists and Visionaries crumbled. As soon as the "invisible ink" was applied, their confidence dropped. They started making wild guesses. It was like a detective who, upon seeing a tiny smudge on a fingerprint, immediately forgot how to read fingerprints entirely.
  • The Architects (Capsule Networks) stood firm. Even when hit with the strongest "sledgehammer" attacks, they kept their cool. They still correctly identified the pneumonia, the tumors, and the blood cells.

Why? The "GPS" vs. The "Map"
The paper explains why the Architects were so tough using some cool visual tests:

  • The "Focus" Test (Grad-CAM): Imagine the AI has a flashlight that shows what part of the image it is looking at.
    • When the Traditionalists were attacked, their flashlight went crazy. It stopped looking at the tumor and started shining on the edge of the X-ray or a random shadow. They lost their focus.
    • The Architects kept their flashlight steady on the actual disease, even when the image was being attacked. They knew exactly where to look.
  • The "Memory" Test (Latent Space): Imagine the AI organizes its knowledge in a giant library.
    • When attacked, the Traditionalists' library got messy. The "cancer" books got mixed up with the "healthy" books.
    • The Architects kept their library perfectly organized. The "cancer" books stayed in the cancer section, and the "healthy" books stayed in the healthy section, no matter how hard they were shaken.
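The "flashlight" above is Grad-CAM, and at its core it is just a weighted sum of a convolutional layer's feature maps, where each map is weighted by its average gradient for the predicted class. A minimal numpy sketch (toy arrays standing in for a real network's activations and gradients, which the paper obtains from its trained models):

```python
import numpy as np

def grad_cam(feature_maps, grads):
    """Toy Grad-CAM heatmap.

    feature_maps, grads: (channels, H, W) arrays from the same conv layer --
    the activations and the class-score gradients w.r.t. those activations.
    Returns an (H, W) heatmap: the model's 'flashlight'.
    """
    weights = grads.mean(axis=(1, 2))                  # one importance score per channel
    cam = np.tensordot(weights, feature_maps, axes=1)  # importance-weighted sum of maps
    cam = np.maximum(cam, 0.0)                         # ReLU: keep positive evidence only
    if cam.max() > 0:
        cam = cam / cam.max()                          # normalise to [0, 1] for display
    return cam
```

A stable model keeps this heatmap concentrated on the lesion before and after an attack; the paper's observation is that the attacked CNNs' heatmaps scattered to irrelevant regions while the capsule networks' stayed put.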

The Secret Weapon: Bayes-Pearson Routing
The paper also found that one specific type of Architect (called BP-CapsNet) was the strongest of all. Think of this as the Architect having a special "noise-canceling headset." When the attack tried to confuse the AI with bad data, this headset filtered out the noise and let the AI focus only on the clear, important signals.

The Bottom Line
This study tells us that if we want AI to be safe for hospitals, we can't just rely on the current popular models (CNNs and ViTs) because they are too easily fooled by invisible tricks.

Capsule Networks are like the sturdy, reliable experts who understand the structure of the world, not just the surface patterns. They are much harder to trick, making them a much safer bet for saving lives in the future.

In short: If you want an AI doctor that won't be fooled by a magic trick, hire the Architect, not the Traditionalist.
