Adversarial Robustness of Capsule Networks for Medical Image Classification

This study demonstrates that Capsule Networks exhibit greater intrinsic adversarial robustness and more stable feature representations than CNNs and Vision Transformers across multiple medical imaging datasets, highlighting their potential as reliable alternatives for clinical diagnostic applications.

Srinivasan, A., Sritharan, D. V., Chadha, S., Fu, D., Hossain, J. O., Breuer, G. A., Aneja, S.

Published 2026-03-10

This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

Imagine you are hiring a team of experts to diagnose diseases from medical scans like X-rays and blood tests. You have three types of experts:

  1. The Traditionalists (CNNs): These are the current industry standard. They are like brilliant detectives who have memorized millions of patterns. They are great at their job, but they have a fatal flaw: they are easily tricked by optical illusions.
  2. The Modern Visionaries (ViTs): These are the new, high-tech experts using advanced AI. They are powerful but, surprisingly, they also fall for the same optical illusions as the Traditionalists.
  3. The Architects (Capsule Networks): These are the new kids on the block. Instead of just memorizing patterns, they understand how things fit together in 3D space. They are like a master builder who knows that a roof belongs on top of walls, not on the side.
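To make the "Architect" idea concrete: capsule networks pass little vectors (capsules) upward, and lower capsules vote on what the higher capsules should output; votes that agree get amplified. Below is a minimal sketch of the standard routing-by-agreement from the original CapsNet literature, in plain numpy. This is an illustration of the general mechanism, not the specific routing variant studied in the paper.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Shrink a vector so its length lies in [0, 1) while keeping its direction."""
    norm2 = np.sum(s * s, axis=axis, keepdims=True)
    return (norm2 / (1.0 + norm2)) * s / np.sqrt(norm2 + eps)

def route(u_hat, n_iters=3):
    """Dynamic routing-by-agreement over prediction vectors.

    u_hat: (n_in, n_out, dim) array -- each lower capsule's prediction for each
    higher capsule (e.g. the "wall" capsule votes on where the "house" should be).
    Returns the (n_out, dim) higher-capsule outputs.
    """
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))  # routing logits: start fully uncommitted
    for _ in range(n_iters):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # softmax over outputs
        s = np.einsum('io,iod->od', c, u_hat)                  # weighted sum of votes
        v = squash(s)                                          # bounded output capsule
        b = b + np.einsum('iod,od->io', u_hat, v)              # reward agreeing votes
    return v
```

The key design choice is the agreement update at the end of each iteration: lower capsules whose votes point the same way as the consensus get more routing weight, which is what makes the network care about how parts fit together rather than just which parts are present.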

The Problem: The "Invisible Ink" Attack
In the world of AI, there is a scary concept called an adversarial attack. Imagine someone takes a photo of a healthy lung and adds a tiny, invisible speck of "digital noise" to it. To the human eye, the photo looks exactly the same. But to the Traditionalist and Visionary experts, that tiny speck is a magic spell that makes them scream, "This is cancer!" when it's actually healthy.

This is dangerous in a hospital. If a doctor relies on an AI that can be tricked by invisible ink, patients could get misdiagnosed.

The Experiment: The Stress Test
The authors of this paper decided to put all three types of experts through a "stress test." They took four different medical datasets (pneumonia X-rays, breast ultrasounds, lung nodules, and blood cells) and tried to trick the AI models with these invisible attacks. They used two methods:

  • The "Whisper" (FGSM): a quick, single-step trick that nudges every pixel once in the worst possible direction.
  • The "Sledgehammer" (PGD): a stronger, iterated attack that applies many small nudges, trying every possible angle to break the model.
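The two attacks above are simpler than they sound. Here is a minimal numpy sketch using a toy logistic "classifier" (the weights `w`, `b` and the loss gradient are illustrative assumptions, not the paper's models), showing FGSM as one signed-gradient step and PGD as many small clipped steps:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_wrt_input(x, w, b, y):
    """Gradient of binary cross-entropy loss w.r.t. the input pixels
    for a toy logistic model p = sigmoid(w @ x + b)."""
    p = sigmoid(w @ x + b)
    return (p - y) * w  # dL/dx for this toy model

def fgsm(x, w, b, y, eps):
    """The 'whisper': a single signed-gradient step of size eps."""
    return x + eps * np.sign(grad_wrt_input(x, w, b, y))

def pgd(x, w, b, y, eps, alpha=None, steps=10):
    """The 'sledgehammer': many small FGSM-style steps, each time
    clipping back into the eps-ball so the change stays 'invisible'."""
    alpha = alpha if alpha is not None else eps / 4
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(grad_wrt_input(x_adv, w, b, y))
        x_adv = np.clip(x_adv, x - eps, x + eps)  # never stray more than eps per pixel
    return x_adv
```

Both attacks push the loss uphill while keeping every pixel within a tiny budget `eps`, which is why the altered scan looks identical to a human; PGD simply repeats the push and is therefore the harder stress test.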

The Results: The Architects Win
Here is what happened:

  • The Traditionalists and Visionaries crumbled. As soon as the "invisible ink" was applied, their confidence dropped. They started making wild guesses. It was like a detective who, upon seeing a tiny smudge on a fingerprint, immediately forgot how to read fingerprints entirely.
  • The Architects (Capsule Networks) stood firm. Even when hit with the strongest "sledgehammer" attacks, they kept their cool. They still correctly identified the pneumonia, the tumors, and the blood cells.

Why? The "GPS" vs. The "Map"
The paper explains why the Architects were so tough using some cool visual tests:

  • The "Focus" Test (Grad-CAM): Imagine the AI has a flashlight that shows what part of the image it is looking at.
    • When the Traditionalists were attacked, their flashlight went crazy. It stopped looking at the tumor and started shining on the edge of the X-ray or a random shadow. They lost their focus.
    • The Architects kept their flashlight steady on the actual disease, even when the image was being attacked. They knew exactly where to look.
  • The "Memory" Test (Latent Space): Imagine the AI organizes its knowledge in a giant library.
    • When attacked, the Traditionalists' library got messy. The "cancer" books got mixed up with the "healthy" books.
    • The Architects kept their library perfectly organized. The "cancer" books stayed in the cancer section, and the "healthy" books stayed in the healthy section, no matter how hard they were shaken.
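The "flashlight" above is Grad-CAM, and at its core it is just a weighted sum of a convolutional layer's feature maps, where each map is weighted by its average gradient for the predicted class. A minimal numpy sketch (toy arrays standing in for a real network's activations and gradients, which the paper obtains from its trained models):

```python
import numpy as np

def grad_cam(feature_maps, grads):
    """Toy Grad-CAM heatmap.

    feature_maps, grads: (channels, H, W) arrays from the same conv layer --
    the activations and the class-score gradients w.r.t. those activations.
    Returns an (H, W) heatmap: the model's 'flashlight'.
    """
    weights = grads.mean(axis=(1, 2))                  # one importance score per channel
    cam = np.tensordot(weights, feature_maps, axes=1)  # importance-weighted sum of maps
    cam = np.maximum(cam, 0.0)                         # ReLU: keep positive evidence only
    if cam.max() > 0:
        cam = cam / cam.max()                          # normalise to [0, 1] for display
    return cam
```

A stable model keeps this heatmap concentrated on the lesion before and after an attack; the paper's observation is that the attacked CNNs' heatmaps scattered to irrelevant regions while the capsule networks' stayed put.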

The Secret Weapon: Bayes-Pearson Routing
The paper also found that one specific type of Architect (called BP-CapsNet) was the strongest of all. Think of this as the Architect having a special "noise-canceling headset." When the attack tried to confuse the AI with bad data, this headset filtered out the noise and let the AI focus only on the clear, important signals.

The Bottom Line
This study tells us that if we want AI to be safe for hospitals, we can't just rely on the current popular models (CNNs and ViTs) because they are too easily fooled by invisible tricks.

Capsule Networks are like the sturdy, reliable experts who understand the structure of the world, not just the surface patterns. They are much harder to trick, making them a much safer bet for saving lives in the future.

In short: If you want an AI doctor that won't be fooled by a magic trick, hire the Architect, not the Traditionalist.
