Imagine you are a doctor looking at an X-ray of a lung. You see a small spot (a nodule) and you have to decide: Is it harmless, or is it dangerous (malignant)?
Usually, modern AI acts like a super-smart but silent wizard. It looks at the image, makes a decision, and says, "This is dangerous." But it won't tell you why. It's like a magic 8-ball that just gives you an answer without explaining the logic. In medicine, doctors need to know the "why" to trust the machine.
This paper introduces a new AI called Proto-Caps. Think of it not as a silent wizard, but as a teaching assistant who shows its work.
Here is how it works, broken down into simple concepts:
1. The "Teacher's Secret Notes" (Privileged Information)
Imagine you are training a student to identify different types of fruit.
- Standard AI: You show them a picture of an apple and say, "This is an apple." They memorize the picture.
- Proto-Caps: You show them the picture, but you also give them a secret cheat sheet that says, "This apple is red, round, and has a smooth skin."
In this paper, the "secret cheat sheet" is called Privileged Information. During training, the AI is allowed to see detailed notes written by human radiologists about the lung nodules (e.g., "This nodule is very round," "The edges are jagged," "It has a spiky texture"). The AI learns to connect these specific features to the final diagnosis.
Crucially, when the AI is actually used on a new patient later, it doesn't need these notes anymore. It has learned the logic internally.
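The training-time/test-time split above can be sketched in code. This is a toy illustration, not the paper's architecture: a shared feature extractor feeds two heads, and the radiologists' attribute notes only ever appear inside the training loss. All sizes and names here are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
W_feat = rng.normal(size=(16, 8))   # shared feature extractor (hypothetical sizes)
W_diag = rng.normal(size=(8, 2))    # diagnosis head: benign vs. malignant
W_attr = rng.normal(size=(8, 3))    # attribute head: e.g. roundness, spikiness, texture

def forward(image_vec):
    features = np.tanh(image_vec @ W_feat)
    return features @ W_diag, features @ W_attr

def training_loss(image_vec, diagnosis_label, attribute_notes, alpha=0.5):
    """Privileged information: the attribute notes shape the loss during training."""
    diag_logits, attr_pred = forward(image_vec)
    diag_loss = np.sum((diag_logits - diagnosis_label) ** 2)
    attr_loss = np.sum((attr_pred - attribute_notes) ** 2)  # notes used ONLY here
    return diag_loss + alpha * attr_loss

def predict(image_vec):
    """At test time only the image goes in -- no radiologist notes needed."""
    diag_logits, _ = forward(image_vec)
    return int(np.argmax(diag_logits))
```

Because the attribute notes enter only through `training_loss`, the deployed `predict` function works on a new patient's image alone, which is exactly the point of privileged information.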
2. The "Capsule" Backpacks
Instead of treating the whole image as one big blob, this AI organizes what it sees into small "backpacks" called Capsules (each capsule is a small group of neurons, not a piece of the image).
- Each backpack is assigned a specific job. One backpack only looks at roundness. Another only looks at spikiness. Another only looks at texture.
- This is like having a team of specialists. You don't ask one person to judge the whole fruit; you ask the "Roundness Expert" to check the shape, and the "Spikiness Expert" to check the edges.
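As a rough sketch of the specialist idea (attribute names, sizes, and the split are illustrative, not the paper's exact layers): the feature vector is divided into one small capsule per attribute, and the standard capsule-network "squash" nonlinearity keeps each capsule's length below 1, so the length reads as a score for how strongly that attribute is present.

```python
import numpy as np

ATTRIBUTES = ["roundness", "spikiness", "texture"]  # illustrative names
CAPSULE_DIM = 4

def squash(v):
    """Capsule nonlinearity: keeps direction, bounds the length to [0, 1)."""
    norm_sq = np.dot(v, v)
    return (norm_sq / (1.0 + norm_sq)) * v / (np.sqrt(norm_sq) + 1e-8)

def attribute_scores(features):
    """Split a flat feature vector into one specialist capsule per attribute."""
    capsules = features.reshape(len(ATTRIBUTES), CAPSULE_DIM)
    return {name: float(np.linalg.norm(squash(c)))
            for name, c in zip(ATTRIBUTES, capsules)}

rng = np.random.default_rng(1)
scores = attribute_scores(rng.normal(size=len(ATTRIBUTES) * CAPSULE_DIM))
```

Each entry in `scores` is one specialist's verdict: the "Roundness Expert" reports only on roundness, independent of the others.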
3. The "Photo Album" (Prototype Learning)
This is the coolest part. When the AI makes a decision, it doesn't just guess; it shows you its evidence.
Imagine the AI has a photo album (a library of examples) for every feature it learned.
- If the AI thinks a nodule is "spiky," it pulls up a picture from its album of the most spiky, perfect example of a spiky nodule it has ever seen.
- It then compares the new patient's image to that "perfect example."
Why is this helpful?
If the AI says, "This is dangerous because it's spiky," but the picture it shows you looks nothing like the patient's nodule, you know something is wrong. It's like a student saying, "I got an A because I studied Chapter 5," but then showing you a picture of Chapter 1. You immediately know the student is confused.
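The "photo album" lookup can be sketched as a nearest-prototype search. This is a toy version, not the paper's code: each stored prototype is tied to a real training image id (the ids and vectors below are invented), and the closest one is what gets shown to the doctor as evidence.

```python
import numpy as np

# Hypothetical learned prototypes for one attribute ("spikiness"):
# each entry is (training-image id, prototype vector).
PROTOTYPES = [
    ("case_017", np.array([0.9, 0.1, 0.0, 0.2])),
    ("case_342", np.array([0.1, 0.8, 0.3, 0.0])),
]

def explain(capsule_vec, prototypes):
    """Return the id and distance of the nearest stored prototype."""
    dists = [np.linalg.norm(capsule_vec - vec) for _, vec in prototypes]
    best = int(np.argmin(dists))
    return prototypes[best][0], dists[best]

case_id, dist = explain(np.array([0.85, 0.15, 0.05, 0.15]), PROTOTYPES)
# case_id names the training nodule the doctor can pull up and inspect;
# a large distance is itself a red flag that the explanation doesn't fit.
```

This is what makes the confused-student check possible: if `dist` is large, or the retrieved image looks nothing like the patient's nodule, the doctor knows to double-check.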
4. The Results: Smarter and Clearer
The researchers tested this on a huge database of lung scans (LIDC-IDRI).
- Accuracy: It got the diagnosis right 93% of the time on this benchmark, better than almost every competing model, including the black-box ones that offer no explanation at all.
- Trust: Because it shows the "Photo Album" examples, a human doctor can look at it and say, "Ah, I see why it thinks that," or "Wait, that example doesn't match, let me double-check."
The Big Takeaway
Usually, people think you have to choose between High Performance (being right) and Explainability (understanding why).
- Old way: Be right but silent, or be chatty but less accurate.
- Proto-Caps way: Be right AND chatty.
By using "secret notes" during training and showing "photo album" examples during testing, this new method proves that you can build an AI that is both a top-tier doctor and a transparent teacher. It doesn't just give an answer; it gives you the reasoning and the visual proof to back it up.