Class Visualizations and Activation Atlases for Enhancing Interpretability in Deep Learning-Based Computational Pathology

This paper introduces a framework to evaluate class visualizations and activation atlases for transformer-based pathology models, revealing that while these feature visualization methods effectively capture coarse tissue-level concepts, their ability to represent fine-grained cancer subclasses is limited by intrinsic pathological complexity and reduced inter-observer agreement.

Marco Gustav, Fabian Wolf, Christina Glasner, Nic G. Reitsam, Stefan Schulz, Kira Aschenbroich, Bruno Märkl, Sebastian Foersch, Jakob Nikolas Kather


Imagine you have a brilliant, super-smart robot pathologist. It can look at a tiny slide of human tissue under a microscope and tell you, with incredible accuracy, if a patient has cancer, what kind of cancer it is, or even predict how they will respond to treatment. It's like a detective that never sleeps and has read every medical textbook ever written.

But here's the problem: The robot is a "black box."

You ask it, "Why did you think this is cancer?" and it just says, "Because my math says so." It doesn't explain what it saw. It doesn't point to the specific cells or patterns that made it reach that conclusion. This is scary for doctors because, in medicine, you need to trust the diagnosis.

This paper is about opening the black box to see how the robot's brain actually works. The researchers didn't just ask the robot to explain one specific slide; they asked, "What does your brain think a 'cancer cell' looks like in general?"

Here is the breakdown of their investigation using simple analogies:

1. The Two Tools: "Dreaming" and "Mapping"

The researchers used two special techniques to peek inside the robot's brain.

  • Class Visualization (The "Dreaming" Tool):
    Imagine you ask the robot, "Show me what a 'Lymphocyte' (a type of immune cell) looks like in your mind."
    The robot starts with a screen of random static and slowly "dreams" an image into existence. It keeps tweaking the pixels until the image triggers the strongest possible signal for "Lymphocyte."

    • The Result: The robot "dreams" up a picture of a dense cluster of small, dark nuclei. It looks very much like what a real pathologist sees. This proves the robot isn't just guessing; it has learned the actual visual patterns of the disease. (A minimal sketch of this "dreaming" loop appears right after this list.)
  • Activation Atlases (The "Map" Tool):
    Imagine the robot's brain is a giant, complex city. Every neighborhood in this city represents a different concept (e.g., "muscle," "fat," "cancer").
    The researchers built a map of this city. They took thousands of real tissue samples, saw which "neighborhoods" in the robot's brain lit up, and then reconstructed what those neighborhoods look like.

    • The Result: They created a giant grid of images. Some areas of the map are clearly "Fat Tissue," others are "Muscle," and some are "Cancer." It's like a Google Maps for the robot's internal thoughts. (A sketch of how such an atlas grid can be assembled also follows below.)
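
For the technically curious, here is a minimal sketch of the "dreaming" loop (class visualization via activation maximization) in PyTorch. The classifier `model`, the 224×224 patch size, and the hyperparameters are illustrative assumptions, not the paper's actual setup, which uses transformer-based pathology models and stronger image regularizers.

```python
import torch

def dream_class(model, class_idx, steps=512, lr=0.05):
    """Optimize an input image so it maximally excites one class."""
    model.eval()
    # Start from random static: a noise image we optimize directly.
    img = torch.randn(1, 3, 224, 224, requires_grad=True)
    opt = torch.optim.Adam([img], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        logits = model(img)
        # Gradient ascent on the target class logit: maximize it
        # by minimizing its negative.
        loss = -logits[0, class_idx]
        # A small L2 penalty keeps pixels from exploding; real
        # implementations add blurring, jitter, and other priors.
        loss = loss + 1e-4 * img.pow(2).sum()
        loss.backward()
        opt.step()
    return img.detach()

# Hypothetical usage: render the model's "dream" of class 3.
# dream = dream_class(model, class_idx=3)
```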
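And a sketch of the "mapping" side. Given a hypothetical (N, D) matrix `acts` of hidden activations from N real tissue patches, the idea is to project the activations onto a 2D "city map," bin them into a grid, and average each cell; each averaged direction can then be rendered with the same dreaming loop above. PCA stands in here for the non-linear projection (e.g., UMAP) typically used for activation atlases.

```python
import numpy as np
from sklearn.decomposition import PCA

def build_atlas_cells(acts, grid=8):
    """Bin activations into a 2D grid of averaged 'neighborhoods'."""
    # Project high-dimensional activations onto a 2D map.
    coords = PCA(n_components=2).fit_transform(acts)
    # Normalize coordinates into [0, 1) so they can be binned.
    coords = (coords - coords.min(axis=0)) / (np.ptp(coords, axis=0) + 1e-9)
    ix = np.minimum((coords * grid).astype(int), grid - 1)
    cells = {}
    for g in range(grid):
        for h in range(grid):
            mask = (ix[:, 0] == g) & (ix[:, 1] == h)
            if mask.any():
                # A cell's "neighborhood" is the mean activation of the
                # patches that landed there; feature visualization then
                # renders an image for this average direction.
                cells[(g, h)] = acts[mask].mean(axis=0)
    return cells
```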

2. The Big Discovery: The "Blurry" Zones

The researchers tested this on two types of tasks:

  1. Easy Task: Distinguishing between very different things (like Fat vs. Muscle vs. Normal Colon).
  2. Hard Task: Distinguishing between very similar things (like Colon Cancer vs. Rectal Cancer, or different types of lung cancer).

What they found:

  • In the Easy Task: The robot's "dreams" and "maps" were crystal clear. The neighborhoods were distinct. If you showed a human pathologist the robot's "dream" of fat tissue, they would instantly say, "Yes, that's fat."
  • In the Hard Task: The map got messy. The neighborhoods for "Colon Cancer" and "Rectal Cancer" started to bleed into each other. The robot's "dreams" of these two cancers looked almost identical.

Why is this important?
The researchers realized something profound: The robot's confusion wasn't a bug; it was a feature.
Even human pathologists struggle to tell the difference between these specific cancer types just by looking at a slide. The robot's "blurry" map perfectly mirrored the human experts' own uncertainty. The robot wasn't failing; it was accurately reflecting the messy, complex reality of biology.

3. The "Human-in-the-Loop" Test

To prove their tools worked, they didn't just trust the computer. They recruited four real human pathologists.

  • They showed the humans the robot's "dreamed" images.
  • They asked the humans: "What do you think this is?"
  • They measured how much the humans agreed with each other (the inter-observer agreement; see the sketch at the end of this section).

The Result:
When the robot's "map" was clear, the humans agreed on what they saw. When the robot's "map" was blurry (because the cancers are biologically similar), the humans also disagreed.
This means the robot's internal "map" is a reliable mirror of human medical knowledge. If the robot is confused, it's likely because the disease itself is confusing.
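
As promised, here is how such agreement is commonly quantified. This is a minimal numpy sketch of Fleiss' kappa, one standard inter-observer agreement statistic for multiple raters; the counts in the toy example are invented for illustration, not the study's data.

```python
import numpy as np

def fleiss_kappa(counts):
    """counts[i, j] = number of raters who gave image i label j."""
    n = counts.sum(axis=1)[0]                  # raters per image (constant)
    p_j = counts.sum(axis=0) / counts.sum()    # overall label proportions
    # Per-image agreement: fraction of rater pairs that agree.
    p_i = ((counts ** 2).sum(axis=1) - n) / (n * (n - 1))
    p_bar, p_e = p_i.mean(), (p_j ** 2).sum()  # observed vs. chance
    return (p_bar - p_e) / (1 - p_e)

# Toy example: 4 pathologists label 3 "dreamed" images into 3 classes.
counts = np.array([[4, 0, 0],   # unanimous -> clear "neighborhood"
                   [2, 1, 1],   # split     -> blurry "neighborhood"
                   [0, 4, 0]])  # unanimous
print(round(fleiss_kappa(counts), 2))  # ~0.51: moderate agreement
```

A kappa near 1 means the pathologists see the same thing in the robot's drawings; a kappa near 0 means they agree no more than chance would predict, which is exactly what happened in the "blurry" zones.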

4. Why This Matters for the Future

Think of this like a flight simulator for doctors.

  • Before: We had a pilot (the AI) who could fly the plane (diagnose cancer) perfectly, but we didn't know how they were flying it. We were scared to let them take the controls.
  • Now: This paper gives us a window into the cockpit. We can see the pilot's mental map. We can see where the map is clear and where it gets foggy.

The Takeaway:
This research builds a bridge of trust between humans and AI in medicine. By visualizing what the AI "sees," doctors can verify that the AI is looking at the right things (like cell shapes and tissue structures) and not just memorizing random patterns. It turns the AI from a mysterious oracle into a transparent partner that doctors can interrogate, understand, and ultimately trust with patients' lives.

In short: They taught the AI to draw what it's thinking, and by looking at those drawings, we realized the AI thinks just like a human doctor—seeing clear patterns where they exist, and admitting confusion where the biology is tricky.