Hierarchical Classification for Improved Histopathology Image Analysis

Imagine you are a detective trying to solve a crime, but instead of looking at a single clue, you are handed a massive, high-resolution photograph of an entire city (the Whole-Slide Image or WSI). Your job is to identify exactly what kind of crime happened and where.

In the world of medical pathology, doctors do this with tissue samples from biopsies. They look at the "city" (the tissue slide) to diagnose diseases like cancer.

The Problem: The "Flat" Detective

Traditionally, AI detectives have been trained to look at the whole picture and shout out one answer.

The Old Way: "Is this a tumor or not?" (Yes/No). Or, "Is it Type A, B, C, or D?"
The Flaw: This ignores the natural way humans think. A doctor doesn't just guess a specific cancer type out of thin air. They first say, "Okay, this is definitely a tumor" (the Coarse level), and then they narrow it down: "And specifically, it's a poorly-differentiated tumor" (the Fine level).

Existing AI methods were like a detective who tries to guess the specific suspect without first establishing that a crime actually occurred. They ignored the hierarchy, making it hard to distinguish between very similar-looking diseases.

The Solution: HiClass (The Smart Detective)

The authors of this paper, Keunho Byeon and his team, built a new AI system called HiClass. Think of HiClass as a detective who works with a two-step strategy and a team of specialists.

1. The Two-Step Strategy (Hierarchical Classification)

Instead of guessing the final answer immediately, HiClass breaks the job down:

Step 1 (The Broad Brush): It first looks at the big picture to decide the general category (e.g., "Benign" vs. "Cancer").
Step 2 (The Fine Detail): Once it knows it's a "Cancer," it zooms in to figure out the specific type (e.g., "Tubular Adenocarcinoma").

2. The Secret Sauce: Bidirectional Feature Integration

This is the coolest part. Imagine two detectives working on the same case:

Detective A is an expert on the big picture (Coarse).
Detective B is an expert on tiny, specific details (Fine).

In old systems, they worked in separate rooms. In HiClass, they are in the same room talking to each other:

Detective A tells Detective B: "Hey, this is definitely a tumor, so you don't need to worry about looking for benign features."
Detective B tells Detective A: "I see a specific pattern here that helps confirm it's a tumor, not just inflammation."

They share information back and forth (Bidirectional Integration). This helps the "Big Picture" detective understand the details, and helps the "Detail" detective understand the context. They don't overwrite each other; they just help each other see better.

3. The Special Rules (Tailored Loss Functions)

To make sure these two detectives stay on the same page, the system uses special "rules of the game" (mathematical penalties called Loss Functions):

The Consistency Rule: If Detective A says "It's a tumor," but Detective B says "It's a harmless polyp," the system gets a penalty. They must agree on the logic.
The Grouping Rule: With 14 specific diagnoses spread across 4 broad categories, the system is taught to keep diagnoses from the same broad category close together in its "mind" (feature space) and push diagnoses from different categories far apart. It's like organizing a library: all "Mystery" books go on one shelf, and within that shelf, the specific authors are arranged neatly.
The Focus Rule: When trying to identify a specific diagnosis, the system is told, "Don't even think about the subtypes from other broad categories; just compare the ones inside this category." This reduces confusion.

The Results: A Better Diagnosis

The team tested this on 4,673 stomach biopsy slides.

The Old Way: Good at saying "It's cancer," but often confused about which cancer.
HiClass: It got the general category right about 85% of the time and the specific type right about 69% of the time.

More importantly, it was the most consistent performer. Other AI models tended to do well at one level but worse at the other. HiClass was good at both because it respected the natural hierarchy of how doctors diagnose diseases.

The Takeaway

This paper is about teaching AI to think more like a human doctor. Instead of forcing the computer to memorize a giant list of 14 unrelated diseases, HiClass teaches it to climb a ladder: first identify the broad category, then step down to the specific details, while constantly checking that the steps make sense together.

It's the difference between a student memorizing a dictionary and a student who understands how words relate to each other in a sentence. The result is a smarter, more reliable medical AI.

1. Problem Statement

Whole-slide image (WSI) analysis is critical for pathology diagnosis, yet current deep learning approaches predominantly rely on flat classification. Flat classification treats all class labels as independent, ignoring the inherent hierarchical structure of disease diagnosis (e.g., distinguishing "Benign" from "Cancer" at a coarse level, and then sub-categorizing cancers by differentiation grade at a fine level).

Key Challenges Identified:

Loss of Context: Flat classification fails to leverage the semantic relationships between broad categories and specific subtypes.
Fine-Grained Difficulty: Fine-grained classification (e.g., specific tumor grades) is significantly harder than coarse-grained classification due to high inter-class similarity and data scarcity for specific subtypes.
Inefficiency of Existing Hierarchical Methods: Previous attempts at hierarchical classification in pathology (e.g., HMIL) often lack robust mechanisms to effectively exchange information between coarse and fine levels or fail to optimize the feature space specifically for hierarchical constraints.

2. Methodology: HiClass Framework

The authors propose HiClass, a hierarchical classification framework built upon a Multiple Instance Learning (MIL) architecture. The system consists of three main components:

A. Model Architecture

Patch-level Encoder ( $E$ ):
- Splits the WSI into $512 \times 512$ patches.
- Uses UNI (a pre-trained, self-supervised general-purpose model trained on 100M+ patches) to extract $1024$-dimensional feature vectors for each patch.
Feature Aggregator ( $A$ ):
- Utilizes attention-based pooling (following CLAM) to aggregate patch-level features into a single representative slide-level feature vector ( $\mathbb{R}^{512}$ ).
Hierarchical Classifier ( $H$ ):
- Bidirectional Feature Integration: The aggregated feature is split into coarse ( $v_c$ ) and fine ( $v_f$ ) vectors. Each is then augmented by concatenating (written $\circ$ ) the other level's features after a gradient-blocking operation $G$, which prevents one level from biasing the other.
  - $v_c' = v_c \circ G(v_f)$
  - $v_f' = v_f \circ G(v_c)$
  - Goal: Coarse features gain fine-grained details; fine features gain high-level context.
- Projection & Classification: The augmented vectors pass through linear projection heads and classification heads to generate logits for both coarse and fine classes simultaneously.
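
The aggregation and integration steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the attention scores are passed in directly (in CLAM they come from a learned attention network), the slide-level vector is split into two equal halves (the split sizes are not given here), and `stop_gradient` is a hypothetical stand-in for $G$. Gradient blocking is the identity in the forward pass, so only a framework with automatic differentiation (for example `Tensor.detach()` in PyTorch) would show its effect during training.

```python
import math

def attention_pool(patch_feats, scores):
    """Softmax-weighted average of patch features (attention-based pooling).

    patch_feats: list of equal-length feature vectors, one per patch.
    scores: one unnormalized attention score per patch (assumed given).
    """
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(patch_feats[0])
    return [sum(w * f[d] for w, f in zip(weights, patch_feats)) for d in range(dim)]

def stop_gradient(v):
    """Hypothetical stand-in for G: identity in the forward pass.

    A real implementation would block gradients here, e.g. v.detach() in PyTorch.
    """
    return list(v)

def bidirectional_integrate(slide_feat):
    """Split the slide feature into v_c and v_f, then concatenate each with
    the gradient-blocked copy of the other (equal split is an assumption)."""
    half = len(slide_feat) // 2
    v_c, v_f = slide_feat[:half], slide_feat[half:]
    v_c_aug = v_c + stop_gradient(v_f)  # v_c' = v_c concatenated with G(v_f)
    v_f_aug = v_f + stop_gradient(v_c)  # v_f' = v_f concatenated with G(v_c)
    return v_c_aug, v_f_aug

# Toy example: two patches with 4-dimensional features and equal attention.
slide = attention_pool([[1.0, 2.0, 3.0, 4.0], [3.0, 4.0, 5.0, 6.0]], [0.0, 0.0])
v_c_aug, v_f_aug = bidirectional_integrate(slide)
print(slide)    # [2.0, 3.0, 4.0, 5.0]
print(v_c_aug)  # [2.0, 3.0, 4.0, 5.0]
print(v_f_aug)  # [4.0, 5.0, 2.0, 3.0]
```

Both augmented vectors contain the same information in the forward pass; the difference lies in which half receives gradients from which classification head.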

B. Tailored Loss Functions

To enforce hierarchical consistency and improve discrimination, HiClass employs a composite loss function: $L = L_{CE} + L_{Con} + L_{Int} + L_{GCE}$ .

Cross-Entropy Loss ( $L_{CE}$ ): Standard supervised loss applied independently to both coarse and fine tasks.
Hierarchical Consistency Loss ( $L_{Con}$ ):
- Based on Jensen-Shannon Divergence (JSD).
- Aligns the feature representations of the most confident coarse and fine predictions.
- Purpose: Ensures semantic consistency (e.g., preventing a "Cancer" coarse prediction from being paired with a "Gastritis" fine prediction).
Intra- and Inter-class Distance Loss ( $L_{Int}$ ):
- Uses KL Divergence with a margin.
- Maximizes distance between fine-grained classes belonging to different coarse categories.
- Minimizes distance between fine-grained classes belonging to the same coarse category.
- Purpose: Creates a structured feature space where fine-grained clusters naturally group around their coarse parents.
Group-wise Cross-Entropy Loss ( $L_{GCE}$ ):
- Restricts the fine-grained prediction space to only those classes within the predicted coarse category.
- Purpose: Reduces the number of competing logits during softmax, sharpening class boundaries and mimicking the pathologist's diagnostic reasoning (broad category first, then subtype).
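
Two of the ingredients above can be illustrated with a short sketch: the Jensen-Shannon divergence that $L_{Con}$ builds on, and a softmax restricted to one coarse group as in $L_{GCE}$. Everything specific here is an assumption for illustration: the six-class `FINE_TO_COARSE` mapping is hypothetical (the real hierarchy has 14 fine classes), the exact distributions compared by $L_{Con}$ are not specified in this summary, and raising an error when the label falls outside the chosen group is only a placeholder, since the handling of a wrong coarse prediction is not described here.

```python
import math

# Hypothetical hierarchy: index = fine class, value = its coarse parent.
FINE_TO_COARSE = [0, 0, 1, 1, 1, 2]

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) with natural log; terms with p_i == 0 contribute zero."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, bounded by ln(2)."""
    mix = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, mix) + 0.5 * kl_divergence(q, mix)

def groupwise_cross_entropy(fine_logits, fine_label, coarse_group, fine_to_coarse):
    """Cross-entropy over only the fine classes inside one coarse group."""
    members = [i for i, c in enumerate(fine_to_coarse) if c == coarse_group]
    if fine_label not in members:
        raise ValueError("label lies outside the selected coarse group")
    probs = softmax([fine_logits[i] for i in members])
    return -math.log(probs[members.index(fine_label)])

# With uninformative (all-zero) logits, restricting the softmax to the
# 3 classes in group 1 gives a loss of ln(3) instead of the flat ln(6).
logits = [0.0] * 6
group_loss = groupwise_cross_entropy(logits, 3, 1, FINE_TO_COARSE)
flat_loss = -math.log(softmax(logits)[3])
print(group_loss < flat_loss)                   # True
print(js_divergence([1.0, 0.0], [0.0, 1.0]))    # ln(2), the maximum
```

The comparison with the flat loss shows the effect described above: fewer competing logits means each in-group class starts from a higher probability.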

3. Key Contributions

Bidirectional Feature Integration: A novel mechanism that facilitates information exchange between coarse and fine feature representations without direct weight updates that could bias one level over the other.
Hierarchy-Aware Loss Suite: The introduction of three specific loss functions ( $L_{Con}, L_{Int}, L_{GCE}$ ) designed to structurally organize the feature space, enforce semantic consistency, and improve intra-group discrimination.
Comprehensive Evaluation: A rigorous evaluation on a large-scale gastric biopsy dataset, demonstrating that hierarchical learning outperforms both flat classification and existing hierarchical baselines.

4. Experimental Results

Dataset:

Source: 4,673 gastric endoscopic biopsy slides from The Catholic University of Korea.
Classes: 4 Coarse-grained classes (Benign, Cancer, Dysplasia, Gastritis) and 14 Fine-grained classes.

Performance Comparison (Table 2):
HiClass achieved the best performance on all reported metrics among the compared methods (MaxMIL, MeanMIL, CLAM-SB/MB, TransMIL, S4MIL, and Chang et al.):

Coarse-grained: 85.10% Accuracy, 0.8610 F1-macro.
Fine-grained: 68.68% Accuracy, 0.5220 F1-macro.
Note: HiClass outperformed the second-best method (S4MIL) by a significant margin, particularly in the difficult fine-grained task.
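
For readers less familiar with the two metrics, the sketch below computes accuracy and macro-averaged F1 on made-up labels. The data are purely illustrative and the paper's exact metric implementation is not stated here. The example shows one common reason macro-F1 can sit well below accuracy, as in the fine-grained results: macro-F1 weights every class equally, so a rare class that is never predicted pulls the average down.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in either list."""
    classes = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)

# Toy imbalanced data: 8 samples of class 0, 2 of class 1,
# and a model that always predicts class 0.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 10
print(accuracy(y_true, y_pred))  # 0.8
print(macro_f1(y_true, y_pred))  # about 0.444 (class 0: 8/9, class 1: 0)
```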

Ablation Study (Table 3):

Bidirectional Integration: Outperformed both unidirectional integration (Fine $\to$ Coarse or Coarse $\to$ Fine) and no integration.
Loss Function Synergy: No single loss function was dominant; the combination of all three tailored losses ( $L_{Con}, L_{Int}, L_{GCE}$ ) was required to achieve peak performance. Removing any single loss resulted in a performance drop of 2–4% in accuracy.

5. Significance

Clinical Relevance: The framework mirrors the actual diagnostic workflow of pathologists, who first identify broad pathology types before determining specific subtypes.
Robustness: By leveraging hierarchical constraints, the model becomes more robust to class imbalance and high inter-class similarity, which are common challenges in histopathology.
Generalizability: The approach is not specific to gastric biopsies and could be applied to other pathology tasks, offering an alternative to flat classification for WSI analysis.

In conclusion, HiClass demonstrates that explicitly modeling the class hierarchy through bidirectional feature exchange and specialized loss functions significantly enhances the accuracy and reliability of automated histopathology image analysis.