Cluster-First Labelling: An Automated Pipeline for Segmentation and Morphological Clustering in Histology Whole Slide Images

This paper introduces a cloud-native, end-to-end pipeline that automates the annotation of histology whole slide images. The pipeline segments tissue components, clusters them by morphology, and lets human annotators label representative clusters instead of individual objects, sharply reducing effort while achieving 96.8% agreement with manual labels across diverse tissue types.

Original authors: Muhammad Haseeb Ahmad, Sharmila Rajendran, Damion Young, Jon Mason

Published 2026-04-13

This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice; do not make health decisions based on this content.

Imagine you are a librarian tasked with organizing a library that contains 100,000 books, but there's a catch: every single book is slightly different, and you have to read the cover of every single one to decide which shelf it belongs on. If you did this one by one, it would take you years.

This is exactly the problem scientists face with Histology Whole Slide Images (WSIs). These are giant, high-resolution digital photos of tissue samples (like skin, bone, or organs). A single slide can contain tens of thousands of tiny "objects" (cells, nuclei, clusters). Traditionally, a human expert has to zoom in, draw a line around every single object, and label it. It's slow, expensive, and exhausting.

This paper introduces a clever new system called "Cluster-First Labelling" that solves this by changing the workflow entirely. Here is how it works, using simple analogies:

1. The Old Way: The "One-by-One" Struggle

The Problem: Imagine trying to sort a massive pile of mixed-up LEGO bricks. The old way is to pick up every single brick, look at it, and decide: "Is this a 2x4 red brick? Is this a 1x2 blue brick?" You do this for 15,000 bricks. It takes forever.

2. The New Way: The "Group First" Strategy

The authors built an automated pipeline that acts like a super-smart, tireless robot assistant. Instead of sorting bricks one by one, it does this:

Step A: The Great Slicing (Tiling)

The giant image is chopped up into smaller, manageable puzzle pieces (tiles), like cutting a large pizza into slices so it's easier to eat.
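
For readers who want to see the mechanics, a minimal tiling sketch in Python is below. It is not the authors' implementation; in particular, it assumes the slide is already in memory as a NumPy array, whereas real WSI pipelines usually read regions lazily with a library such as OpenSlide.

```python
# Minimal tiling sketch (illustrative, not the paper's code).
import numpy as np

def tile_image(image: np.ndarray, tile_size: int = 512):
    """Yield (row, col, tile) for non-overlapping tiles of a large image."""
    height, width = image.shape[:2]
    for row in range(0, height, tile_size):
        for col in range(0, width, tile_size):
            # Edge tiles may be smaller than tile_size; callers can pad or skip.
            yield row, col, image[row:row + tile_size, col:col + tile_size]
```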

Step B: The "Trash" Filter (Quality Control)

The robot looks at each slice. If a slice is just empty white space or blurry (like a slice of pizza with no toppings), it throws it away immediately. This saves time.
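
A filter like this can be a few lines of code. The sketch below uses two common heuristics, a near-white pixel fraction for emptiness and the variance of the Laplacian for blur; the thresholds are placeholders, not necessarily the paper's exact criteria.

```python
# Illustrative quality-control filter; thresholds are assumptions.
import cv2
import numpy as np

def is_useful_tile(tile: np.ndarray,
                   max_background: float = 0.9,
                   min_sharpness: float = 50.0) -> bool:
    gray = cv2.cvtColor(tile, cv2.COLOR_RGB2GRAY)
    # Mostly near-white pixels -> the tile is probably empty glass.
    if np.mean(gray > 220) > max_background:
        return False
    # Low variance of the Laplacian -> the tile is probably blurry.
    return cv2.Laplacian(gray, cv2.CV_64F).var() >= min_sharpness
```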

Step C: The "Shape Shifter" (Segmentation)

Using a tool called Cellpose-SAM, the robot finds every object that looks like a cell or a nucleus.

  • Analogy: Imagine the robot is a magic marker that draws a perfect outline around every single LEGO brick in the pile, regardless of what it is. It doesn't know what the brick is yet, but it knows where it is.
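
In code, this step is a call to a pretrained segmentation model. The sketch below uses the open-source cellpose package; model defaults and the eval() signature differ across cellpose versions, so treat it as a hedged illustration rather than the authors' exact configuration.

```python
# Hedged segmentation sketch with the cellpose package.
from cellpose import models

model = models.CellposeModel(gpu=False)  # load once, reuse across tiles

def segment_tile(tile):
    """Return a label mask: 0 = background, 1..N = one ID per object found."""
    masks, flows, styles = model.eval(tile)
    return masks
```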

Step D: The "ID Card" Generator (Embedding)

The robot takes a picture of each outlined object and runs it through a neural network (ResNet-50). This creates a unique "ID card" (a mathematical fingerprint) for every object based on its shape and texture.

  • Analogy: It's like scanning every LEGO brick and giving it a barcode that says "Red, 2x4, smooth" or "Blue, 1x2, bumpy."
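
Concretely, the "ID card" is the 2048-dimensional feature vector that ResNet-50 produces just before its final classification layer. The torchvision sketch below shows the idea; the crop size and normalization are standard ImageNet defaults, and the paper's preprocessing may differ.

```python
# Embedding sketch: pretrained ResNet-50 with its classifier removed.
import torch
import torchvision.models as tvm
import torchvision.transforms as T

backbone = tvm.resnet50(weights=tvm.ResNet50_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()  # keep the 2048-d pooled features
backbone.eval()

preprocess = T.Compose([
    T.ToTensor(),                  # HxWx3 uint8 array -> float tensor
    T.Resize((224, 224)),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def embed_object(crop) -> torch.Tensor:
    """Map one object crop to its 2048-d 'ID card' vector."""
    return backbone(preprocess(crop).unsqueeze(0)).squeeze(0)
```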

Step E: The "Sorter" (Clustering)

This is the magic step. The robot uses a clustering algorithm (DBSCAN) to group objects that have similar "ID cards."

  • Analogy: Instead of you sorting the bricks, the robot automatically piles all the "Red 2x4s" into one bin, all the "Blue 1x2s" into another, and all the "weird green pieces" into a third. It does this without being told what the bins are called.
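
With the embeddings stacked into a matrix, the grouping itself is a single scikit-learn call, as sketched below. The eps and min_samples values are placeholders that would need tuning per dataset; note that DBSCAN labels objects it cannot group as noise (-1) rather than forcing them into a bin.

```python
# Clustering sketch with scikit-learn's DBSCAN; parameters are assumptions.
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_embeddings(embeddings: np.ndarray) -> np.ndarray:
    """Group objects by embedding similarity: one cluster ID per object.

    Output looks like array([0, 0, 1, 1, -1, 2, ...]); -1 means "no bin".
    """
    # eps controls how close two "ID cards" must be to share a bin.
    return DBSCAN(eps=0.5, min_samples=5).fit_predict(embeddings)
```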

3. The Human's New Job: The "Bin Manager"

Now, the human expert doesn't have to look at 15,000 individual bricks. They only have to look at the bins.

  • If the robot made a bin full of "Red 2x4s," the human just looks at a few samples, says, "Yes, that's a Red 2x4," and labels the entire bin.
  • The computer then instantly applies that label to every single brick in that bin (a simple lookup, as the sketch after this list shows).
  • The Result: Instead of 15,000 tasks, the human might only have 25 tasks (one for each bin). That is a 600x reduction in work.
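
Mechanically, the propagation is just one dictionary lookup per object. The cluster IDs and class names in the sketch below are invented for illustration.

```python
# Label-propagation sketch; cluster IDs and names are hypothetical.
import numpy as np

cluster_ids = np.array([0, 0, 1, 1, 1, -1, 2])  # output of the clusterer
expert_names = {0: "nucleus", 1: "red blood cell", 2: "debris"}  # 3 human decisions

# One lookup per object replaces one manual decision per object.
object_names = [expert_names.get(c, "needs review") for c in cluster_ids]
print(object_names)
# ['nucleus', 'nucleus', 'red blood cell', 'red blood cell',
#  'red blood cell', 'needs review', 'debris']
```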

4. Did It Work? (The Results)

The team tested this on 3,696 objects from 13 different types of tissues (human, rat, and rabbit).

  • The Score: The system matched human labels 96.8% of the time.
  • The Perfect Scores: For 7 out of the 13 tissue types, the robot got 100% agreement with the humans.
  • The Struggles: It had a bit of trouble with "Compact Bone" and "Skeletal Muscle."
    • Why? Compact bone has very few cells per image, so the robot has too few examples to find reliable patterns, and skeletal muscle contains structures whose "ID cards" look alike to the robot but are meaningfully different to a trained human eye. It's like trying to sort a pile of near-identical rocks; the robot needs a little more help there.

Why This Matters

The system is open source, so anyone can use and build on it for free. It turns a job that used to take days of expert time into one that takes minutes of computer time plus minutes of human review.

In a nutshell:
Instead of asking a human to sort a mountain of sand grain by grain, this system asks the computer to group the sand into piles based on color and texture, and then asks the human to just name the piles. It's faster, cheaper, and surprisingly accurate.
