Manifold-Preserving Superpixel Hierarchies and Embeddings for the Exploration of High-Dimensional Images

Imagine you have a massive library containing millions of books. But these aren't normal books; every single page in every book is written in a different, complex language, and each page has hundreds of unique ingredients mixed into the ink. This is what scientists call a high-dimensional image. It's not just a picture of a cat; it's a picture where every tiny dot (pixel) holds a secret code about temperature, chemical composition, or protein levels.

The problem? Trying to look at all these millions of dots at once is impossible for the human brain. It's like trying to drink from a firehose.

The Old Way: The "Blurry Map" vs. The "Scattered Puzzle"

To make sense of this data, scientists usually try to shrink it down into a 2D map (like a flat drawing) so we can see patterns.

The "Blurry Map" (Image Pyramids): Imagine taking a photo and blurring it, then blurring it again, until you just see big blobs of color. This helps you see the general shape of the picture, but you lose all the secret chemical codes. You know where things are, but not what they are.
The "Scattered Puzzle" (Standard Data Maps): Imagine taking all the secret codes from the books and sorting them into groups based on how similar the ink is. This is great for understanding the chemistry, but it destroys the picture. A group of "red ink" pages might end up next to each other on the map, even if they were pages from the very beginning and very end of the book, miles apart in the original story.

The Result: You can't easily look at a specific region in the picture and see what's happening chemically there, because the map has scattered the pieces of that region all over the place.

The New Solution: The "Smart Neighborhood" Hierarchy

This paper introduces a clever new way to organize the data called Manifold-Preserving Superpixel Hierarchies. Let's break that down with a simple analogy.

1. The "Superpixel" (The Neighborhood Block)

Instead of looking at individual pixels (dots), the computer groups them into superpixels. Think of these as "neighborhood blocks."

In a normal photo, a neighborhood block might be a group of houses that look similar (same color roof, same brick).
In this new method, a "block" is a group of pixels that are neighbors in the picture AND neighbors in the secret code. They are close together physically and they share similar chemical properties.

2. The "Random Walk" (The Neighborhood Tour)

How does the computer know which pixels belong in the same "block"? It uses a trick called a Random Walk.

Imagine a person standing on a pixel. They take a random step to a nearby pixel, then another, and another.
If they keep wandering around and mostly stay in the same "chemical neighborhood," that pixel belongs to that group.
If they quickly wander off into a totally different chemical zone, they are in a different group.
This ensures that the groups respect the complex, winding shape of the data (the "manifold"), not just simple straight-line distances.

3. The "Hierarchy" (The Zoom Lens)

This is the best part. The computer builds a family tree of these neighborhoods.

Level 1 (The Details): Tiny neighborhoods, just a few pixels wide.
Level 2 (The Blocks): These tiny neighborhoods merge into larger blocks.
Level 3 (The Districts): The blocks merge into huge districts.
Level 4 (The City): Everything merges into one big overview.

Because the computer built these groups based on both the picture layout and the secret codes, the "City" view still makes sense. If you zoom in on a specific "District" in the map, you are looking at a specific, coherent area of the original image.

Why This Matters: The "Drill-Down" Experience

Think of this like exploring a city with a magical map.

Old Maps: If you zoomed in on a specific park, the map might suddenly show you a forest from a different country because the data was sorted only by tree type, not location.
This New Map: If you zoom in on the park, the map stays focused on that park. You can see the general layout of the city, then zoom into a specific neighborhood, then zoom into a single house, and the computer always knows exactly where you are in the original image.

Real-World Examples from the Paper

The authors tested this on two very different types of "libraries":

Satellite Photos (Hyperspectral Imaging):
- Imagine looking at a farm from space. You want to find a specific type of corn that is sick.
- With the old method, finding the sick corn might require looking at thousands of scattered dots.
- With this new method, the computer groups the sick corn into a single, clean "block." You can zoom out to see the whole farm, then zoom in to see exactly which field has the sick corn, all without losing the connection between the map and the photo.
Microscope Photos of Cells (CyCIF):
- Imagine looking at a slice of skin with 50 different colored markers showing where different proteins are.
- The new method groups cells that look similar and are close together.
- It can automatically highlight a specific type of immune cell (like a "blue dot" in the data) and show you exactly where they are clustered in the tissue, helping doctors understand how cancer interacts with the immune system.

The Bottom Line

This paper solves a frustrating problem: How do we look at a picture and its complex data at the same time without getting lost?

By building a "smart hierarchy" that respects both the shape of the image and the complexity of the data, the authors created a tool that lets scientists zoom in and out seamlessly. It's like having a map that never loses its way, allowing researchers to explore massive, complex datasets with the same ease as looking at a regular photograph.

1. Problem Statement

High-dimensional images (e.g., hyperspectral imaging, mass cytometry, CyCIF) contain a high-dimensional attribute vector for every pixel. Exploring these datasets typically involves two spaces: the image space (spatial layout) and the attribute space (feature vectors).

Current Limitations:
- Flat Dimensionality Reduction (DR): Methods like t-SNE and UMAP struggle to scale to millions of pixels because of their computational cost, and their embeddings do not preserve the spatial coherence of the image.
- Hierarchical DR: Existing hierarchical methods (e.g., HSNE, HiPP) construct hierarchies based only on attribute similarity. They ignore the spatial layout of the image.
- The Mismatch: In existing hierarchical methods, a single "landmark" in the embedding often represents pixels scattered across the entire image, while a coherent spatial region in the image might be represented by multiple, disconnected landmarks. This disconnect makes it difficult to explore regions of interest (ROIs) consistently across both image and attribute spaces.

2. Methodology

The authors propose a Manifold-Preserving Superpixel Hierarchy that couples image space and attribute space. The method consists of four main stages:

A. Graph Construction (Attribute Space)

Rather than comparing pixels directly by Euclidean distance, the method constructs a k-nearest neighbor (kNN) graph ($G$) in the high-dimensional attribute space to approximate the underlying data manifold.

Connectivity: Vertices are pixels; edges connect the $k$ most similar pixels based on attribute vectors.
Symmetrization: The directed kNN graph is symmetrized and connected (using a Minimum Spanning Tree) to ensure a single connected component.
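
The graph construction above can be sketched in a few lines. This is a minimal brute-force illustration, not the authors' implementation: the function names `knn_graph` and `connected_components` are hypothetical, Euclidean distance is assumed for the neighbor search, and the Minimum Spanning Tree step that links separate components is left out. The component count only shows when that step would be needed.

```python
import math


def knn_graph(points, k):
    """Build a symmetrized kNN graph as {vertex: set(neighbors)}.

    points: list of attribute vectors (tuples of floats), one per pixel.
    Brute force, O(n^2 log n); real data would need an approximate kNN search.
    """
    n = len(points)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        # Rank all other pixels by distance in attribute space (assumed Euclidean).
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        for j in others[:k]:
            # Symmetrize: store the directed edge i -> j in both directions.
            adj[i].add(j)
            adj[j].add(i)
    return adj


def connected_components(adj):
    """Count connected components with an iterative depth-first search."""
    seen, count = set(), 0
    for start in adj:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        stack = [start]
        while stack:
            v = stack.pop()
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
    return count
```

If `connected_components` returns more than one, the graph still has to be linked into a single component before random walks can reach every vertex.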

B. Manifold-Aware Similarity via Random Walks

To define a robust similarity measure that respects the manifold structure (avoiding "shortcuts" common in shortest-path methods), the authors use random walks on graph $G$.

Feature Extraction: For each vertex, $\omega$ random walks of length $\lambda$ are performed.
Transition Probabilities: The walks generate a feature vector (transition probability distribution) representing the local neighborhood structure.
Similarity Metric: The similarity between two vertices (or superpixels) is calculated using the Bhattacharyya Coefficient (BC), which measures the overlap between their random walk feature distributions.
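- $BC(i, j) = \sum_{k} \sqrt{T(i, k) \cdot T(j, k)}$, where row $i$ of the transition matrix $T$ is the random walk feature vector of vertex $i$.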
- This metric is robust to noise and captures non-linear manifold structures better than simple Euclidean distance.
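
The random walk features and their comparison can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names and the `seed` parameter are hypothetical, and counting every vertex visited along a walk (rather than, for example, only the endpoint) is an assumption. `num_walks` and `walk_length` play the roles of $\omega$ and $\lambda$.

```python
import math
import random
from collections import Counter


def walk_features(adj, num_walks, walk_length, seed=0):
    """Estimate one sparse feature row per vertex from random walks.

    adj: {vertex: iterable of neighbors}; every vertex needs at least one neighbor.
    Returns {vertex: {visited_vertex: probability}}, each row summing to 1.
    """
    rng = random.Random(seed)
    features = {}
    for start in adj:
        visits = Counter()
        for _ in range(num_walks):
            v = start
            for _ in range(walk_length):
                # Step to a uniformly chosen neighbor (sorted for reproducibility).
                v = rng.choice(sorted(adj[v]))
                visits[v] += 1  # assumption: every visited vertex is counted
        total = sum(visits.values())
        features[start] = {v: c / total for v, c in visits.items()}
    return features


def bhattacharyya_coefficient(p, q):
    """BC of two sparse distributions: sum of sqrt(p_k * q_k) over shared keys."""
    return sum(math.sqrt(p[k] * q[k]) for k in p.keys() & q.keys())
```

Two vertices whose walks never visit a common vertex get a coefficient of 0, and identical distributions get a coefficient of 1.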

C. Superpixel Hierarchy Construction

The hierarchy is built bottom-up using a modified version of Borůvka's algorithm:

Level 0: Start with individual pixels.
Merging: For each superpixel, identify its spatial neighbors (in the image graph $I$). Merge the superpixel with the neighbor that has the highest Bhattacharyya Coefficient.
Constraint: If every spatial neighbor of a superpixel has zero similarity (BC = 0), the superpixel is not merged at that level, which prevents forced, meaningless merges.
Feature Aggregation: When merging superpixels, their random walk feature vectors (rows in the transition matrix $T$) are summed and re-normalized. This allows the hierarchy to be built without re-running random walks at every level, significantly improving efficiency.
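
One merge round of this bottom-up construction can be sketched as below. It is an illustration under assumptions rather than the paper's algorithm: `merge_round` and its data layout (sparse dictionaries for feature rows, sets for spatial adjacency) are hypothetical, ties are broken by iteration order, and the rows are summed without any size weighting before re-normalization.

```python
import math


def bc(p, q):
    """Bhattacharyya Coefficient of two sparse distributions."""
    return sum(math.sqrt(p[k] * q[k]) for k in p.keys() & q.keys())


def merge_round(features, spatial_adj):
    """Run one Boruvka-style merge round.

    features:    {superpixel: {vertex: probability}}
    spatial_adj: {superpixel: set of spatially adjacent superpixels}
    Returns (merged_features, merged_adj, mapping old id -> merged id).
    """
    parent = {s: s for s in features}

    def find(s):
        # Union-find lookup with path halving.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for s in features:
        best, best_bc = None, 0.0
        for t in spatial_adj[s]:
            score = bc(features[s], features[t])
            if score > best_bc:
                best, best_bc = t, score
        if best is not None:  # constraint: skip if all neighbors have BC == 0
            parent[find(s)] = find(best)

    # Feature aggregation: sum the rows of each group, then re-normalize.
    merged = {}
    for s, row in features.items():
        acc = merged.setdefault(find(s), {})
        for k, p in row.items():
            acc[k] = acc.get(k, 0.0) + p
    for row in merged.values():
        total = sum(row.values())
        for k in row:
            row[k] /= total

    # Spatial adjacency between the merged superpixels.
    merged_adj = {r: set() for r in merged}
    for s, nbrs in spatial_adj.items():
        for t in nbrs:
            a, b = find(s), find(t)
            if a != b:
                merged_adj[a].add(b)
                merged_adj[b].add(a)
    return merged, merged_adj, {s: find(s) for s in features}
```

Calling `merge_round` repeatedly on its own output yields one hierarchy level per call. No random walks are repeated, because each level only reuses the aggregated rows of the level below.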

D. Hierarchical Embedding

Once the hierarchy is built, embeddings are generated for each level:

Distance Metric: The Bhattacharyya distance ($d_{Bhat} = -\ln(BC)$) is used as the input distance for standard DR algorithms (t-SNE or UMAP).
Subset Refinement: Users can select a region in the embedding or image. The system retrieves the constituent superpixels from the lower level and re-embeds only that subset, allowing for "zoom-in" exploration while maintaining spatial coherence.
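
The distance input for one hierarchy level can be sketched as a pairwise matrix. The function name `bhattacharyya_distance_matrix` and the `eps` clamp are assumptions made for this illustration: since $-\ln(0)$ is infinite, a coefficient of 0 is clamped to a small positive value here, and how the authors handle that case is not stated above.

```python
import math


def bhattacharyya_distance_matrix(rows, eps=1e-12):
    """Pairwise d = -ln(BC) for a list of sparse feature rows.

    rows: list of {vertex: probability} dictionaries, one per superpixel.
    eps:  assumed lower clamp on BC so that the distance stays finite.
    """
    n = len(rows)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            overlap = sum(
                math.sqrt(rows[i][k] * rows[j][k])
                for k in rows[i].keys() & rows[j].keys()
            )
            # Clamp to [eps, 1] so the result is finite and non-negative.
            d = -math.log(min(1.0, max(overlap, eps)))
            dist[i][j] = dist[j][i] = d
    return dist
```

For subset refinement, the same function would be applied only to the rows of the selected superpixels' children, and the resulting matrix passed to a DR implementation that accepts precomputed distances.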

3. Key Contributions

Image-Guided Hierarchy: The first hierarchical embedding method that explicitly incorporates the spatial layout of pixels (via superpixels) while constructing the hierarchy based on high-dimensional attribute manifolds.
Manifold-Preserving Similarity: Introduction of a random-walk-based similarity metric (Bhattacharyya Coefficient on transition probabilities) specifically designed for merging superpixels in high-dimensional spaces, avoiding the pitfalls of shortest-path geodesics.
Unified Workflow: A single-step workflow that enables consistent exploration of high-dimensional data in both image space and attribute space, solving the "scattered landmark" problem found in HSNE and similar methods.
Implementation: The method is implemented as a standalone library and integrated into the ManiVault framework for interactive visualization.

4. Results and Validation

The method was validated using two real-world datasets and a quantitative evaluation:

Use Case 1: Hyperspectral Satellite Imaging (Indian Pines)

Comparison: Compared against HSNE (Hierarchical Stochastic Neighbor Embedding).
Finding: To represent the same Region of Interest (ROI), the proposed method required roughly four times fewer landmarks (326 superpixels vs. 1,402 landmarks in HSNE) at the same abstraction level.
Benefit: The HSNE landmarks were scattered across the image, whereas the superpixel hierarchy maintained spatial compactness, allowing for clearer cluster identification and faster embedding computation.

Use Case 2: Cyclic Immunofluorescence (CyCIF) Tissue Imaging

Application: Analyzing protein abundance in cancerous skin tissue.
Finding: The superpixel hierarchy successfully segmented individual cells and cell clusters (e.g., regulatory T-cells expressing FOXP3) without prior cell segmentation.
Benefit: The hierarchy revealed biological structures (e.g., dermal-epidermal junction, blood vessels) at different abstraction levels, demonstrating the potential to combine segmentation and exploration into a single workflow.

Quantitative Evaluation

Metrics: Undersegmentation Error (UE) and Explained Variation (EV) compared against FH, ERS, SLIC, and BB methods.
Outcome: The proposed method (SPH) achieved competitive or superior performance in Explained Variation (AEV) and comparable performance in Undersegmentation Error (AUE) compared to state-of-the-art superpixel methods, despite being optimized for manifold preservation rather than just segmentation.
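
To make the Explained Variation metric concrete, the sketch below uses the definition common in the superpixel literature: the variance of the superpixel means around the global mean, weighted by superpixel size, divided by the total variance of the pixel values. Whether the paper uses exactly this variant is an assumption, and the function name is hypothetical.

```python
def explained_variation(values, labels):
    """Explained Variation of a segmentation, in [0, 1] (higher is better).

    values: list of per-pixel feature tuples (all of equal length).
    labels: superpixel id for each pixel, same order as values.
    Assumes the pixel values are not all identical (non-zero total variance).
    """
    n, dim = len(values), len(values[0])
    global_mean = [sum(v[d] for v in values) / n for d in range(dim)]

    # Group pixel values by superpixel.
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)

    # Size-weighted squared distance of each superpixel mean to the global mean.
    between = 0.0
    for members in groups.values():
        mean = [sum(v[d] for v in members) / len(members) for d in range(dim)]
        between += len(members) * sum(
            (m - g) ** 2 for m, g in zip(mean, global_mean)
        )

    # Total squared distance of every pixel to the global mean.
    total = sum(
        sum((x - g) ** 2 for x, g in zip(v, global_mean)) for v in values
    )
    return between / total
```

A segmentation whose superpixels are internally uniform scores 1, while one whose superpixels all share the global mean scores 0.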

5. Significance

Bridging the Gap: This work solves a critical bottleneck in visual analytics for high-dimensional imaging: the disconnect between spatial regions and their attribute abstractions.
Scalability: By using superpixels and aggregating features rather than re-computing random walks at every level, the method scales efficiently to millions of pixels.
Interpretability: The resulting embeddings are more interpretable for domain experts (e.g., biologists, geoscientists) because selecting a cluster in the embedding corresponds directly to a contiguous region in the original image.
Future Impact: The method paves the way for "Focus+Context" exploration in high-dimensional images, where users can seamlessly zoom from global overviews to fine-grained cellular or spectral details without losing spatial context.