Hierarchical topological clustering

This paper proposes a hierarchical topological clustering algorithm that works with any distance metric and identifies both clusters of arbitrary shape and persistent outliers across diverse datasets, including image, medical, and economic data.

Original authors: Ana Carpio, Gema Duro

Published 2026-02-10

This is an AI-generated explanation of the paper below. It is not written or endorsed by the authors. For technical accuracy, refer to the original paper.

The "Social Network of Shapes": Understanding Hierarchical Topological Clustering

Imagine you are looking at a massive, messy crowd of people in a city square. Some people are standing in tight-knit families, some are walking in small groups of friends, and a few are lone wanderers standing far away from everyone else.

If you wanted to organize this crowd into groups, how would you do it?

  • Do you group them by how many people are in a circle?
  • Do you group them by how close they are standing?
  • What do you do with the lone wanderer? Are they just "noise" to be ignored, or are they important individuals we should study?

This paper introduces a new way to solve this problem, called Hierarchical Topological Clustering (HTC).


The Core Idea: The "Expanding Bubble" Method

Most traditional clustering methods are like a rigid manager. They might say, "Everyone must belong to one of exactly five groups," or "If you aren't in a dense crowd, you don't count." This often fails when groups have weird, curvy shapes (like a long snake of people) or when the "loners" are actually the most interesting people in the room.

The authors' new method works differently. Instead of forcing people into boxes, imagine every single person in that crowd starts blowing a bubble around themselves.

  1. The Tiny Bubble Stage: At first, the bubbles are tiny. Everyone is in their own "cluster" of one.
  2. The Growing Bubble Stage: As the bubbles grow larger, they start to touch. When two bubbles touch, those two people (and their groups) merge into one larger group.
  3. The Big Merge: Eventually, the bubbles get so big that they all touch, and the entire crowd becomes one single, giant group.
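
For readers who think in code, the three stages above can be sketched in a few lines of Python. This is only an illustration of the bubble picture, not the authors' algorithm: it uses plain single-linkage merging with a union-find structure, treats two bubbles as "touching" once the distance between their centers falls below a growing threshold, and the function name `merge_events` and the toy points are assumptions made up for this example.

```python
import math
from itertools import combinations

def merge_events(points, dist=math.dist):
    """Record (threshold, clusters_remaining) each time two groups merge.

    Illustrative single-linkage sketch; `dist` can be any distance function.
    """
    parent = list(range(len(points)))  # every point starts as its own group

    def find(i):
        # Follow parent links to the group's representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Growing the bubbles is equivalent to visiting pairs from closest to farthest.
    pairs = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )

    events = []
    n_clusters = len(points)
    for d, i, j in pairs:
        ri, rj = find(i), find(j)
        if ri != rj:  # two different groups just touched: merge them
            parent[ri] = rj
            n_clusters -= 1
            events.append((d, n_clusters))
    return events

# Hypothetical toy crowd: a tight trio and one distant loner.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
events = merge_events(points)
for threshold, remaining in events:
    print(f"threshold {threshold:.2f}: {remaining} group(s) left")
```

In this toy crowd the trio merges at a threshold of 1, while the loner only joins at roughly 13.45. That large gap between merge thresholds is the kind of signal the article describes. Passing a different `dist` function swaps the metric without changing anything else.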

The "Magic" of this approach: By watching when these bubbles touch, we can learn the "topology" (the shape and structure) of the crowd.

  • The "Mainstream" Groups: If a group of people merges very quickly, they were already standing close together. They are a stable, dense community.
  • The "Outliers" (The Interesting Loners): If a person’s bubble has to grow massive before it finally touches anyone else, that person is an outlier. In many datasets, these aren't just "errors"—they are the most important pieces of information!
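
One simple way to put a number on "how long a bubble stays alone" is the distance from each point to its nearest neighbor, since that is the threshold at which its bubble first touches anyone. The sketch below is an assumption-laden simplification: the paper's notion of a persistent outlier may be richer (for example, small groups that stay separate), and the names `first_contact` and `manhattan` are invented for this illustration.

```python
import math

def first_contact(points, dist=math.dist):
    """For each point, the distance to its nearest neighbor.

    A large value means the point's bubble must grow a lot before touching anyone.
    """
    return [
        min(dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]

def manhattan(p, q):
    # An alternative metric, to show that the distance function is pluggable.
    return sum(abs(a - b) for a, b in zip(p, q))

# Hypothetical toy crowd: a tight trio and one distant loner.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]

scores = first_contact(points)
loner = max(range(len(points)), key=scores.__getitem__)
print("Euclidean first-contact distances:", scores)
print("Index of the most isolated point:", loner)

scores_l1 = first_contact(points, dist=manhattan)
print("Manhattan first-contact distances:", scores_l1)
```

Under both metrics the fourth point has by far the largest first-contact distance, so it is the one flagged as the loner.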

Real-World Applications: Why does this matter?

The researchers tested this "bubble" method on three very different "crowds":

1. The Medical Detective (Cancer Cells)

Imagine a battlefield where healthy cells are fighting off invading cancer cells. The cancer cells don't just form one neat circle; they spread out in "islands" and "fingers" into the healthy territory.

  • Old methods often got confused by the weird shapes and couldn't tell the difference between the main battle line and the small, dangerous "scout" groups of cancer cells.
  • HTC easily identified the main boundary and, more importantly, spotted the "islands" of cancer cells that had broken away. These are the "outliers" that doctors need to watch most closely.

2. The Quality Control Inspector (Digital Images)

Think of a photo being compressed (like a low-quality JPEG). As you compress it more, it gets blurrier.

  • HTC can look at a collection of images and automatically say: "These 20 images are high quality; these 5 are a bit blurry; and these 2 are totally broken because someone accidentally drew a black line across them." It treats the "broken" images as outliers, helping computers automatically spot defects.

3. The Economic Trendspotter (Global Trade)

Imagine looking at all the countries in Europe and how much they trade with Spain.

  • Most countries trade in similar amounts, forming a "main crowd."
  • But some countries—like France or Germany—are "super-traders." In the HTC model, their "bubbles" stay separate for a very long time because they are so much bigger and more significant than the others. HTC identifies these "power players" as the most important outliers in the economy.

Summary: The Big Takeaway

Traditional clustering is like trying to fit different-sized objects into standard square boxes. Hierarchical Topological Clustering is more like watching how things naturally connect as they grow.

By focusing on the shape of the data and the timing of how groups merge, this method doesn't just tell you what the "average" looks like—it tells you where the weird, the rare, and the most important pieces of the puzzle are hiding.
