Learning Unified Distance Metric for Heterogeneous Attribute Data Clustering

This paper proposes a novel, parameter-free Heterogeneous Attribute Reconstruction and Representation (HARR) learning paradigm that unifies numerical and categorical attributes into homogeneous spaces with learnable metrics to effectively adapt to various clustering tasks while guaranteeing convergence.

Yiqun Zhang, Mingjie Zhao, Yizhou Chen, Yang Lu, Yiu-ming Cheung

Published 2026-03-06

Imagine you are a detective trying to solve a mystery by grouping suspects into teams based on their descriptions. You have two very different types of clues:

  1. The "Numbers" Clues: Things like "Height" or "Income." These are easy. If someone is 6 feet tall and another is 5 feet, you know exactly how far apart they are. It's a straight line on a ruler.
  2. The "Labels" Clues: Things like "Job Title" (Doctor vs. Lawyer) or "Favorite Color." These are tricky. Is a "Doctor" closer to a "Lawyer" or a "Nurse"? There is no ruler for this. A "Red" shirt isn't "halfway" between a "Blue" and a "Green" shirt.

The Problem:
Most old-school detective tools (clustering algorithms) are bad at mixing these clues. They either try to force the "Labels" into a fake ruler (which loses the true meaning) or they treat the "Numbers" and "Labels" as completely separate worlds that never talk to each other. This makes it hard to find the right teams.

The Solution: The "Universal Translator" (HARR)
This paper introduces a new method called HARR (Heterogeneous Attribute Reconstruction and Representation). Think of it as a Universal Translator that turns all your messy clues into a single, easy-to-understand language.

Here is how it works, using simple analogies:

1. The "Shadow Projection" Trick

Imagine you have a complex 3D sculpture (a categorical attribute like "Job Title"). You can't measure it with a simple ruler.

  • The Old Way: You take a photo of it and say, "It's just a blob." (This is like One-Hot Encoding, which treats every job as equally different from every other job).
  • The HARR Way: Instead of looking at the whole sculpture at once, you shine a light on it from every possible angle and look at the shadows it casts on a flat wall.
    • You compare "Doctor" vs. "Lawyer" and see how their shadows differ.
    • You compare "Doctor" vs. "Nurse" and see their shadows.
    • By looking at all these 2D shadows (projections), you can map the complex 3D shape onto a simple, straight line (a ruler) without losing the details.

Now, "Job Title" isn't a fuzzy concept anymore; it's a set of numbers on a ruler, just like "Income." Suddenly, the computer can compare a Doctor's income to their job title on the same scale!
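The paper's exact projection-learning procedure isn't spelled out here, but the contrast with one-hot encoding can be sketched in a few lines of Python. The category positions below are illustrative stand-ins for values a method like HARR would learn from the data, not numbers from the paper:

```python
import numpy as np

# One-hot encoding: every pair of distinct categories is equally far apart.
categories = ["Doctor", "Lawyer", "Nurse"]
one_hot = {c: np.eye(len(categories))[i] for i, c in enumerate(categories)}

d_doc_law = np.linalg.norm(one_hot["Doctor"] - one_hot["Lawyer"])
d_doc_nur = np.linalg.norm(one_hot["Doctor"] - one_hot["Nurse"])
assert d_doc_law == d_doc_nur  # one-hot cannot tell these apart

# A learned projection instead places each category on a numeric axis
# (a "ruler"), so related categories can sit closer together.
# These positions are hand-picked for illustration only.
projection = {"Doctor": 0.9, "Nurse": 0.7, "Lawyer": 0.1}
print(abs(projection["Doctor"] - projection["Nurse"]))   # Doctor is near Nurse
print(abs(projection["Doctor"] - projection["Lawyer"]))  # and far from Lawyer
```

Once categories live on such an axis, they can be mixed with numeric attributes like "Income" in one distance computation.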

2. The "Smart Weight" System

Once everything is on the same ruler, the computer needs to decide which clues matter most.

  • The Old Way: You might guess that "Income" is important and "Job Title" is not, or you might give them equal weight.
  • The HARR Way: The system acts like a smart coach. It tries grouping the suspects. If "Income" helps separate the teams well, the coach says, "Great! Give Income a bigger megaphone!" If "Job Title" is confusing the teams, the coach says, "Quiet down, Job Title."
  • It does this automatically while it searches for the best groups, constantly adjusting the volume (weights) of each clue until the teams are perfectly formed.
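The coach loop above can be sketched as a weighted k-means-style iteration. The inverse-dispersion weight update below is a common illustrative rule, not necessarily the paper's exact formula; the toy data and all names here are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 2 attributes already on a common numeric scale.
# Attribute 0 separates two groups cleanly; attribute 1 is pure noise.
X = np.vstack([
    np.column_stack([rng.normal(0, 0.1, 50), rng.normal(0, 1, 50)]),
    np.column_stack([rng.normal(3, 0.1, 50), rng.normal(0, 1, 50)]),
])

k = 2
w = np.ones(X.shape[1]) / X.shape[1]   # start with equal "volume" per clue
centers = X[[0, -1]].copy()            # one seed from each group

for _ in range(20):
    # Assign each point to its nearest center under the weighted metric.
    d = ((X[:, None, :] - centers[None, :, :]) ** 2 * w).sum(axis=2)
    labels = d.argmin(axis=1)
    centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    # Re-weight: attributes with low within-cluster scatter get a
    # "bigger megaphone" (illustrative inverse-dispersion rule).
    scatter = np.array([
        sum(((X[labels == j, a] - centers[j, a]) ** 2).sum() for j in range(k))
        for a in range(X.shape[1])
    ])
    w = 1.0 / (scatter + 1e-9)
    w /= w.sum()

print(w)  # the informative attribute ends up carrying most of the weight
```

Assignment and re-weighting alternate, so the metric and the grouping improve each other, which is the "adjusting the volume while searching" behavior described above.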

3. Two Coaches (HARR-V and HARR-M)

The paper proposes two slightly different versions of this coach:

  • HARR-V (The Generalist): Gives one overall volume setting for each clue across the whole room. "Income is loud for everyone."
  • HARR-M (The Specialist): Gives specific volume settings for each clue per team. "Income is loud for Team A, but Job Title is loud for Team B." This is like having a coach who knows that for one group of suspects, money matters most, but for another group, their profession is the key.
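The difference between the two coaches is just the shape of the weights: one shared vector versus one weight row per cluster. A minimal sketch, with illustrative shapes and values (not taken from the paper):

```python
import numpy as np

# One object and two cluster centers; attributes share a common scale.
x = np.array([1.0, 0.5, 2.0])
centers = np.array([[1.0, 0.0, 2.0],
                    [0.0, 1.0, 0.0]])

# HARR-V style: a single weight vector shared by every cluster
# ("Income is loud for everyone").
w_v = np.array([0.6, 0.2, 0.2])
d_v = ((x - centers) ** 2 * w_v).sum(axis=1)

# HARR-M style: a k x n_attrs weight matrix, one row per cluster
# ("Income is loud for Team A, Job Title is loud for Team B").
w_m = np.array([[0.8, 0.1, 0.1],
                [0.1, 0.8, 0.1]])
d_m = ((x - centers) ** 2 * w_m).sum(axis=1)

print(d_v, d_m)  # same object, different cluster-wise distances
```

HARR-M has more knobs to learn but can capture groups that care about different clues; HARR-V is the simpler, one-size-fits-all setting.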

Why is this a Big Deal?

  • No More Guessing: You don't need to manually tune knobs or guess how to measure "Job Title." The system figures it out.
  • It's Fast: Even though it does a lot of math (projecting shadows), the learning process is guaranteed to converge, and it finishes the job quickly.
  • It Works Everywhere: Whether you are grouping customers for marketing, patients for medical diagnosis, or students for school projects, this method handles the mix of numbers and words better than previous tools.

In a Nutshell:
This paper teaches computers how to stop treating "Numbers" and "Words" as enemies. By turning words into "shadows" on a ruler and letting the computer learn which clues are most important for the specific group it's building, it finds hidden patterns that other methods miss. It's like finally having a detective who can understand both the alibi (numbers) and the motive (words) perfectly.
