Federated Hierarchical Clustering with Automatic Selection of Optimal Cluster Numbers

This paper proposes Fed-kk^*-HC, a novel federated clustering framework that automatically determines the optimal number of clusters by generating client-side micro-subclusters and performing density-based hierarchical merging on the server, effectively addressing challenges related to unknown cluster counts, imbalanced cluster sizes, and privacy constraints.

Yue Zhang, Chuanlong Qiu, Xinfa Liao, Yiqun Zhang

Published 2026-03-16
📖 5 min read🧠 Deep dive

Imagine you are the mayor of a huge country, and you want to organize a national festival. You need to group millions of citizens into different "clubs" based on their hobbies (like hiking, cooking, or gaming) so you can send them the right invitations.

However, there's a catch: Privacy.

The citizens live in thousands of different towns (clients). They are very protective of their personal data and refuse to send you a list of their names and hobbies. They only agree to send you a few "summary notes" about their town's vibe.

This is the challenge of Federated Clustering. Most existing methods try to solve this by assuming:

  1. Everyone has roughly the same number of people in each hobby group (e.g., exactly 100 hikers, 100 cooks).
  2. You, the mayor, already know exactly how many clubs exist (e.g., "We need 5 clubs").

The Problem: In the real world, these assumptions are wrong. Some hobbies are huge (millions of hikers), while others are tiny (just 50 people who collect rare stamps). Also, you often don't know how many clubs actually exist until you look at the data. If you force everyone into equal-sized groups, you ruin the tiny, unique clubs.

The Solution: Fed-k*-HC (The "Smart Town Planner")

The authors of this paper propose a new method called Fed-k-HC*. Think of it as a two-step process involving "Micro-Planners" and a "Grand Architect."

Step 1: The Micro-Planners (The Clients)

Instead of sending you a messy list of everyone's hobbies, each town (client) does a quick, local organization first.

  • The Trick: They break their town down into tiny, manageable "micro-groups." Imagine a town of 1,000 people is broken into 50 tiny circles of 20 people each.
  • The Privacy Shield: They don't send you the names of the people. Instead, they create fake "dummy" people that perfectly mimic the average behavior of those 20-person circles. It's like sending a silhouette or a mannequin that looks and acts exactly like the group, but reveals no real identity.
  • The Result: You get a pile of these "dummy" representatives from every town. They are small, precise, and safe.

Step 2: The Grand Architect (The Server)

Now, you (the server) have thousands of these dummy representatives from all over the country. You need to group them into the final clubs.

  • The "Uniform Effect" Trap: Old methods would try to force these groups into equal sizes, crushing the tiny "stamp collector" groups into the giant "hiker" groups.
  • The New Approach (Hierarchical Merging): The Grand Architect uses a "Lego" strategy.
    1. Look at all the tiny dummy groups.
    2. Find the two that are most similar (e.g., two groups of stamp collectors from different towns) and snap them together.
    3. Keep snapping similar groups together, one by one.
    4. The Magic Stop: The system has a special sensor (called SNC) that watches how the groups are merging. It knows when to stop. It says, "Okay, we've merged all the stamp collectors into one big club. Now we have 5 distinct clubs. Let's stop!"

Why is this special?

  • It finds the number automatically: You don't need to guess "5 clubs." The system figures out, "Oh, there are actually 5 distinct groups here," all by itself.
  • It handles the "Small Guys": Because it starts with tiny micro-groups and merges them slowly, it doesn't accidentally swallow the small, rare clubs. It protects the minority groups.
  • One-Shot: The towns only send their dummy data once. No back-and-forth chatting. This saves time and keeps privacy tight.

A Simple Analogy: The Mystery Puzzle

Imagine you have a giant jigsaw puzzle, but the pieces are scattered in locked boxes in different houses. You can't open the boxes to see the picture.

  • Old Way: You ask each house to guess which picture they have and send you a blurry photo. If one house has a tiny piece of a rare bird and another has a huge piece of a sky, the blurry photos get mixed up, and you miss the bird.
  • Fed-k-HC Way:* Each house takes their pieces and glues them into small, tight clusters (micro-groups). They then send you a "shadow" of that cluster. You take all the shadows, and because they are small and precise, you can slowly snap the similar shadows together. Eventually, you realize, "Ah, these shadows form a bird, and those form a sky." You discover the picture and the number of distinct objects in it without ever seeing the original pieces.

The Bottom Line

This paper introduces a smarter, more flexible way to organize data in a privacy-friendly world. It stops us from forcing "one-size-fits-all" groups and allows us to discover the true number of groups, even when some are huge and some are tiny. It's like finally being able to organize a festival where the 50 stamp collectors get their own special tent, rather than being shoved into the giant hiking tent.

Get papers like this in your inbox

Personalized daily or weekly digests matching your interests. Gists or technical summaries, in your language.

Try Digest →