Maintaining Leiden Communities in Large Dynamic Graphs

This paper introduces HIT-Leiden, a novel algorithm that efficiently maintains high-quality Leiden communities in large dynamic graphs by bounding the scope of updates through hierarchical structures, achieving speedups of up to five orders of magnitude over existing methods while meeting stringent production latency requirements.

Chunxu Lin, Yumao Xie, Yixiang Fang, Yongmin Hu, Yingqian Hu, Chen Cheng

Published 2026-03-05
📖 5 min read🧠 Deep dive

Imagine you are the mayor of a massive, bustling city with billions of residents (nodes) and trillions of friendships or interactions (edges). Your job is to organize this city into neighborhoods (communities) so that people who hang out together are grouped together. This helps you understand the city, find troublemakers, or recommend new friends.

In the world of data science, this is called Community Detection. One of the best ways to do this is using an algorithm called Leiden. It's like a very smart, meticulous urban planner who ensures that every neighborhood is a tight-knit, connected group of people.

The Problem: The City Never Stops Changing

The problem is that this city is dynamic. Every second, new friendships form, old ones break, and new people move in.

  • The Old Way: Every time a few people change their minds, the old urban planners (existing algorithms) would panic. They would say, "Oh no, the map is wrong! Let's tear down the entire city and rebuild it from scratch!"
  • The Result: This takes hours or even days. By the time they finish, the city has changed again. The "neighborhoods" are outdated, and the city's services (like fraud detection or recommendations) are slow and useless.

The Solution: HIT-Leiden (The Smart, Incremental Planner)

The authors of this paper, working with ByteDance (the company behind TikTok and Douyin), created a new planner called HIT-Leiden.

Instead of rebuilding the whole city, HIT-Leiden acts like a highly efficient, surgical team. It only fixes the specific blocks that are affected by the changes.

Here is how it works, using a creative analogy:

1. The "Tree" Structure (The Hierarchy)

Imagine the city isn't just a flat map. It's built like a Russian Nesting Doll or a family tree.

  • Level 1: Individual houses (people).
  • Level 2: Small clusters of houses (sub-neighborhoods).
  • Level 3: Entire districts.
  • Level 4: The whole city.

Old planners tried to fix the whole city every time one house changed. HIT-Leiden understands that if a house changes, you only need to check the immediate block, then maybe the district, but you don't need to worry about the whole country unless the change is huge. It keeps a hierarchical map in its head, so it knows exactly where to look.

2. The Three-Step "Surgery"

When a change happens (like a new edge/road is built), HIT-Leiden performs three quick steps:

  • Step A: The "Move" (Inc-movement)

    • Analogy: If a new road connects two neighborhoods, a few people might want to move to the other side to be closer to their friends.
    • Action: HIT-Leiden checks only the people living near that new road. If moving them makes the neighborhood "happier" (more connected), it moves them. It ignores the rest of the city.
  • Step B: The "Refine" (Inc-refinement)

    • Analogy: Sometimes, when people move, a neighborhood might accidentally split in half, leaving a group of people isolated (like a cul-de-sac with no exit). The Leiden algorithm demands that every neighborhood must be connected.
    • Action: HIT-Leiden uses a special "connectivity tool" (like a dynamic tree index) to instantly see if a neighborhood has split. If it has, it immediately reorganizes the split groups into new, valid neighborhoods. It does this only for the affected area.
  • Step C: The "Update" (Inc-aggregation)

    • Analogy: Once the small blocks are fixed, the planner updates the map of the larger districts.
    • Action: It tells the "District Manager" that a few houses changed, so the district's statistics need a tiny tweak. It doesn't redraw the whole map; it just updates the summary.

Why is this a Big Deal?

The paper shows that HIT-Leiden is insanely fast.

  • Old methods: If you update 100,000 connections, they might take hours to recalculate.
  • HIT-Leiden: It does the same job in milliseconds.
  • The Speedup: It is up to 100,000 times faster (five orders of magnitude) than the competition.

Real-World Impact

The authors tested this in the real world at ByteDance:

  1. Fraud Detection: Imagine a ring of scammers trying to hide. If the graph of their connections changes, you need to know immediately if a new group is forming. HIT-Leiden updates the groups in seconds, allowing the system to catch fraud instantly.
  2. Recommendations: If you like a video, the system looks at your "neighborhood" of similar users. If the graph updates, HIT-Leiden ensures your recommendations stay fresh without lagging.
  3. AI Search (Graph-RAG): When AI tries to answer questions using a massive database, it groups information into "chunks." HIT-Leiden keeps these chunks updated so the AI always has the latest info, saving huge amounts of computing power and money.

The Bottom Line

Think of HIT-Leiden as the difference between demolishing a house to fix a leaky faucet (the old way) and sending a plumber to just fix the faucet (HIT-Leiden).

It keeps the "neighborhoods" of the internet organized, connected, and up-to-date, allowing massive applications like TikTok to run smoothly even when billions of things are changing every second.