Causal Structure Learning in Hawkes Processes with Complex Latent Confounder Networks

This paper proposes a two-phase iterative algorithm that leverages a discrete-time causal model approximation to identify latent subprocesses and recover causal structures in multivariate Hawkes processes, even when systems are only partially observed.

Songyao Jin, Biwei Huang

Published 2026-03-03

Imagine you are a detective trying to solve a mystery in a bustling city. The city is full of events: people shouting, cars honking, lights flashing, and phones ringing. In data science, a timestamped stream of occurrences like this is called event data, and a common model of how one event makes later events more likely is the Hawkes process.

Most existing detective tools assume that every relevant player is visible. Under that assumption, if a siren (Event A) is regularly followed by a car crash (Event B), they conclude that the siren caused the crash.

But here's the problem: In the real world, you can't see everything. There are hidden players. Maybe a drunk driver (a latent subprocess) was weaving through traffic, causing the siren to go off and causing the crash. If you only look at the siren and the crash, you might wrongly conclude the siren caused the crash. The drunk driver is the "hidden confounder" messing up your investigation.

This paper presents a new detective method that can find these hidden players and figure out the real story, even though they never appear in the data.

The Big Idea: Turning a Movie into a Flipbook

The authors realized that looking at a continuous stream of events (like a smooth movie) is hard when things are hidden. So, they proposed a clever trick: turn the movie into a flipbook.

They chop time into tiny, tiny slices (like frames in a flipbook). Instead of watching the smooth flow of time, they look at the "count" of events in each slice.

  • The Magic: They showed that if you make these slices small enough, the event counts per slice are well approximated by a discrete-time linear model, so the complex, continuous math of the Hawkes process turns into a much simpler problem (like a standard algebra equation).
  • Why it helps: Once it's a simple algebra problem, they can use a specific type of "math magnifying glass" called Rank Tests to see patterns that are invisible to other methods.
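
The "flipbook" step can be sketched in a few lines. This is only an illustration of slicing time and counting events, not the authors' code: the function name `bin_event_counts`, the slice width `delta`, and the toy timestamps are all assumptions made up for this example.

```python
import numpy as np

def bin_event_counts(event_times, t_end, delta):
    """Count events of each type in consecutive time slices of width delta.

    event_times: dict mapping an event-type name to a list of timestamps.
    Returns (names, counts), where counts[k, i] is the number of events of
    type names[i] in slice k. Slice k covers [k*delta, (k+1)*delta); the
    final slice also includes its right edge, following np.histogram.
    """
    n_bins = int(np.ceil(t_end / delta))
    edges = np.arange(n_bins + 1) * delta
    names = sorted(event_times)
    counts = np.column_stack(
        [np.histogram(event_times[name], bins=edges)[0] for name in names]
    )
    return names, counts

# Hypothetical toy timestamps in seconds (not data from the paper).
toy = {"siren": [0.4, 1.1, 1.3, 3.7], "crash": [1.6, 3.9]}
names, counts = bin_event_counts(toy, t_end=4.0, delta=1.0)
print(names)
print(counts)
```

Each row of `counts` is one frame of the flipbook. Shrinking `delta` gives more frames with fewer events in each, which is the regime the linear approximation relies on.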

The "Rank Test" Magnifying Glass

Imagine you are looking at a group of people in a room.

  • Scenario A: Everyone is acting on their own separate influences. The rank of the matrix describing how their behaviors vary together is high (lots of unique patterns).
  • Scenario B: Everyone is secretly following a hidden leader. Even though you can't see the leader, the followers' movements are synchronized through that leader. This shared influence creates a "low rank" pattern in the same matrix.

The authors' method looks for these low-rank patterns. If the math shows that two observed events (like the siren and the crash) are "too synchronized" to be explained by each other directly, the method screams: "There must be a hidden third party causing both!"
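
To make the low-rank idea concrete, here is a small simulation. It is a sketch under assumptions: the data are Gaussian rather than event counts, the loadings are arbitrary numbers, and inspecting singular values is a stand-in for a proper statistical rank test, which this gist does not describe in detail.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000

def cross_cov_singular_values(data, group_a, group_b):
    """Singular values of the sample cross-covariance between two column groups."""
    centered = data - data.mean(axis=0)
    cov = centered[:, group_a].T @ centered[:, group_b] / (len(data) - 1)
    return np.linalg.svd(cov, compute_uv=False)

# One hidden leader drives four observed followers (plus private noise).
leader = rng.normal(size=n)
loadings = np.array([1.0, -0.8, 0.6, 1.2])
followers = leader[:, None] * loadings + 0.5 * rng.normal(size=(n, 4))
sv_one = cross_cov_singular_values(followers, [0, 1], [2, 3])

# Two hidden leaders drive the same four followers.
leaders = rng.normal(size=(n, 2))
mix = np.array([[1.0, -0.8, 0.6, 1.2],
                [0.5, 1.0, -1.0, 0.7]])
followers2 = leaders @ mix + 0.5 * rng.normal(size=(n, 4))
sv_two = cross_cov_singular_values(followers2, [0, 1], [2, 3])

print("one leader :", sv_one)   # second singular value is close to zero
print("two leaders:", sv_two)   # both singular values are clearly nonzero
```

With one hidden leader, the cross-covariance between followers {0, 1} and followers {2, 3} has essentially one nonzero singular value: all shared variation flows through a single source. With two leaders, a second singular value appears.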

The Two-Phase Detective Algorithm

The paper proposes a two-phase loop to solve the mystery:

Phase 1: The "Who Caused Whom?" Game
The detective looks at the visible events (the siren, the crash, the flashing lights). Using the flipbook math, they ask: "If I know what happened in the past, can I predict what happens next?"

  • If yes, they draw a line connecting them.
  • They keep doing this until they map out all the connections between the visible things.
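
The question "does the past help predict what happens next?" can be illustrated with a lagged least-squares fit. This is a simplified stand-in, not the paper's Phase 1 procedure: the simulated series are Gaussian rather than counts, only one lag is used, and the cutoff of 0.1 for drawing a line is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 5000

# Simulated toy system: series 0 influences series 1 with a one-step delay.
x = np.zeros(T)
y = np.zeros(T)
for t in range(1, T):
    x[t] = 0.5 * x[t - 1] + rng.normal()
    y[t] = 0.3 * y[t - 1] + 0.6 * x[t - 1] + rng.normal()
series = np.column_stack([x, y])

def lagged_coefficients(series):
    """Least-squares fit of each series at time t on all series at time t-1.

    Returns B with B[i, j] = estimated effect of series j (past) on series i (present).
    """
    past = series[:-1] - series[:-1].mean(axis=0)
    present = series[1:] - series[1:].mean(axis=0)
    coef, *_ = np.linalg.lstsq(past, present, rcond=None)
    return coef.T

B = lagged_coefficients(series)

# Draw a line j -> i when the past of j noticeably helps predict i.
threshold = 0.1  # arbitrary cutoff for this illustration
edges = [(j, i) for i in range(2) for j in range(2)
         if i != j and abs(B[i, j]) > threshold]
print(edges)
```

The fit recovers a line from series 0 to series 1 and none in the reverse direction. A real analysis would replace the fixed threshold with a statistical test.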

Phase 2: The "Hidden Ghost" Hunt
Sometimes, the visible things don't make sense. Two things are acting weirdly synchronized, but there's no direct line between them.

  • The detective says, "Aha! There's a ghost here."
  • They create a virtual "Ghost" node in their map.
  • They then treat this Ghost as if it were a real person and go back to Phase 1 to see what caused the Ghost and what the Ghost caused.

They repeat this loop (Phase 1 → Phase 2 → Phase 1) until the whole map is complete, revealing both the visible actors and the invisible ghosts pulling the strings.
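
The control flow of that loop can be sketched as follows. Everything here is hypothetical scaffolding: `discover_structure`, `toy_find_edges`, and `toy_find_new_latent` are names invented for this illustration, and the two toy helpers simply read from a hard-coded answer instead of running any statistical test.

```python
def discover_structure(observed, find_edges, find_new_latent, max_rounds=10):
    """Alternate between edge discovery (Phase 1) and latent detection (Phase 2)."""
    nodes = list(observed)
    edges = set()
    for _ in range(max_rounds):
        edges |= find_edges(nodes, edges)        # Phase 1: who caused whom?
        latent = find_new_latent(nodes, edges)   # Phase 2: is a ghost needed?
        if latent is None:
            break                                # nothing hidden left to add
        nodes.append(latent)                     # treat the ghost as a real node
    return nodes, edges

# Toy stand-ins: the "truth" is a hidden ghost causing both siren and crash.
TRUE_EDGES = {("ghost_1", "siren"), ("ghost_1", "crash")}

def toy_find_edges(nodes, edges):
    # Report every true edge whose endpoints are both currently in the map.
    return {(a, b) for (a, b) in TRUE_EDGES if a in nodes and b in nodes}

def toy_find_new_latent(nodes, edges):
    # Pretend a test flagged unexplained synchronization exactly once.
    return "ghost_1" if "ghost_1" not in nodes else None

nodes, edges = discover_structure(["siren", "crash"], toy_find_edges, toy_find_new_latent)
print(nodes)
print(sorted(edges))
```

In the first round no lines can be drawn between the siren and the crash, so a ghost is added; in the second round Phase 1 connects the ghost to both, and the loop stops because no further ghost is needed.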

Real-World Application: The Cellular Network

To show it works in practice, they tested this on a real dataset from a cellular network.

  • The Setup: A network of 55 cell towers. Sometimes, alarms go off (e.g., "Signal Lost," "Overload," "Hardware Failure").
  • The Mystery: Some alarms were missing from the data (hidden).
  • The Result: Their method successfully identified that a specific missing alarm (Alarm #7) was the hidden cause of two other visible alarms. It reconstructed the true chain of events, whereas other methods got confused and drew wrong lines.

Why This Matters

In the past, if you had hidden variables, you had to guess how many there were or where they were. This paper says: "We don't need to guess."

By turning the continuous flow of time into a discrete flipbook and using math to spot "hidden synchronization," this method can:

  1. Find the invisible: Detect hidden causes without being told they exist.
  2. Fix the lies: Stop you from blaming the wrong event for a problem.
  3. Map the truth: Rebuild the entire causal network, visible and invisible, just by watching the data.

In short: It's like having a detective who can see the invisible puppeteer pulling the strings, ensuring you never blame the puppet for the dance.
