Causal Structure Learning in Hawkes Processes with Complex Latent Confounder Networks

Imagine you are a detective trying to solve a mystery in a bustling city. The city is full of events: people shouting, cars honking, lights flashing, and phones ringing. In data science, a model in which past events make future events more likely, so that one event can trigger others, is called a Hawkes Process.

Most existing detective tools assume that if you see two things happening together, you can figure out which one caused the other. For example, if you see a siren (Event A) and then a car crash (Event B), you might guess the siren caused the crash.

But here's the problem: In the real world, you can't see everything. There are hidden players. Maybe a drunk driver (a latent subprocess) was weaving through traffic, setting off the siren and causing the crash. If you only look at the siren and the crash, you might wrongly conclude the siren caused the crash. The drunk driver is the "hidden confounder" messing up your investigation.

This paper presents a new, super-smart detective method that can find these hidden players and figure out the real story, even when they are invisible.

The Big Idea: Turning a Movie into a Flipbook

The authors realized that looking at a continuous stream of events (like a smooth movie) is hard when things are hidden. So, they proposed a clever trick: turn the movie into a flipbook.

They chop time into tiny, tiny slices (like frames in a flipbook). Instead of watching the smooth flow of time, they look at the "count" of events in each slice.

The Magic: They proved that if you make these slices small enough, the continuous-time math of the Hawkes process becomes a linear model: the count in each slice is a weighted sum of the counts in earlier slices, plus a baseline and noise (like a standard algebra equation).
Why it helps: Once it's a simple algebra problem, they can use a specific type of "math magnifying glass" called Rank Tests to see patterns that are invisible to other methods.
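The flipbook step itself is just binning. Here is a minimal Python/NumPy sketch of it; it is not the authors' code, and the function name `to_flipbook` and the siren/crash timestamps are made up for illustration.

```python
import numpy as np


def to_flipbook(event_times, t_end, delta):
    """Turn continuous timestamps into per-slice event counts.

    event_times: one 1-D array of timestamps per event type, each in [0, t_end).
    Returns an int array of shape (n_slices, n_types); entry [n, i] is the
    number of type-i events in the slice [n * delta, (n + 1) * delta).
    """
    n_slices = int(np.ceil(t_end / delta))
    edges = np.arange(n_slices + 1) * delta  # slice boundaries ("frames")
    counts = np.zeros((n_slices, len(event_times)), dtype=int)
    for i, times in enumerate(event_times):
        counts[:, i] = np.histogram(times, bins=edges)[0]
    return counts


sirens = np.array([0.1, 0.4, 1.2])     # hypothetical event times
crashes = np.array([0.45, 1.3, 1.35])  # hypothetical event times
frames = to_flipbook([sirens, crashes], t_end=2.0, delta=0.5)
print(frames.tolist())  # [[2, 1], [0, 0], [1, 2], [0, 0]]
```

Each row is one "frame": the exact times inside a slice are discarded and only the counts survive, which is what lets the later steps treat the data as a linear model.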

The "Rank Test" Magnifying Glass

Imagine you are looking at a group of people in a room.

Scenario A: Everyone is acting independently. The "math rank" of their behavior is high (lots of unique patterns).
Scenario B: Everyone is secretly following a hidden leader. Even though you can't see the leader, the followers' movements are perfectly synchronized. This synchronization creates a "low rank" pattern in the math.

The authors' method looks for these low-rank patterns. If the math shows that two observed events (like the siren and the crash) are "too synchronized" to be explained by each other directly, the method screams: "There must be a hidden third party causing both!"
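To make the low-rank intuition concrete, here is a toy NumPy simulation with made-up numbers. Two observed groups of three people are driven either by one hidden leader or by two, and the rank of the cross-covariance between the groups counts the hidden leaders. The `numerical_rank` helper and its 5% cutoff are our own stand-ins for a proper statistical rank test, not the paper's procedure.

```python
import numpy as np


def numerical_rank(m, rel_tol=0.05):
    """Count singular values above rel_tol times the largest one."""
    s = np.linalg.svd(m, compute_uv=False)  # sorted in descending order
    return int(np.sum(s > rel_tol * s[0]))


def cross_cov(x, y):
    """Sample cross-covariance between the columns of x and the columns of y."""
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    return xc.T @ yc / (len(x) - 1)


rng = np.random.default_rng(0)
n = 200_000

# Scenario B: one hidden leader; every observed person copies it, plus personal noise.
leader = rng.normal(size=(n, 1))
group_a = leader @ np.array([[1.0, 0.8, 0.6]]) + 0.5 * rng.normal(size=(n, 3))
group_b = leader @ np.array([[0.9, 0.7, 1.1]]) + 0.5 * rng.normal(size=(n, 3))
rank_one_leader = numerical_rank(cross_cov(group_a, group_b))

# Contrast: two independent hidden leaders, each reaching both groups.
leaders = rng.normal(size=(n, 2))
load_c = np.array([[1.0, 0.2, 0.5], [0.1, 1.0, -0.6]])
load_d = np.array([[0.8, -0.3, 0.4], [0.2, 0.9, 1.0]])
group_c = leaders @ load_c + 0.5 * rng.normal(size=(n, 3))
group_d = leaders @ load_d + 0.5 * rng.normal(size=(n, 3))
rank_two_leaders = numerical_rank(cross_cov(group_c, group_d))

print(rank_one_leader, rank_two_leaders)  # 1 2
```

Personal noise does not raise the rank because it is not shared across the two groups; only common hidden drivers leave a trace in the cross-covariance.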

The Two-Phase Detective Algorithm

The paper proposes a two-step game to solve the mystery:

Phase 1: The "Who Caused Whom?" Game
The detective looks at the visible events (the siren, the crash, the flashing lights). Using the flipbook math, they ask: "If I know what happened in the past, can I predict what happens next?"

If yes, they draw a line connecting them.
They keep doing this until they map out all the connections between the visible things.

Phase 2: The "Hidden Ghost" Hunt
Sometimes, the visible things don't make sense. Two things are acting weirdly synchronized, but there's no direct line between them.

The detective says, "Aha! There's a ghost here."
They create a virtual "Ghost" node in their map.
They then treat this Ghost as if it were a real person and go back to Phase 1 to see what caused the Ghost and what the Ghost caused.

They repeat this loop (Phase 1 → Phase 2 → Phase 1) until the whole map is complete, revealing both the visible actors and the invisible ghosts pulling the strings.

Real-World Application: The Cellular Network

To prove it works, they tested this on a real dataset from a cellular network.

The Setup: A cellular network of 55 devices. Sometimes, alarms go off (e.g., "Signal Lost," "Overload," "Hardware Failure").
The Mystery: To test the method, one alarm type was deliberately hidden from the data.
The Result: Their method successfully identified that a specific missing alarm (Alarm #7) was the hidden cause of two other visible alarms. It reconstructed the true chain of events, whereas other methods got confused and drew wrong lines.

Why This Matters

In the past, if you had hidden variables, you had to guess how many there were or where they were. This paper says: "We don't need to guess."

By turning the continuous flow of time into a discrete flipbook and using math to spot "hidden synchronization," this method can:

Find the invisible: Detect hidden causes without being told they exist.
Fix the lies: Stop you from blaming the wrong event for a problem.
Map the truth: Rebuild the entire causal network, visible and invisible, just by watching the data.

In short: It's like having a detective who can see the invisible puppeteer pulling the strings, ensuring you never blame the puppet for the dance.

1. Problem Statement

The paper addresses the challenge of causal structure learning in Multivariate Hawkes Processes (MHPs) under partial observability.

Context: MHPs are widely used to model self-exciting temporal events in domains like neuroscience, finance, and social networks.
Limitation of Existing Work: Current methods for learning Hawkes structures (e.g., maximum likelihood estimation, Granger causality) generally assume causal sufficiency, meaning all relevant subprocesses (event sequences) are fully observed.
The Core Problem: In real-world scenarios, many subprocesses are latent (unobserved). These latent subprocesses often act as confounders, creating spurious causal edges between observed subprocesses and obscuring the true causal dynamics.
Gap: Existing methods cannot identify the existence, number, or location of these latent subprocesses without prior knowledge. Furthermore, standard causal discovery methods for time series often fail because they assume weak autocorrelation or exogenous latents, whereas Hawkes processes exhibit dense cross-lag dependencies and endogenous latent subprocesses.

2. Methodology

The authors propose a principled framework that bridges continuous-time Hawkes processes with discrete-time linear causal models to enable the identification of latent structures.

A. Theoretical Foundation: Continuous-to-Discrete Mapping

Key Insight: As the time interval ($\Delta$) shrinks, a continuous-time Multivariate Hawkes Process can be represented as a discrete-time linear autoregressive model.
Theorem 4.1: The paper proves that the discretized event count $N_i^{(n)}$ in the $n$-th time window is a linear combination of lagged counts of all subprocesses (including subprocess $i$ itself), plus an intercept and noise.
$N_i^{(n)} = \theta_i^{(0)} + \sum_{j} \sum_{k \ge 1} \theta_{ij}^{(k)} N_j^{(n-k)} + \epsilon_i^{(n)}$
This transformation allows the authors to apply statistical tools designed for linear structural models to Hawkes data.
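A rough numerical check of this mapping, as a Python/NumPy sketch that is ours rather than the authors': simulate a two-type Hawkes process with exponential kernels (an assumed kernel family) by Ogata-style thinning, bin it, and fit the linear model above by ordinary least squares. Summing the fitted $\theta_{ij}^{(k)}$ over lags should roughly recover each kernel's total mass $\alpha_{ij}$, only approximately, because excitation that plays out inside a single slice is not captured by lagged counts. All function names and parameter values are hypothetical.

```python
import numpy as np


def simulate_hawkes(mu, alpha, beta, t_end, rng):
    """Ogata-style thinning for a Hawkes process with kernels
    phi_ij(t) = alpha[i, j] * beta * exp(-beta * t), so alpha[i, j] is the
    expected number of type-i events triggered by one type-j event."""
    d = len(mu)
    excite = np.zeros(d)  # current kernel contribution to each intensity
    t = 0.0
    events = [[] for _ in range(d)]
    while True:
        lam_bar = mu.sum() + excite.sum()  # intensities only decay until the next event
        t_next = t + rng.exponential(1.0 / lam_bar)
        if t_next >= t_end:
            break
        excite *= np.exp(-beta * (t_next - t))
        t = t_next
        cum = np.cumsum(mu + excite)
        u = rng.uniform(0.0, lam_bar)
        if u < cum[-1]:  # accept the candidate and pick its type
            i = int(np.searchsorted(cum, u, side="right"))
            events[i].append(t)
            excite += beta * alpha[:, i]
    return [np.array(e) for e in events]


def fit_discrete_ar(events, t_end, delta, n_lags):
    """Bin events with width delta, then regress each count on n_lags lags
    of all counts by ordinary least squares. Returns (intercepts, theta)
    with theta[k - 1, i, j] estimating theta_ij^(k)."""
    edges = np.arange(int(np.ceil(t_end / delta)) + 1) * delta
    counts = np.column_stack([np.histogram(e, bins=edges)[0] for e in events]).astype(float)
    n, d = counts.shape
    lagged = [counts[n_lags - k : n - k] for k in range(1, n_lags + 1)]
    X = np.column_stack([np.ones(n - n_lags)] + lagged)
    coef, *_ = np.linalg.lstsq(X, counts[n_lags:], rcond=None)
    theta = coef[1:].reshape(n_lags, d, d).transpose(0, 2, 1)
    return coef[0], theta


rng = np.random.default_rng(1)
mu = np.array([0.5, 0.5])
alpha = np.array([[0.3, 0.0],   # type 0 excites itself only
                  [0.6, 0.3]])  # type 1 is excited by type 0 and by itself
events = simulate_hawkes(mu, alpha, beta=1.0, t_end=20_000.0, rng=rng)
intercepts, theta = fit_discrete_ar(events, t_end=20_000.0, delta=0.25, n_lags=20)
print(np.round(theta.sum(axis=0), 2))  # lag-summed weights, roughly tracking alpha
```

The absent edge (type 1 to type 0) should come out near zero while the real edge (type 0 to type 1) stays clearly positive, which is the property the later rank tests build on.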

B. Identifiability via Rank Constraints

The core of the methodology relies on analyzing the rank of cross-covariance matrices of the observed discretized variables.

Window Causal Graph: The authors define a "window graph" where nodes represent counts in specific time lags. Unlike the summary graph (which may have cycles), the window graph is a Directed Acyclic Graph (DAG).
D-separation and Rank: Leveraging the connection between d-separation and matrix rank (Sullivant et al., 2010), the authors establish that if a set of variables $C_v$ d-separates $A_v$ and $B_v$, the rank of the cross-covariance matrix between $(A_v \cup C_v)$ and $(B_v \cup C_v)$ equals $|C_v|$.
Detecting Latent Confounders:
- If observed subprocesses $O_1$ and $O_2$ share a latent confounder $L_1$, the rank of their cross-covariance with other variables will be lower than expected based solely on observed parents.
- Symmetric Path Condition (Definition 4.4): To ensure identifiability, the paper introduces a condition where the latent confounder connects to observed effects via paths of equal length consisting only of intermediate latent nodes (no self-loops). Under this condition, the latent influence leaves a specific, testable rank signature (e.g., rank $= 2m + 1$ rather than $2m$).
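A small check of the d-separation/rank link, as a Python/NumPy sketch with a made-up five-variable linear model (not a window graph from the paper). The population covariance is computed exactly, and `np.linalg.matrix_rank` with a fixed tolerance stands in for a statistical rank test on estimated covariances.

```python
import numpy as np


def implied_cov(weights, noise_var):
    """Covariance of a linear SEM x = W x + e on a DAG (W[i, j] = weight of j -> i)."""
    inv = np.linalg.inv(np.eye(len(noise_var)) - weights)
    return inv @ np.diag(noise_var) @ inv.T


def cross_rank(cov, rows, cols, tol=1e-8):
    """Rank of the cross-covariance block between variable sets rows and cols."""
    return int(np.linalg.matrix_rank(cov[np.ix_(rows, cols)], tol=tol))


C, A1, A2, B1, B2 = range(5)
w = np.zeros((5, 5))
w[A1, C], w[A2, C], w[B1, C], w[B2, C] = 0.9, -0.7, 0.8, 0.6
cov = implied_cov(w, np.ones(5))

# C d-separates A = {A1, A2} from B = {B1, B2}: rank equals |C| = 1.
rank_sep = cross_rank(cov, [A1, A2, C], [B1, B2, C])

# A direct edge A1 -> B1 breaks the separation and the rank rises to 2.
w_edge = w.copy()
w_edge[B1, A1] = 0.5
rank_edge = cross_rank(implied_cov(w_edge, np.ones(5)), [A1, A2, C], [B1, B2, C])

# Hide C: the 2x2 A-B block has rank 1 instead of full rank 2, because a
# single latent common cause accounts for all of the A-B dependence.
rank_hidden = cross_rank(cov, [A1, A2], [B1, B2])

print(rank_sep, rank_edge, rank_hidden)  # 1 2 1
```

The last line is the pattern the latent-detection step exploits: observed variables whose dependence "fits through" fewer dimensions than any observed separating set would explain.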

C. Two-Phase Iterative Algorithm

The authors propose an algorithm (Algorithm 1) that alternates between two phases to reconstruct the full causal graph:

Phase I (Causal Relation Identification):
- Iteratively identifies the "parent-cause set" for each subprocess (observed or previously discovered latent) using rank tests on cross-covariances (Theorems 4.3 and 4.7).
- Uses observed surrogates (specific observed effects of a latent node) to represent latent nodes during the testing of other relationships.
Phase II (Latent Subprocess Discovery):
- When Phase I stalls, the algorithm searches for new latent confounders by testing pairs of subprocesses for rank deficiencies indicative of a shared latent parent (Theorems 4.5 and 4.8).
- If a latent confounder is detected, it is added to the active set, and the algorithm returns to Phase I to re-evaluate relationships.
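The control flow of the two phases can be sketched in Python. This is a hypothetical skeleton, not Algorithm 1 itself: `find_parents` and `find_latent_pair` are placeholder callbacks standing in for the rank tests of Theorems 4.3/4.7 and 4.5/4.8, and the surrogate bookkeeping is omitted. The toy oracles read answers off a known graph (whose latent is named `L1` to match the loop's naming) only to show the loop running.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Parents = Dict[str, Set[str]]


def two_phase_search(
    observed: List[str],
    find_parents: Callable[[str, List[str]], Set[str]],
    find_latent_pair: Callable[[List[str], Parents], Optional[Tuple[str, str]]],
    max_latents: int = 10,
) -> Parents:
    """Alternate Phase I (parent sets) and Phase II (new latents) until neither changes."""
    active = list(observed)
    parents: Parents = {v: set() for v in active}
    while True:
        # Phase I: re-derive every active node's parent set until a full pass changes nothing.
        changed = True
        while changed:
            changed = False
            for v in active:
                found = find_parents(v, active)
                if found != parents[v]:
                    parents[v] = found
                    changed = True
        # Phase II: Phase I has stalled, so look for a pair with a shared hidden parent.
        n_latent = len(active) - len(observed)
        pair = find_latent_pair(active, parents) if n_latent < max_latents else None
        if pair is None:
            return parents
        latent = f"L{n_latent + 1}"
        active.append(latent)  # the new latent joins the active set for Phase I
        parents[latent] = set()
        for child in pair:
            parents[child].add(latent)


# Toy oracles built from a known graph; a real run would use rank tests instead.
truth: Parents = {"O1": {"L1"}, "O2": {"L1"}, "O3": {"O2"}, "L1": set()}


def oracle_parents(v: str, active: List[str]) -> Set[str]:
    return truth[v] & set(active)


def oracle_latent_pair(active: List[str], parents: Parents) -> Optional[Tuple[str, str]]:
    for h in sorted(set(truth) - set(active)):
        kids = sorted(c for c in active if h in truth[c])
        if len(kids) >= 2:
            return kids[0], kids[1]
    return None


result = two_phase_search(["O1", "O2", "O3"], oracle_parents, oracle_latent_pair)
print(result == truth)  # True
```

The `max_latents` cap is our own safeguard against a misbehaving test, not something the summary attributes to the paper.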

3. Key Contributions

First Principled Framework for Latent Hawkes: The paper provides the first framework to identify latent subprocesses and recover causal structures in continuous-time event sequences without prior knowledge of the existence or number of latent variables.
Discrete-Time Representation: It establishes a rigorous theoretical link between MHPs and linear autoregressive models, deriving necessary and sufficient conditions for identifiability based on rank constraints.
Handling Endogenous Latents: Unlike previous time-series methods that assume exogenous latents, this method handles endogenous latent subprocesses (where observed nodes can cause latent nodes).
Algorithmic Innovation: A two-phase iterative algorithm that uses rank tests on cross-covariance matrices to alternately discover causal links and infer new latent nodes, guaranteeing identifiability under specific path constraints.

4. Experimental Results

The method was evaluated on both synthetic and real-world datasets.

Synthetic Data:
- Baselines: Compared against likelihood-based Hawkes methods (SHP, THP, NPHC), rank-based latent methods for i.i.d. data (Hier. Rank, RLCD), and time-series baselines (LPCMCI).
- Performance: The proposed method consistently achieved higher F1-scores across various graph structures, including those with complex latent confounder networks (e.g., latent nodes causing other latent nodes).
- Robustness: The method remained robust to variations in time discretization ($\Delta$) and moderate violations of the rank-faithfulness assumption.
Real-World Data:
- Dataset: A cellular network alarm dataset (18 alarm types, 55 devices).
- Setup: A specific subgraph was analyzed where one alarm type (Alarm 7) was manually treated as latent.
- Outcome: The method successfully recovered the latent subprocess (Alarm 7) and its causal influence on observed alarms (Alarms 1 and 3), outperforming all baselines (F1-score of 0.76 vs. ~0.49 for the next best).
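For readers unfamiliar with the metric, an F1-score balances precision (the share of recovered edges that are real) and recall (the share of real edges recovered). The summary does not spell out the paper's exact scoring protocol, for instance how discovered latent nodes are matched to true ones, so the Python sketch below assumes directed edges with latent nodes already matched, and uses made-up graphs.

```python
def edge_f1(true_edges, est_edges):
    """F1-score over directed edges given as (cause, effect) pairs."""
    true_edges, est_edges = set(true_edges), set(est_edges)
    tp = len(true_edges & est_edges)  # edges found in both graphs
    if tp == 0:
        return 0.0
    precision = tp / len(est_edges)
    recall = tp / len(true_edges)
    return 2 * precision * recall / (precision + recall)


truth = {("L", "A"), ("L", "B"), ("A", "C")}
estimate = {("L", "A"), ("L", "B"), ("B", "C")}  # one wrong edge, one missed edge
print(round(edge_f1(truth, estimate), 3))  # 0.667
```

Here two of three estimated edges are correct and two of three true edges are found, so precision and recall are both 2/3.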

5. Significance

Theoretical Advancement: It resolves a long-standing gap in causal discovery for point processes by moving beyond the assumption of causal sufficiency.
Practical Impact: In fields like neuroscience (where recording all neurons is impossible) and finance (where unobserved market factors drive events), this method allows researchers to infer hidden drivers of system dynamics, preventing incorrect conclusions about causality.
Methodological Shift: It shifts the paradigm for Hawkes process learning from purely likelihood-based fitting (which struggles with latent variables) to a structural, rank-based approach that leverages the specific algebraic properties of discretized Hawkes processes.

In summary, this paper offers a robust, theoretically grounded solution for uncovering the "hidden" causal architecture of complex event-driven systems, significantly advancing the state-of-the-art in causal discovery for partially observed temporal data.