Relational event models with global covariates

Imagine you are trying to understand why people are riding bikes in Washington D.C. You have a massive list of every single bike trip taken in July 2023—about 350,000 trips. Each trip is a "story" connecting two bike stations at a specific time.

The authors of this paper are statisticians who want to figure out what drives these stories. They know that some things depend on the specific stations involved (like how far apart they are), but they also suspect that global factors—things that affect everyone at the same time, like the weather or the time of day—are huge drivers.

Here is the problem: traditional statistical tools for this kind of data are like a pair of glasses that only let you see the specific stations. At each moment, they compare the trip that actually happened with all the trips that could have happened instead. The weather and the time of day are the same for every one of those candidate trips, so, mathematically, they "cancel out" of the comparison. It's like judging which instrument stands out in an orchestra: if the conductor turns the whole band up or down, comparing the instruments against each other never registers the change in overall volume.

The Solution: The "Time-Shifted" Trick

To fix this, the authors invented a clever mathematical trick. Let's use an analogy:

The Analogy: The "Time-Traveling" Race

Imagine a race where runners (bike trips) start at different times.

The Old Way: You look at all the runners starting at 9:00 AM. You ask, "Why did Runner A beat Runner B?" You look at their shoes and their training (station-specific data). But you ignore the fact that it was raining at 9:00 AM, because everyone was running in the rain. The rain cancels out in the math.
The New Way (The Paper's Method): The authors say, "Let's mess with time." They give every runner their own random, forward "time shift."
- Runner A (who actually started at 9:00 AM) is now analyzed as if they started at 9:05 AM.
- Runner B (who actually started at 9:00 AM) is analyzed as if they started at 9:20 AM.

Why does this help?
Now, when the math looks at Runner A, it asks, "How did you do at 9:05 AM?" (Maybe it was still raining). When it looks at Runner B, it asks, "How did you do at 9:20 AM?" (Maybe the sun had come out).

Because the runners are now being judged at different times, the weather (the global factor) no longer cancels out! The math can finally see that "Oh, people ride more when it's sunny at 9:20 AM than when it's raining at 9:05 AM."

The "Sampling" Shortcut

There's a catch. If you have 350,000 trips and millions of possible station pairs, checking every single possibility is like trying to count every grain of sand on a beach. It takes too long and crashes computers.

To solve this, the authors use a technique called Nested Case-Control Sampling.

The Analogy: Instead of interviewing every single person in a city to find out who bought a bike, you pick one person who bought a bike (the "Case") and then pick just one random person who didn't buy a bike at that exact moment (the "Control").
You compare these two. By repeating this process thousands of times, you can accurately predict the trends for the whole city without interviewing millions of people. This makes the math fast enough to run on a normal laptop.

What Did They Find?

Using this new "Time-Traveling + Sampling" method on the Washington D.C. bike data, they discovered some very intuitive but previously hard-to-quantify truths:

The Goldilocks Temperature: People love biking when it's warm, but once it gets too hot, riding drops off. It's not a straight line; it's a curve.
Rain is a Dealbreaker: Rain strongly discourages people from riding.
The Commuter Rhythm: There are huge spikes in riding at 9 AM (going to work) and 6 PM (coming home). The "rush hour" effect is massive.
The "Competition" Surprise: They looked at whether having many bike stations close together helps or hurts. Surprisingly, stations with more nearby neighbors saw more activity, not less: a dense cluster of stations seems to attract riders rather than split them up.

The Big Picture

This paper is a toolkit for statisticians. It gives them a way to stop ignoring the "big picture" factors (like weather, holidays, or time of day) when studying how things connect over time.

By shifting the clock and sampling smartly, they turned a computationally intractable problem into a solvable one. This gives city planners a clearer picture of how the weather and the clock influence whether people hop on a bike or stay home, which can help them build better bike networks.

Here is a detailed technical summary of the paper "Relational event models with global covariates" by Lembo et al.

1. Problem Statement

Relational Event Models (REMs) are a standard framework for analyzing dynamic networks where interactions (events) occur between nodes over time. Traditionally, REMs rely on partial likelihood inference, which avoids integrating the intensity over the time between events and is therefore far cheaper than full likelihood for large networks. However, this approach has a critical limitation:

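The Nuisance Parameter Issue: In standard partial likelihood, global effects (covariates that are time-dependent but take the same value for every potential pair at a given time, such as weather or time of day) behave like nuisance parameters absorbed into the baseline hazard. Each likelihood term is the ratio of the event's intensity to the sum of intensities over the risk set, and a global covariate multiplies the numerator and every term of the denominator by the same factor, so it cancels and cannot be estimated.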
Infeasibility of Full Likelihood: To estimate these global effects, one would typically need the full likelihood. However, calculating the full likelihood involves integrating the intensity function over time between events for all possible node pairs. This scales quadratically with the number of nodes ($O(N^2)$), making it computationally intractable for large networks (e.g., bike-sharing systems with thousands of stations and millions of potential pairs).
Existing Approximations: Previous attempts to approximate full likelihood (e.g., assuming piecewise constant intensity) often introduce significant bias or fail to capture non-linear global dynamics accurately.
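A small numerical check of the cancellation argument, assuming (for illustration only, not from the paper) a log-linear intensity $\lambda = \exp(\beta_{\text{dyad}} x_{\text{dyad}} + \beta_{\text{global}} g(t))$ over a hypothetical five-pair risk set. Because $g(t)$ is shared by every pair, the standard partial likelihood term comes out the same whatever value $\beta_{\text{global}}$ takes:

```python
import numpy as np

# Hypothetical toy risk set (not from the paper): 5 sender-receiver pairs at one
# event time, each with a dyadic covariate, plus one global covariate value g(t)
# that is shared by every pair at that time.
rng = np.random.default_rng(0)
x_dyad = rng.normal(size=5)
beta_dyad = 0.8


def partial_likelihood_term(event_idx, g_value, beta_global):
    """One event's contribution to the standard partial likelihood:
    intensity of the observed pair divided by the sum over the risk set,
    with a log-linear intensity exp(beta_dyad * x_dyad + beta_global * g)."""
    log_lambda = beta_dyad * x_dyad + beta_global * g_value  # same g for all pairs
    lam = np.exp(log_lambda)
    return lam[event_idx] / lam.sum()


# exp(beta_global * g) is a common factor of numerator and denominator, so the
# term does not change when the global effect changes.
terms = [partial_likelihood_term(2, g_value=25.0, beta_global=b) for b in (-1.0, 0.0, 0.5)]
print(terms)
```

The three printed values agree up to floating-point rounding, which is exactly why $\beta_{\text{global}}$ is not identifiable from this likelihood.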

The authors aim to develop a method that allows for the efficient and consistent estimation of global covariates within the partial likelihood framework, without resorting to intractable full likelihood calculations.

2. Methodology

The proposed solution combines a time-shifted event process with nested case-control sampling to transform the problem into a form solvable by Generalized Additive Models (GAMs).

A. Time-Shifted Event Process

The core innovation is constructing a new event process, $M^e$, by drawing an independent, random, positive time shift $H_{sr}$ for each sender-receiver pair $(s, r)$ and applying it to that pair's events.

Mechanism: If an event $(s, r)$ occurs at time $t$ in the original process, it is shifted to $t + H_{sr}$ in the new process.
Effect on Likelihood: Because each pair $(s, r)$ in the risk set receives a different random shift, the global covariates (which depend on time) are evaluated at different time points for the event and the non-events in the risk set.
Result: The global covariate terms no longer cancel out in the partial likelihood ratio, allowing them to be identified and estimated.
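The shift mechanism can be sketched in a few lines. The shift distribution (exponential with mean 2 hours), the pairs, and the temperature curve below are hypothetical choices made for illustration. Once an event of pair (0, 1) is moved to $u = t + H_{01}$, every pair $(s, r)$ is compared at the same shifted time $u$ but has its covariates looked up at its own original-scale time $u - H_{sr}$, so a global covariate takes different values across the risk set:

```python
import numpy as np

# Hypothetical setup (not from the paper): exponential shifts with mean 2 hours
# and a made-up daily temperature cycle as the global covariate.
rng = np.random.default_rng(1)


def temperature(t_hours):
    """Hypothetical global covariate: a daily temperature cycle (degrees C)."""
    return 25 + 5 * np.sin(2 * np.pi * (t_hours - 9) / 24)


pairs = [(0, 1), (0, 2), (1, 2), (2, 0)]                 # hypothetical sender-receiver pairs
shift = {p: rng.exponential(scale=2.0) for p in pairs}  # one independent H_sr per pair

# An observed event of pair (0, 1) at original time t = 8.0 hours ...
t_event, event_pair = 8.0, (0, 1)
u = t_event + shift[event_pair]  # ... appears at time u in the shifted process M^e

# At shifted time u, each pair corresponds to original-scale time u - H_sr, so its
# covariates are evaluated there (a pair is only at risk if that time lies inside
# the observation window). For the event pair this is just t_event.
lookup_time = {p: u - shift[p] for p in pairs}
for p in pairs:
    print(p, round(lookup_time[p], 2), round(temperature(lookup_time[p]), 2))
```

Without the shifts, all four lookup times would equal 8.0 and the temperature column would be constant.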

B. Nested Case-Control Sampling

To handle the computational cost of summing over all pairs in the risk set (which remains quadratic in the number of nodes even with the shift), the authors apply nested case-control sampling:

For every observed event, a small number of "non-events" (pairs that did not interact at that time) are randomly sampled from the risk set.
This reduces each denominator of the partial likelihood from a sum over every pair in the risk set ($O(N^2)$ terms) to a sum over the observed event and its sampled non-events.
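A minimal sketch of the sampling step, assuming a hypothetical network of 50 stations in which every ordered pair of distinct stations is at risk and non-events are drawn uniformly without replacement. In the paper the risk set is that of the time-shifted process; a static risk set is used here only to keep the example short:

```python
import numpy as np


def sample_controls(event_pair, risk_set, m, rng):
    """Nested case-control sampling: draw m non-events uniformly at random,
    without replacement, from the risk set minus the observed event pair."""
    candidates = [p for p in risk_set if p != event_pair]
    idx = rng.choice(len(candidates), size=m, replace=False)
    return [candidates[i] for i in idx]


# Hypothetical network of N = 50 stations: every ordered pair (s, r), s != r, is at risk.
N = 50
risk_set = [(s, r) for s in range(N) for r in range(N) if s != r]
rng = np.random.default_rng(42)
events = [(3, 7), (7, 3), (10, 2)]  # hypothetical observed events

case_control = [(e, sample_controls(e, risk_set, m=1, rng=rng)) for e in events]
print(len(risk_set))  # 2450 terms per denominator without sampling
print(case_control)   # each event paired with one sampled non-event
```

With `m=1`, each likelihood denominator shrinks from 2,450 intensity terms to 2.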

C. Degenerate Logistic Additive Model

When only one non-event is sampled per event, the resulting partial likelihood takes a specific mathematical form:
$L_{PS} = \prod_{k=1}^n \frac{\lambda_{s_k r_k}(t_k)}{\lambda_{s_k r_k}(t_k) + \lambda_{s^*_k r^*_k}(t^*_k)}$
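Here $(s^*_k, r^*_k)$ is the non-event sampled for the $k$-th event and $t^*_k$ is the time at which its covariates are evaluated after the shifts. By substituting the intensity function (which includes smooth non-linear functions of the covariates), this expression is mathematically equivalent to the likelihood of a degenerate logistic regression model: the response is identically 1 (one "success" per event/non-event pair), and the log-odds equal the difference between the linear predictors of the event and of the sampled non-event, i.e. $f(x_k) - f(x^*_k)$ for a smooth term rather than $f(x_k - x^*_k)$.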
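The equivalence with logistic regression follows in one line, assuming a log-additive intensity $\lambda_{sr}(t) = \exp\{\eta_{sr}(t)\}$ with $\eta_{sr}(t) = \sum_j f_j(x_{sr,j}(t))$ and no unspecified baseline hazard (any time-varying baseline is itself modeled as a global smooth term, e.g. time of day). This is a sketch of the algebra, not the paper's own derivation:

```latex
\frac{\lambda_{s_k r_k}(t_k)}{\lambda_{s_k r_k}(t_k) + \lambda_{s^*_k r^*_k}(t^*_k)}
  = \frac{e^{\eta_{s_k r_k}(t_k)}}{e^{\eta_{s_k r_k}(t_k)} + e^{\eta_{s^*_k r^*_k}(t^*_k)}}
  = \frac{1}{1 + \exp\{-\Delta_k\}},
\qquad
\Delta_k = \sum_j \Bigl[ f_j\bigl(x_{s_k r_k, j}(t_k)\bigr) - f_j\bigl(x_{s^*_k r^*_k, j}(t^*_k)\bigr) \Bigr].
```

Each factor is therefore a Bernoulli likelihood with response $y_k = 1$, logit link, no intercept, and linear predictor $\Delta_k$. For a global covariate $z(t)$ the corresponding term is $f(z(t_k)) - f(z(t^*_k))$, which is generally non-zero because $t^*_k \neq t_k$; without the shifts, $t^*_k = t_k$ and the term vanishes.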

Advantage: This allows the use of highly efficient existing software (e.g., the mgcv package in R) to fit Generalized Additive Models (GAMs). This enables the estimation of:
- Non-linear smooth effects for global covariates (e.g., temperature curves).
- Non-linear smooth effects for dyadic/node-specific covariates.
- Time-varying effects without assuming piecewise constancy.
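To see the degenerate logistic model at work end to end, the sketch below simulates event/non-event pairs and recovers the coefficients. It swaps in a fixed parametric basis ($z$ and $z^2$ for a hypothetical global covariate, to mimic an inverted-U effect, plus a linear dyadic covariate $x$) and a hand-written Newton-Raphson fit in NumPy in place of the penalized smooths that the paper fits with mgcv; all covariates and coefficients are made up:

```python
import numpy as np


def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))


def basis(z, x):
    """Hypothetical parametric stand-in for smooth terms:
    b1*z + b2*z^2 for a global covariate z, plus b3*x for a dyadic covariate x."""
    return np.column_stack([z, z**2, x])


rng = np.random.default_rng(7)
n = 20_000
beta_true = np.array([0.5, -1.0, -0.8])  # inverted-U in z, negative effect of x

# Simulate from the model itself: candidate a is the event with probability
# sigmoid(eta_a - eta_b); otherwise candidate b is the event.
phi_a = basis(rng.normal(size=n), rng.normal(size=n))
phi_b = basis(rng.normal(size=n), rng.normal(size=n))
a_is_event = rng.uniform(size=n) < sigmoid((phi_a - phi_b) @ beta_true)
phi_event = np.where(a_is_event[:, None], phi_a, phi_b)
phi_control = np.where(a_is_event[:, None], phi_b, phi_a)

# Degenerate logistic regression: every response is 1, no intercept,
# and the design matrix holds event-minus-non-event differences.
D = phi_event - phi_control


def fit_degenerate_logistic(D, n_iter=25):
    """Newton-Raphson maximization of sum_k log sigmoid(D_k @ beta)."""
    beta = np.zeros(D.shape[1])
    for _ in range(n_iter):
        p = sigmoid(D @ beta)
        grad = D.T @ (1.0 - p)                       # gradient of the log-likelihood
        info = (D * (p * (1.0 - p))[:, None]).T @ D  # negative Hessian
        beta = beta + np.linalg.solve(info, grad)
    return beta


beta_hat = fit_degenerate_logistic(D)
print(beta_hat)  # close to beta_true
```

In R, mgcv's linear functional terms (a smooth of a matrix argument with a matrix `by` variable; see `?linear.functional.terms`) are one way to express differences such as $f(x_k) - f(x^*_k)$ inside `gam()` with a binomial family; whether the paper's code uses exactly this construction is an assumption here.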

3. Key Contributions

Theoretical Extension: The paper extends REMs to include global covariates (time-dependent but pair-invariant) within a partial likelihood framework, something that previously required full likelihood approximations.
Exact Inference: Unlike previous full-likelihood approximations that assume piecewise constant intensity (introducing bias), this method does not approximate the likelihood, yields consistent estimates, and does not require the piecewise constant assumption.
Computational Efficiency: By reducing the problem to a degenerate logistic additive model via time-shifting and case-control sampling, the method's cost scales linearly with the number of events and with the number of non-events sampled per event, rather than quadratically with the number of nodes.
Flexibility: The approach supports complex, non-linear relationships for both global and local covariates using standard GAM smoothing techniques.
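The scaling claim can be checked with back-of-the-envelope arithmetic. The sketch counts intensity terms in the partial likelihood denominators, assuming every ordered pair of distinct stations is at risk; the event count is the approximate number of rides in the Washington D.C. application, while the 700-station network size is a hypothetical figure used only for illustration:

```python
def intensity_evaluations(n_events, n_nodes, m=None):
    """Number of intensity terms in the partial likelihood denominators:
    all ordered pairs of distinct nodes per event (m=None), or the event
    plus m sampled non-events per event."""
    if m is None:
        return n_events * n_nodes * (n_nodes - 1)
    return n_events * (m + 1)


n_events = 350_000  # approximate number of rides in the application
n_nodes = 700       # hypothetical station count, for illustration only
print(intensity_evaluations(n_events, n_nodes))       # full risk set
print(intensity_evaluations(n_events, n_nodes, m=1))  # one sampled non-event
```

With these numbers the full risk set needs about $1.7 \times 10^{11}$ intensity evaluations, against $7 \times 10^5$ with one sampled non-event per event.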

4. Results

Simulation Study

The authors validated the method through extensive simulations:

Sample Size & Network Size: The method accurately recovers global covariate effects as the number of events increases. Crucially, performance is robust to network size (number of nodes), confirming that the method does not suffer from the $O(N^2)$ bottleneck of full likelihood methods.
Shift Distribution: The choice of the shift distribution is critical.
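- Too small shifts: The event and non-event times are too close, so their global covariate values are (nearly) identical. The resulting differences are essentially structural zeros in the likelihood and carry little information about the global effects, leading to high variance.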
- Too large shifts: The risk set at the shifted time often contains only the actual event (no non-events available to sample), rendering observations uninformative.
- Optimal: Intermediate shift sizes yield the best balance of precision and information.
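The trade-off can be illustrated with a small simulation. It assumes hypothetical Uniform(0, c) shifts, an observation window $[0, T]$ with $T = 1$, and the rule that a pair is at risk in the shifted process at time $u$ only if $u - H_{sr}$ lies inside the original window; none of these settings are taken from the paper's simulation study. Tiny shifts leave the event and non-event evaluation times almost equal, while huge shifts leave almost no other pairs at risk:

```python
import numpy as np


def shift_diagnostics(c, n_pairs=2000, n_events=2000, T=1.0, seed=0):
    """For hypothetical Uniform(0, c) shifts on an observation window [0, T], return
    (mean fraction of other pairs at risk at the shifted event time,
     mean |t_k - t*_k| between event and sampled non-event evaluation times)."""
    rng = np.random.default_rng(seed)
    h = rng.uniform(0.0, c, size=n_pairs)           # one shift per pair
    pair = rng.integers(0, n_pairs, size=n_events)  # pair of each event
    t = rng.uniform(0.0, T, size=n_events)          # original event times
    at_risk_frac, gaps = [], []
    for t_k, p_k in zip(t, pair):
        u = t_k + h[p_k]                            # shifted event time
        # pair q is observable in the shifted process iff 0 <= u - h_q <= T
        at_risk = (u - h >= 0.0) & (u - h <= T)
        at_risk[p_k] = False                        # exclude the event pair itself
        at_risk_frac.append(at_risk.sum() / (n_pairs - 1))
        if at_risk.any():
            q = rng.choice(np.flatnonzero(at_risk))  # one sampled non-event
            gaps.append(abs(t_k - (u - h[q])))       # |t_k - t*_k|
    mean_gap = float(np.mean(gaps)) if gaps else float("nan")
    return float(np.mean(at_risk_frac)), mean_gap


for c in (0.001, 1.0, 1000.0):
    frac, gap = shift_diagnostics(c)
    print(f"shift scale {c:>7}: pairs at risk {frac:.3f}, mean |t_k - t*_k| {gap:.4f}")
```

Intermediate scales keep a sizeable share of pairs at risk while still separating the two evaluation times, which mirrors the qualitative pattern described in the bullets above.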
Comparison with Full Likelihood: Compared to the approximate full likelihood method (Stadtfeld & Block, 2017):
- The proposed method has slightly higher variance but significantly lower bias.
- The full likelihood method is >1 million times slower computationally.
- The proposed method can achieve the same variance as the full likelihood by sampling more non-events (e.g., 100 non-events), while remaining computationally feasible.
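With $m$ non-events per event, the sampled partial likelihood takes the standard nested case-control form $\prod_k \lambda_k / (\lambda_k + \sum_{j=1}^m \lambda^*_{kj})$, assuming non-events are drawn uniformly from the risk set; for $m = 1$ it reduces to the one-non-event likelihood given in the methodology section. A minimal, numerically stable sketch of its logarithm in terms of linear predictors $\eta$ (the function name is hypothetical):

```python
import numpy as np


def log_partial_likelihood(eta_event, eta_controls):
    """Sampled partial log-likelihood with m non-events per event:
    sum_k [eta_k - log(exp(eta_k) + sum_j exp(eta*_kj))].
    eta_event has shape (n,), eta_controls has shape (n, m)."""
    all_eta = np.column_stack([eta_event, eta_controls])
    # log-sum-exp computed stably by subtracting each row's maximum
    row_max = all_eta.max(axis=1, keepdims=True)
    lse = row_max[:, 0] + np.log(np.exp(all_eta - row_max).sum(axis=1))
    return float(np.sum(eta_event - lse))


eta_event = np.array([0.3, -1.2])
eta_one = np.array([[0.1], [0.5]])  # m = 1: same as sum of log sigmoid(eta - eta*)
print(log_partial_likelihood(eta_event, eta_one))
print(log_partial_likelihood(np.zeros(3), np.zeros((3, 100))))  # equals -3 * log(101)
```

Larger $m$ adds information per event at a cost that grows only linearly in $m$.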

Empirical Application: Washington D.C. Bike Sharing

The method was applied to ~350,000 bike rides in Washington D.C. (July 2023).

Global Covariates:
- Temperature: Usage increases with temperature up to a point, then decreases when it becomes too hot (non-linear inverted U-shape).
- Precipitation: Strongly negative effect; rain significantly discourages usage.
- Time of Day: Clear peaks during morning (4–9 AM) and evening (around 6 PM) commutes, with low usage at night.
Dyadic/Node Covariates:
- Distance: Usage decreases as distance between stations increases, though very short distances did not show a negative effect (contrary to some expectations).
- Repetition/Reciprocity: Strong tendencies for users to repeat routes or reciprocate trips.
- Station Competition: Surprisingly, stations with closer neighbors (higher competition) showed higher usage intensity, suggesting a "negative competition" effect where dense station networks attract more traffic rather than cannibalizing it.

5. Significance

This paper represents a major advancement in the analysis of dynamic networks.

Bridging the Gap: It successfully bridges the gap between the computational efficiency of partial likelihood and the modeling richness required to capture global drivers (like weather and time) in large-scale systems.
Practical Impact: For urban planners and mobility providers, the ability to model how global factors (weather, time of day) interact with local network structures (station density, distance) provides actionable insights for infrastructure investment and fleet management.
Generalizability: The methodology is not limited to bike sharing; it is applicable to any relational event data where global temporal drivers are suspected but previously unmodelable, such as financial transactions, communication networks, or disease spread.

The code for the study is publicly available, facilitating reproducibility and further application of the method.