Causal Survival Analysis in Platform Trials with Non-Concurrent Controls

This paper develops a causal survival framework for platform trials. It demonstrates that while pooling non-concurrent controls can improve precision under strict assumptions, the most robust way to avoid bias while maintaining efficiency is to target concurrent causal estimands using covariate-adjusted doubly robust estimators that rely solely on concurrent controls.

Antonio D'Alessandro, Samrachana Adhikari, Michele Santacatterina


Imagine you are running a massive, high-stakes cooking competition to find the best new recipe for a soup.

The Setup: The "Platform" Kitchen

In a traditional cooking show, you have two teams: Team A makes the old recipe, and Team B makes the new one. They cook side-by-side for the whole season.

But in a Platform Trial (like the one described in this paper), the kitchen is dynamic.

  • Team A (The Control): They are always there, making the "standard" soup. They are the shared reference point.
  • Teams B, C, and D (The Treatments): New teams enter the kitchen at different times. Team B arrives in January, Team C in March, and Team D in June.
  • The Twist: When Team B is cooking, they compare their soup to Team A. When Team C arrives, they also compare their soup to Team A.

This is efficient! You don't need a new "Team A" for every new recipe. You just keep the same standard team.

The Problem: The "Time Drift"

Here is the catch: The kitchen changes over time.

  • In January, the ingredients were fresh, the chefs were well-rested, and the weather was cold (people eat more soup).
  • In June, the ingredients are different, the chefs are tired, and it's hot (people eat less soup).

The paper calls this "Time Drift."

If you simply mix the data from Team B's January cooks with Team C's June cooks to compare against Team A, you might get a false result. Maybe Team C's soup looks "worse" not because the recipe is bad, but because it was summer and no one wanted soup.

  • Concurrent Controls: Team A cooks at the same time as Team B. (Fair comparison).
  • Non-Concurrent Controls (NCC): Team A cooked in January, but Team C is cooking in June. (Unfair comparison if you just mash the data together; see the sketch below.)
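To make the distinction concrete, here is a minimal sketch of how you might pick out the concurrent controls for one arm from an enrollment log. The dataframe, dates, and column names are invented for illustration, not taken from the paper:

```python
import pandas as pd

# Hypothetical enrollment log (one row per participant)
df = pd.DataFrame({
    "arm":      ["control", "control", "B", "control", "C", "control"],
    "enrolled": pd.to_datetime(["2021-01-05", "2021-02-10", "2021-01-20",
                                "2021-06-02", "2021-06-01", "2021-06-20"]),
})

# Arm C's concurrent controls: only control participants enrolled
# on or after the day arm C opened (i.e., while C was actually recruiting).
c_opened = df.loc[df["arm"] == "C", "enrolled"].min()
concurrent = df[(df["arm"] == "control") & (df["enrolled"] >= c_opened)]
print(concurrent)  # only the two June controls qualify
```

The January and February controls are non-concurrent for arm C and would be excluded from a concurrent-only analysis.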

The Big Question

The researchers asked: Can we use the "old" Team A data (Non-Concurrent) to help us get a more precise answer for the "new" Team C, or will it mess things up?

Many statisticians wanted to say, "Yes! More data is better data!" They wanted to pool all the Team A cooks together to get a super-accurate baseline.

The Solution: A "Causal" Lens

The authors of this paper built a new mathematical framework to answer this. They used a concept called "Causal Survival Analysis."

Think of it like this: Instead of just looking at the average soup taste, they are trying to answer a specific "What If?" question:

"If a patient entered the trial today (concurrent), how long would they survive if they got the new treatment vs. the old treatment?"

They realized that to answer this specific question, you have to be very careful about which "Team A" cooks you use.
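To see what that "What If?" means concretely, here is a toy sketch in Python. Everything in it is invented: we pretend we can see both potential survival times for each of today's enrollees, when in reality each patient receives only one treatment, which is exactly what makes this hard to estimate.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Hypothetical potential outcomes for patients enrolling *today*:
# T0 = survival time (months) under the old treatment,
# T1 = survival time (months) under the new treatment.
T0 = rng.exponential(scale=10.0, size=n)   # assumed baseline survival
T1 = rng.exponential(scale=14.0, size=n)   # assumed benefit under the new one

t = 12.0  # a 12-month horizon
# The concurrent causal estimand: P(T1 > t) - P(T0 > t)
effect = (T1 > t).mean() - (T0 > t).mean()
print(f"difference in 12-month survival: {effect:+.3f}")
```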

The Key Findings (The "Secret Sauce")

1. The "Double-Robust" Chef is the Safest Bet
The paper tested different ways to analyze the data. They found that the most reliable method is a technique called "Doubly Robust Estimation."

  • Analogy: Imagine you have two ways to guess the soup's quality:
    1. Taste Test (Outcome Regression): You guess based on the ingredients used.
    2. Chef's Reputation (Propensity): You guess based on who cooked it and when.
  • The Magic: The "Doubly Robust" method uses both. If your guess about the ingredients is wrong, the "Chef's Reputation" part saves you; if your guess about the chef is wrong, the "Taste Test" saves you. As long as at least one of the two guesses is right, the final answer stays trustworthy.
  • The Result: This method works best when you only use the "Concurrent" Team A cooks (those who cooked at the exact same time as the new treatment). It gives you a reliable answer without needing to risk mixing in the "old" data; a code sketch follows below.
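For readers who want to peek under the hood, here is a minimal sketch of a doubly robust (AIPW) estimator. It simplifies heavily: the outcome is a plain binary "survived past the horizon" flag with no censoring, whereas the paper's estimators handle censored survival data, and the variable names are ours, not theirs.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def aipw_effect(X, A, Y):
    """Doubly robust (AIPW) estimate of E[Y(1)] - E[Y(0)].

    X: covariates; A: 1 = new treatment, 0 = concurrent control;
    Y: 1 = survived past the time horizon (no censoring, for simplicity).
    """
    # Guess #2, "Chef's Reputation": the propensity score P(A=1 | X)
    ps = LogisticRegression().fit(X, A).predict_proba(X)[:, 1]
    ps = np.clip(ps, 0.01, 0.99)  # avoid extreme weights

    # Guess #1, "Taste Test": outcome models E[Y | A=a, X]
    mu1 = LogisticRegression().fit(X[A == 1], Y[A == 1]).predict_proba(X)[:, 1]
    mu0 = LogisticRegression().fit(X[A == 0], Y[A == 0]).predict_proba(X)[:, 1]

    # Each piece corrects the other's mistakes:
    # the average is consistent if EITHER model is right.
    dr1 = mu1 + A * (Y - mu1) / ps
    dr0 = mu0 + (1 - A) * (Y - mu0) / (1 - ps)
    return np.mean(dr1 - dr0)
```

The key design choice mirrors the paper's recommendation: fit and average this on concurrent data only, so the estimator never has to model away the time drift baked into old controls.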

2. When Pooling Data Backfires
The paper showed that if you try to mix in the "Non-Concurrent" (old) Team A data:

  • If your math model is perfect: You might get a slightly more precise answer (a sharper picture).
  • If your math model is even slightly wrong (which happens often in real life): You introduce bias. You might conclude the new soup is great when it's actually terrible, or vice versa, just because you mixed in data from a different season.
  • The Verdict: It's like trying to fix a blurry photo by adding pixels from a completely different photo. If the lighting is different, you just make the picture worse. (The toy simulation below shows this in action.)
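Here is a tiny simulation of the backfire, with all numbers invented: an early period where only controls cook, a later period where the new arm joins, and a time drift that lowers everyone's outcomes in the later period. The true effect is +0.10.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3000
period = rng.integers(0, 2, n)                   # 0 = January, 1 = June
base = 0.7 - 0.2 * period                        # time drift: June is worse for everyone
treated = (period == 1) & (rng.random(n) < 0.5)  # new arm enrolls only in June
p = np.where(treated, base + 0.1, base)          # true treatment effect = +0.10
y = rng.random(n) < p                            # 1 = good outcome

controls = ~treated
concurrent = controls & (period == 1)

print(f"pooled controls:     {y[treated].mean() - y[controls].mean():+.3f}")
print(f"concurrent controls: {y[treated].mean() - y[concurrent].mean():+.3f}")
```

The pooled comparison drags in the rosier January controls, so the new arm can even look harmful; the concurrent comparison recovers roughly the true +0.10.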

3. The "Time Drift" is Real
The study confirmed that in real-world scenarios (like the COVID-19 trials they analyzed), the "Non-Concurrent" data often carries hidden biases. The "Concurrent" controls are the only ones that truly represent the current reality.

The Bottom Line for Everyone

If you are running a complex experiment where new treatments are added over time:

  1. Don't be greedy with data. Just because you have old control data doesn't mean you should use it.
  2. Focus on the "Now." Compare new treatments only against controls that existed at the same time.
  3. Use the "Double-Check" method. Use advanced statistical tools (Doubly Robust estimators) that protect you if your assumptions about the data are slightly off.

In short: The paper argues that in the race to find better treatments, accuracy is more important than speed. It is better to have a slightly less precise but correct answer using current data than a very precise but wrong answer from mixing in old, irrelevant data.