Imagine you are a doctor trying to prove that a new, powerful shield (let's call it Shield A) is better at stopping a virus than an old, trusted shield (Shield B).
In the old days, to prove Shield A worked, you would give Shield A to half your patients and give the other half a fake shield (a placebo) made of paper. If the paper-shield group got sick often and the Shield A group didn't, you knew Shield A worked.
But here's the problem: We already know Shield B works very well. It would be unethical to give a patient a piece of paper when we have a proven shield available. So, modern trials give everyone Shield B, and then test Shield A against it.
The Dilemma:
The head-to-head trial can tell us that Shield A beats Shield B. But how well does Shield A work in absolute terms? Does it block 50% of infections? 90%? To answer that, we need to know: "What would have happened if we had given these same patients the paper shield?"
Since we can't ethically use a paper shield, we have to guess what the paper shield group would have looked like. This is where this paper comes in.
The "Time Travel" Detective Work
The authors of this paper are statistical detectives. They want to reconstruct the "missing" paper-shield group using data from a different, older study.
Think of it like this:
- Study 1 (The New Trial): Everyone got a real shield. We have 4,500 people. We know exactly how many got sick.
- Study 2 (The Old Trial): This study did have a paper-shield group, but it was a few years ago and had different people.
The challenge is that the people in Study 1 and Study 2 are different. Maybe Study 2 had older people, or people from different cities where the virus spreads differently. If you just copy-paste the results from Study 2, your guess will be wrong because the "local virus environment" is different.
The Secret Weapon: "Negative Controls"
This is the clever part. The authors use a concept called Proximal Learning (or "Nearby Learning"). They use two special clues, which they call Negative Controls, to figure out the hidden differences between the two groups.
Imagine you are trying to guess the weather in a city you've never visited (Study 1) by looking at a city you know well (Study 2). You can't just look at the temperature; you need a proxy.
The "Exposure" Clue (Geographic Region):
- The authors look at where the people lived (e.g., Latin America vs. the rest of the world).
- The Logic: Where you live doesn't directly cause you to get HIV, but it strongly correlates with how much the virus is spreading in your neighborhood. It's like looking at the humidity to guess if it will rain, even if humidity doesn't make it rain.
The "Outcome" Clue (Sexually Transmitted Infections - STIs):
- They look at whether people had other infections (like Gonorrhea or Chlamydia) at the start.
- The Logic: Having an STI doesn't cause HIV directly, but it's a sign of high-risk behavior and a high-risk environment. It's like seeing a wet umbrella; the umbrella didn't make it rain, but it tells you it's raining.
The Magic Trick:
By comparing how these two clues (Region and STIs) behave in the "Paper Shield" group of the old study versus the "Real Shield" group of the new study, the authors can mathematically "calibrate" their guess.
They are essentially saying: "We know the new group has more STIs and is in a different region. We can use the old study's data to mathematically adjust for these differences and predict what the new group's 'Paper Shield' results would have been."
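The calibration idea above can be caricatured as simple standardization over the two clues. Everything below is illustrative: the strata, risks, and population shares are made-up numbers, not values from the paper, and the real method involves far more machinery than a weighted sum.

```python
# Toy illustration of "calibrating" an old placebo arm to a new population.
# All numbers are hypothetical, invented for this sketch.

# Placebo (paper-shield) HIV risk in the OLD study, by (region, STI) stratum
old_placebo_risk = {
    ("Latin America", "STI"): 0.08,
    ("Latin America", "no STI"): 0.04,
    ("Other", "STI"): 0.05,
    ("Other", "no STI"): 0.02,
}

# Share of each stratum among participants in the NEW study
new_study_mix = {
    ("Latin America", "STI"): 0.30,
    ("Latin America", "no STI"): 0.30,
    ("Other", "STI"): 0.20,
    ("Other", "no STI"): 0.20,
}

# Standardize: reweight the old placebo risks by the new study's mix
predicted_placebo_risk = sum(
    old_placebo_risk[s] * new_study_mix[s] for s in new_study_mix
)
print(f"{predicted_placebo_risk:.1%}")  # -> 5.0%
```

In these toy numbers, the old study's stratum-level risks, reweighted to the new study's mix of regions and STI status, imply a "paper shield" risk of 5% for the new population.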
The Two Methods Used
The paper proposes two ways to do this math:
The "Weighted" Method (IPCW):
Imagine you have a bag of marbles from the old study. Some marbles look like the people in the new study, some don't. This method gives "extra weight" to the marbles that look like the new study and "less weight" to the ones that don't. It's like creating a custom-made "virtual control group" that perfectly mimics the new study's demographics.
The "Two-Stage" Method:
This is a step-by-step approach.
- Step 1: Use the old study to figure out the relationship between the clues (STIs/Region) and the virus.
- Step 2: Apply that relationship to the new study to predict the outcome.
- Why two stages? Because HIV is rare (like finding a needle in a haystack). Standard math often fails with rare events. This method is specifically tuned to handle "needle-in-a-haystack" scenarios without getting confused.
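Both methods can be sketched with a tiny stratified example. This is a hedged caricature, not the paper's estimators: the data, the single "risk stratum" proxy, and the weighting scheme are invented for illustration, and the real versions fit statistical models over many covariates.

```python
# Minimal sketch of the two estimation ideas, on made-up data.
# Old-study placebo participants: (proxy_stratum, got_hiv)
old_placebo = [
    ("high_risk", 1), ("high_risk", 0), ("high_risk", 0),
    ("low_risk", 0), ("low_risk", 0), ("low_risk", 1),
    ("low_risk", 0), ("low_risk", 0),
]

# Hypothetical share of each stratum in the NEW study
new_share = {"high_risk": 0.6, "low_risk": 0.4}

# --- Method 1: weighting (the "marbles" idea) ---
# Weight each old participant by how over/under-represented
# their stratum is in the new study.
old_counts = {}
for s, _ in old_placebo:
    old_counts[s] = old_counts.get(s, 0) + 1
old_share = {s: c / len(old_placebo) for s, c in old_counts.items()}
weights = [new_share[s] / old_share[s] for s, _ in old_placebo]
ipw_risk = sum(w * y for w, (_, y) in zip(weights, old_placebo)) / sum(weights)

# --- Method 2: two-stage ---
# Stage 1: learn stratum-level risk in the old study.
stratum_risk = {
    s: sum(y for t, y in old_placebo if t == s) / old_counts[s]
    for s in old_counts
}
# Stage 2: apply those risks to the new study's stratum mix.
two_stage_risk = sum(new_share[s] * stratum_risk[s] for s in stratum_risk)
```

With coarse strata like these, the two approaches give exactly the same answer; with real, continuous covariates and fitted models they rely on different assumptions, which is what makes them useful as a cross-check on each other.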
The Results: What Did They Find?
They applied this to the HPTN 083 trial (the Cabotegravir study).
- The Reality: The new drug (Cabotegravir) was already known to be better than the old drug (TDF/FTC).
- The Prediction: Using their new math, they estimated that if the new drug group had been given a "paper shield," about 4.3% to 5.5% of them would have gotten HIV.
- The Comparison:
- Actual new drug group: 0.4% got HIV.
- Predicted paper shield group: ~5% got HIV.
Conclusion: The new drug is incredibly effective. It reduced the risk by about 92% compared to doing nothing (the paper shield).
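The headline 92% figure follows from simple arithmetic on the two rates above:

```python
# Risk reduction versus the predicted "paper shield" (placebo) group,
# using the rounded rates from the summary above.
actual_risk = 0.004        # ~0.4% infected on the new drug
predicted_placebo = 0.05   # ~5% predicted if untreated
efficacy = 1 - actual_risk / predicted_placebo
print(f"{efficacy:.0%}")   # -> 92%
```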
Why This Matters
This paper is a game-changer because it solves a major ethical and scientific problem.
- Ethics: We don't have to withhold a proven treatment and give patients a fake one to prove a new treatment works.
- Science: We can still get the "gold standard" proof (comparing to a placebo) by using smart math to reconstruct the missing data from the past.
In a nutshell: The authors built a statistical "time machine" using clues about geography and other infections to reconstruct a fair comparison, proving that the new HIV prevention drug is a superhero, even without a fake shield to compare it against.