An E-value-Informed Sensitivity Analysis Framework for Hybrid Controlled Trials

This paper proposes an E-value-informed sensitivity analysis framework with a data-driven benchmark and operational decision rule to assess and safeguard the validity of hybrid controlled trials against unmeasured confounding, thereby enabling robust inference while preserving the statistical power gains from incorporating real-world data.

Original authors: Liu, C., Mayer, M., Lactaoen, K., Gomez, L., Weissman, G., Hubbard, R.

Published 2026-03-06

Original paper licensed under CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/). ⚕️ This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

The Big Idea: Testing a New Drug with a "Shadow" Control Group

Imagine you are a chef testing a new, fancy recipe (the Experimental Treatment). To see if it's actually good, you need to compare it to a standard, old recipe (the Control).

In a perfect world, you would cook the new recipe for 100 people and the old recipe for another 100 people, flipping a coin to decide who gets which. This is a Randomized Controlled Trial (RCT). It's the "gold standard" because the coin flip makes the two groups comparable, on average, in every way except for the recipe.

The Problem:
Sometimes, finding 100 people to eat the "old recipe" is hard, expensive, or unethical (maybe the old recipe is known to be bad). So, researchers come up with a clever shortcut: Hybrid Controlled Trials (HCTs).

Instead of recruiting a full control group of 100 new people, they say: "Let's keep the 100 people we have for the new recipe, enroll a smaller control group inside the trial, and supplement it with data from 500 people who ate the old recipe in regular hospitals (Real-World Data)."

This makes the study faster and gives more people access to the new, potentially better recipe. But there's a catch.

The Danger: The "Ghost" Variable

When you flip a coin, you ensure the groups are fair. But when you grab data from regular hospitals, you aren't flipping a coin.

Imagine the people in the hospital data are different from your trial participants. Maybe the hospital patients are sicker, older, or have different diets. When differences like these affect the outcome but were never recorded in the data, they are Unmeasured Confounders. They are like a Ghost that you can't see, but it's messing up your results.

If the hospital patients were already sicker to begin with, the new recipe might look amazing just because the comparison group was weak, not because the new recipe is actually good. This is called Bias.

The Solution: The "Tipping Point" Test

The authors of this paper created a new tool to check if the results are real or just an illusion caused by that "Ghost." They call it an E-value-Informed Sensitivity Analysis.

Think of it like a Structural Integrity Test for a bridge.

The Bridge: Your study result (e.g., "The new drug works!").
The Wind: The unmeasured confounding (the Ghost).

The researchers ask: "How strong does the wind (the Ghost) have to be to blow this bridge down?"

They developed two specific numbers to answer this:

1. The HC-Value (The "Bridge Strength" Number)

This number tells you how strong the "Ghost" would need to be to completely destroy your result.

High HC-Value: The bridge is strong. You would need a hurricane (a massive, implausible Ghost) to knock the result down. This means your result is Robust (unlikely to be an artifact of hidden differences).
Low HC-Value: The bridge is weak. A gentle breeze (a tiny, plausible Ghost) could knock it down. This means your result is Fragile (it could be an artifact of hidden differences, though it is not necessarily wrong).

2. The RD-Value (The "Wind Gauge")

This is the clever part. The researchers look at the data they already have to see how "windy" it actually is. They compare the hospital patients to the trial patients who ate the same old recipe.

If the hospital patients did much worse than the trial patients, the "Wind Gauge" (RD-Value) is high. It means there is a lot of difference between the groups.
This acts as a Benchmark. It tells you: "Based on the data we see, the Ghost is this strong."

The Decision Rule: The "Tug-of-War"

Now, you compare the two numbers. Imagine a tug-of-war:

Team A (The Result): Represented by the HC-Value (How hard it is to break the result).
Team B (The Reality): Represented by the RD-Value (How strong the actual differences in the data are).

The Rule:

If Team A wins (HC-Value > RD-Value): The result is stronger than the differences in the data. The hidden differences visible in the data are not enough to explain it away, so the new drug probably works.
If Team B wins (RD-Value ≥ HC-Value): The differences in the data are strong enough to explain away the result. The "Ghost" is too strong. You cannot trust the result; it might just reflect bias from a mismatched comparison group.

The Real-World Test: The Asthma Study

The authors tested this on a real asthma drug study.

Scenario A (Medium Dose Drug): The study said the drug worked. But when they ran their "Tug-of-War," the RD-Value (the differences in the data) was stronger than the HC-Value.
- Verdict: The result was not robust. The drug might not work; the hospital data was just too different from the trial data.
Scenario B (High Dose Drug): The study said the drug worked. The HC-Value was huge (very strong bridge), and the RD-Value was small (weak wind).
- Verdict: The result was robust. The drug likely works, and the extra data helped prove it.

Why This Matters

Before this paper, the tools for checking whether a "Hybrid" trial was fair were complex and hard to interpret. Researchers could still be fooled by a "Ghost" they couldn't see.

This new framework gives them a calculator. It allows them to say: "We used real-world data to speed things up, but we checked the math, and the results are still solid."

It's like adding a safety net to a trapeze act. You can fly higher (get more data, faster results), but you have a safety check to make sure you don't fall if the wind gets too strong.

Summary in One Sentence

This paper gives scientists a simple way to check if their "shortcut" studies (using real-world data) are actually telling the truth, by measuring if the differences in the data are strong enough to fake a result.

1. Problem Statement

Hybrid Controlled Trials (HCTs) are an innovative clinical trial design that augments the internal control arm of a Randomized Controlled Trial (RCT) with external real-world data (RWD) from patients receiving standard care. While HCTs offer significant benefits—such as increased statistical power, reduced sample size requirements, and improved patient access to experimental treatments—they face a critical validity threat: unmeasured confounding.

Unlike standard RCTs where randomization ensures exchangeability, HCTs rely on external controls that were not randomized. Systematic differences in unmeasured characteristics between RCT participants and external controls can lead to outcome non-exchangeability, distorting the estimated treatment effect. Existing sensitivity analysis methods for HCTs often rely on the Residual Difference (RD) (the outcome difference between internal and external controls after adjusting for measured covariates) as an indirect proxy for bias. However, these methods are often:

Technically complex and difficult to implement.
Limited to specific outcome types (e.g., continuous or time-to-event).
Indirect in their interpretation, making it hard to intuitively assess the plausibility of a hypothetical unmeasured confounder.
Lacking a direct link between the observed non-exchangeability and the robustness of the treatment effect estimate.

2. Methodology

The authors propose a novel sensitivity analysis framework that adapts the E-value concept (originally designed for observational studies) to the HCT setting. The framework introduces two complementary metrics: the HC-value and the RD-value.

2.1 Notation and Causal Structure

Variables: $A$ (Treatment), $Y$ (Outcome), $S$ (Trial inclusion: 1=RCT, 0=External), $X$ (Measured confounders), $U$ (Unmeasured confounder).
Key Distinction: In HCTs, $U$ affects treatment assignment ( $A$ ) only indirectly through trial inclusion ( $S$ ), as treatment is randomized within the trial ( $S=1$ ). This differs from standard observational studies where $U$ directly affects $A$ .
Parameters:
- $RR_{SU}$ : Risk ratio of $U$ between trial participants and external controls (imbalance of $U$ ).
- $RR_{UY}$ : Risk ratio of the effect of $U$ on the outcome $Y$ .
- $RD$ (Residual Difference): Despite its name, this is expressed as a risk ratio of the outcome comparing external vs. internal controls ( $P(Y=1|A=0, S=0) / P(Y=1|A=0, S=1)$ ).
- $BF$ (Bias Factor): The ratio of the HCT treatment effect to the true trial treatment effect.

2.2 The HC-value (Hybrid Controlled Value)

The HC-value is an adaptation of the E-value for HCTs. It quantifies the minimum strength of association ( $\max(RR_{SU}, RR_{UY})$ ) that an unmeasured confounder would need to have with either trial inclusion or the outcome to fully explain away the observed HCT treatment effect (i.e., to shift the estimate to the null).

Formula: Derived from the upper bound of the bias factor equation:
$\text{HC-value} = \frac{\eta \cdot RR_{AY}^{HC} + \sqrt{RR_{AY}^{HC}(RR_{AY}^{HC}-1)\eta(\eta+1)}}{1 - RR_{AY}^{HC} + \eta}$
Where $\eta$ is the ratio of external to internal control sample sizes, and $RR_{AY}^{HC}$ is the estimated HCT treatment effect.
Interpretation: A large HC-value implies the result is robust (requires a very strong confounder to negate the effect); a small HC-value implies fragility.
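
As a rough illustration, the formula above can be transcribed into a few lines of Python. This is a minimal sketch of the expression as stated in this summary, not the authors' code. The function name `hc_value`, the choice to invert risk ratios below 1 before applying the formula (the usual E-value convention), and the error raised when the denominator is not positive are all assumptions made for the example.

```python
import math


def hc_value(rr_hc: float, eta: float) -> float:
    """Sketch of the HC-value expression as written in this summary.

    rr_hc: estimated HCT treatment effect on the risk-ratio scale.
    eta:   ratio of external to internal control sample sizes.
    """
    if rr_hc <= 0 or eta <= 0:
        raise ValueError("rr_hc and eta must be positive")

    # Assumption: protective effects (RR < 1) are inverted first,
    # as is conventional for E-values.
    rr = 1.0 / rr_hc if rr_hc < 1.0 else rr_hc

    denominator = 1.0 - rr + eta
    if denominator <= 0.0:
        # Assumption: the expression as written is not defined here,
        # so the sketch refuses to return a number.
        raise ValueError("expression undefined when RR >= 1 + eta")

    numerator = eta * rr + math.sqrt(rr * (rr - 1.0) * eta * (eta + 1.0))
    return numerator / denominator


example = hc_value(2.0, 3.0)          # about 5.45
large_eta = hc_value(2.0, 1e9)        # close to 2 + sqrt(2)
```

With $RR_{AY}^{HC} = 2$ and $\eta = 3$ the expression evaluates to about 5.45. As $\eta$ grows large, it approaches $RR + \sqrt{RR(RR-1)}$, which is the same functional form as the RD-value formula in Section 2.3.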

2.3 The RD-value (Residual Difference Value)

The RD-value serves as a data-driven benchmark. It represents the minimum strength of association ( $\max(RR_{SU}, RR_{UY})$ ) required to induce the observed Residual Difference (RD) between the internal and external control arms.

Formula:
$\text{RD-value} = RD + \sqrt{RD(RD-1)}$
Interpretation: It quantifies the magnitude of unmeasured confounding actually present in the data based on the observed non-exchangeability of the control arms.
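
The RD-value formula is short enough to check by hand, but a small Python sketch may make the calculation concrete. The function name `rd_value` is hypothetical, and inverting an RD below 1 before applying the formula is an assumption borrowed from the usual E-value convention rather than something this summary states.

```python
import math


def rd_value(rd: float) -> float:
    """Sketch of RD-value = RD + sqrt(RD * (RD - 1)), with RD on the risk-ratio scale."""
    if rd <= 0:
        raise ValueError("rd must be positive")

    # Assumption: an RD below 1 is inverted so the formula is applied to a ratio >= 1.
    r = 1.0 / rd if rd < 1.0 else rd
    return r + math.sqrt(r * (r - 1.0))


no_difference = rd_value(1.0)   # 1.0: identical control arms need no confounding
doubled_risk = rd_value(2.0)    # 2 + sqrt(2), about 3.41
```

An RD of exactly 1 (no outcome difference between the control arms) gives an RD-value of 1, the smallest possible value on this scale.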

2.4 Decision Rule

The framework proposes a practical decision rule to assess robustness:

Calculate the HC-value for the point estimate (or the confidence interval limit closer to the null).
Calculate the RD-value for the observed RD.
Decision: If $\text{RD-value} < \text{HC-value}$ , the observed unmeasured confounding is insufficient to explain the treatment effect (or its significance). The null hypothesis is rejected (result is robust).
- If $\text{RD-value} \ge \text{HC-value}$ , the observed non-exchangeability could plausibly explain the entire effect, and the result is not considered robust.
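
The comparison itself can be sketched in a few lines of Python. The function name `is_robust` and the dictionary layout are hypothetical; the input numbers are the HC-values (for the CI limit) and the RD-value reported for the asthma application in Section 4.2.

```python
def is_robust(hc_value: float, rd_value: float) -> bool:
    """Decision rule as described above: robust only if RD-value < HC-value."""
    return rd_value < hc_value


# (HC-value of the CI limit, RD-value) as reported in Section 4.2.
reported = {
    "medium-dose": (1.52, 1.86),
    "high-dose": (3.20, 1.86),
}

verdicts = {name: is_robust(hc, rd) for name, (hc, rd) in reported.items()}
```

Under this rule the medium-dose comparison is flagged as not robust and the high-dose comparison as robust, matching the conclusions stated in Section 4.2.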

3. Key Contributions

Novel Framework: First to adapt the E-value framework specifically for HCTs, addressing the unique causal structure where confounding acts through trial inclusion rather than direct treatment assignment.
Interpretability: Replaces complex, indirect sensitivity parameters with two intuitive, comparable metrics (HC-value and RD-value) on the same scale.
Data-Driven Benchmark: Introduces the RD-value, which uses the observed data to set a realistic baseline for the magnitude of confounding, avoiding purely hypothetical assumptions.
Operational Decision Rule: Provides a clear, actionable rule for researchers and regulators to determine if HCT findings are robust enough for decision-making.
Generalizability: The framework is applicable to various outcome types (binary, time-to-event, count) by computing approximate risk ratios.

4. Results

4.1 Simulation Studies

The authors conducted extensive simulations (5,000 replications) varying the strength of unmeasured confounding ( $RR_{SU}$ ), the extent of data borrowing ( $\eta$ ), and the direction of outcome differences.

Type I Error Control: The decision rule based on the HC-value of the confidence interval limit closer to the null successfully controlled Type I error near the nominal 5% level, even under moderate to strong unmeasured confounding. This was superior to standard HCT analysis (which showed inflated Type I error) and comparable to "Trial Only" analysis.
Power Preservation: While controlling Type I error, the framework preserved the power gains of HCTs. In scenarios with moderate confounding where external controls had poorer outcomes, power increased by 10–20% compared to using RCT data alone.
Comparison: The rule based on the point estimate was less conservative (higher power, slightly higher Type I error), while the rule based on the CI limit was more conservative (strict Type I error control).

4.2 Application: Asthma HCT

The framework was applied to an asthma trial (IRIDIUM) augmented with Electronic Health Record (EHR) data from Penn Medicine.

Scenario 1 (Medium-dose treatment): The HCT showed a significant treatment effect ($RR=0.81$), but the trial-only analysis did not ($RR=0.92$).
- Result: The RD-value (1.86) was larger than the HC-value of the CI limit (1.52).
- Conclusion: The observed unmeasured confounding could plausibly explain the statistical significance. The result was deemed non-robust.
Scenario 2 (High-dose treatment): The HCT showed a strong significant effect ($RR=0.65$), consistent with the trial-only analysis ($RR=0.73$).
- Result: The RD-value (1.86) was much smaller than the HC-value of the CI limit (3.20).
- Conclusion: The observed confounding was insufficient to explain the effect. The result was deemed robust.
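
To get a feel for what an RD-value of 1.86 means, the formula in Section 2.3 can be inverted algebraically: solving $v = RD + \sqrt{RD(RD-1)}$ for $RD$ gives $RD = v^2 / (2v - 1)$. The sketch below applies that inversion. It is a back-calculation from the rounded figure in this summary, not a number reported by the authors, and the function name `rd_from_rd_value` is hypothetical.

```python
import math


def rd_from_rd_value(v: float) -> float:
    """Invert v = RD + sqrt(RD * (RD - 1)) for RD >= 1, giving RD = v^2 / (2v - 1)."""
    if v < 1.0:
        raise ValueError("an RD-value cannot be below 1")
    return v * v / (2.0 * v - 1.0)


implied_rd = rd_from_rd_value(1.86)

# Round-trip check: plugging the implied RD back into the forward formula.
round_trip = implied_rd + math.sqrt(implied_rd * (implied_rd - 1.0))
```

Under these assumptions, an RD-value of 1.86 corresponds to an outcome risk ratio of roughly 1.27 between the two control arms (or its reciprocal, depending on which arm is in the numerator).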

5. Significance

This paper provides a critical tool for the regulatory and clinical adoption of Hybrid Controlled Trials.

Regulatory Relevance: It offers a transparent, interpretable method to assess the reliability of HCT findings, addressing a major concern for regulators (e.g., FDA) regarding bias from external controls.
Practical Utility: By comparing the "tolerable" level of confounding with the "observed" level, it helps researchers decide whether HCT results can be relied on or are too fragile for clinical decision-making.
Efficiency vs. Validity: It demonstrates that HCTs can achieve significant power gains without sacrificing validity, provided that a rigorous sensitivity analysis confirms the robustness of the results against unmeasured confounding.

In summary, the proposed framework bridges the gap between the efficiency of HCTs and the rigorous causal inference required for high-stakes medical decisions, ensuring that the integration of real-world data does not compromise the integrity of clinical trial conclusions.