Efficient estimation of cumulative incidence curves via data fusion with surrogates: application to integrated analysis of vaccine trial and immunobridging data

This paper develops efficient and multiply robust statistical methods for estimating cumulative incidence curves by fusing participant-level data from historical vaccine efficacy trials and immunobridging studies, with applications demonstrated on multi-serotype pathogens and COVID-19 booster data.

Pan Zhao, Peter B. Gilbert, Oliver Dukes, Bo Zhang

Published 2026-04-16

Imagine you are a chef who invented a delicious new soup (a vaccine) that prevented a specific type of cold (a virus). You tested it on thousands of people in a big kitchen (a Phase 3 clinical trial) and proved it works. You know exactly how the soup tastes and how it helps people.

Now, a new strain of cold virus is spreading. You want to update your soup recipe to fight this new strain. But you can't just run a massive, expensive, years-long trial on thousands of people again. That would take too long and cost too much money.

Instead, you want to use a shortcut: Immunobridging.

This paper is about a sophisticated statistical "recipe" that allows scientists to predict how well a new, updated soup will work, by mixing two different bowls of data together.

The Two Bowls of Data

  1. The Big Bowl (The Historical Trial): This contains data from the original, massive trial. It has thousands of people. We know what they ate (the vaccine), how their immune system reacted (the "immune marker," like an antibody level), and whether they got sick later.
  2. The Small Bowl (The Immunobridging Study): This is a smaller, newer study with fewer people. They took the new soup recipe. We know how their immune system reacted to the new soup, but we don't yet know whether they got sick, because the study is too short or too small to observe that.

The Goal: We want to predict the "sickness curve" (how many people get sick over time) for the people in the Small Bowl, using the Big Bowl as a guide.

The Problem: It's Not Just About the Soup

If the only thing that mattered was the soup, we could just say, "If the immune reaction is the same, the protection is the same." But life is messy.

  • Different People: The people in the Big Bowl might be older or have different health histories than the people in the Small Bowl.
  • Different Viruses: The virus in the Big Bowl might be slightly different from the one in the Small Bowl.
  • The "Hidden" Factor: The immune marker (like an antibody level) is a surrogate. It's a stand-in for the real thing. It's like looking at a car's speedometer to guess how far you've traveled. Usually, it works. But sometimes, the speedometer is right, but the engine is sputtering, or the road is icy.

The authors realized that simply comparing the speedometers (antibody levels) isn't enough. You have to account for the driver (the person's baseline health) and the road conditions (the virus strain).

The Solution: The "Time-Travel" Calculator

The authors developed a mathematical method that acts like a time-travel calculator. It asks a "What if?" question:

"If the people in the Small Bowl (who got the new soup) had been in the Big Bowl (where we know who got sick), how would their sickness curve look?"

To do this, they use a technique called Data Fusion. They stitch the two bowls together using three clever rules (assumptions):

  1. The "Same Driver" Rule: If two people have the same health history and the same antibody level, they should have the same risk of getting sick, regardless of which study they were in.
  2. The "No Magic" Rule: The new soup doesn't have any secret superpowers that the old soup doesn't have, other than what is shown by the antibody level. If the antibody levels are the same, the protection should be the same. (If the new soup had a secret "super-ingredient" that the antibody test couldn't see, this rule would break).
  3. The "Bridge" Rule: We can mathematically translate the risk from the Big Bowl to the Small Bowl by adjusting for the differences in the people and the virus.

The Result: A Crystal Ball for Vaccines

By using this method, the authors can draw a Cumulative Incidence Curve. Think of this as a weather forecast for getting sick.

  • Without this method: We would have to wait years to see if the new vaccine works.
  • With this method: We can look at the antibody levels from the small study, mix them with the "sickness history" from the big study, and instantly generate a prediction: "Based on this data, about 5% of people might get sick in 3 months, and 7% in 6 months."

Real-World Example: The COVAIL Trial

The paper tested this on real data from the COVAIL trial (a study on COVID-19 boosters).

  • They had data from an old trial with the original virus.
  • They had a small new study with a "bivalent" booster (designed for new variants).
  • They used their method to predict how well the new booster would work against the new variant.

They even used the method to check their own rules. They asked: "Does the new vaccine have any secret superpowers the antibody test missed?" By comparing their prediction to the actual results (once those became available), they found that the antibody test did miss some protection. This showed that the method is smart enough to tell us when our assumptions are wrong.

Why This Matters

This paper is like giving regulators (the FDA) and scientists a super-powered telescope.

  • Speed: We don't have to wait years to approve new vaccines for new variants.
  • Public health: We can approve vaccines faster, saving lives during outbreaks.
  • Efficiency: We don't need to run massive, expensive trials for every single update.

In short, the authors built a bridge between the past (what we know worked) and the future (what we need to know will work), allowing us to cross over with confidence, even when we don't have all the data yet.
