Data Fusion with Distributional Equivalence Test-then-pool

This paper proposes a novel test-then-pool framework that leverages kernel-based distributional equivalence testing and resampling methods to safely fuse historical and concurrent control data in randomized controlled trials, thereby improving statistical power while rigorously controlling Type-I error rates.

Linying Yang, Xing Liu, Robin J. Evans

Published Fri, 13 Ma

Imagine you are a doctor trying to figure out if a new medicine works. The gold standard is a Randomized Controlled Trial (RCT): you give the medicine to one group of people (the Treatment Group) and a sugar pill to another (the Control Group), then compare the results.

But here's the problem: finding people to take the sugar pill is hard, expensive, and sometimes unethical. You might only have 50 people in your control group, but 200 in the treatment group. This makes your results "wobbly" and less reliable.

The Temptation: You look at your computer and see data from a previous trial where 100 people took the same sugar pill. "Why not just mix them together?" you think. "That gives me 150 control people! My results will be much stronger!"

The Danger: But wait. The people in the old trial might be different. Maybe they were older, lived in a different country, or were measured with different tools. If you blindly mix them, you might introduce bias. It's like trying to compare the speed of a Ferrari to a bicycle, but then adding a picture of a horse to the "bicycle" group. You'll get a confusing, wrong answer.

The Old Solution: "Test, then Pool"

Scientists have tried to solve this with a method called Test-then-Pool (TTP).

  1. Test: They check if the old data and new data look "similar."
  2. Pool: If they look similar, they mix them. If not, they keep them separate.

The Flaw: The old way of testing was too simple. It mostly checked if the average results were the same. But two groups can have the same average but very different shapes.

  • Analogy: Imagine two classes of students. Class A has scores of 50, 50, 50, 50, 50. Class B has scores of 0, 0, 0, 0, 250. Both have an average of 50. If you only check the average, you think they are the same. But Class B is wild and unpredictable, while Class A is steady. Mixing them would ruin your analysis.
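You can check the two-classes arithmetic directly: the averages match, but the spread gives the game away. A mean-only test sees nothing; a spread (or full-shape) check does:

```python
import statistics

class_a = [50, 50, 50, 50, 50]
class_b = [0, 0, 0, 0, 250]

# Both classes have exactly the same average...
print(statistics.mean(class_a))    # 50
print(statistics.mean(class_b))    # 50

# ...but wildly different spread, which a mean-only test would miss.
print(statistics.pstdev(class_a))  # 0.0
print(statistics.pstdev(class_b))  # 100.0
```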

The New Solution: "Distributional Equivalence Test-then-Pool"

The authors of this paper (Yang, Liu, and Evans) invented a smarter, more rigorous way to decide whether to mix the data. Think of it as a High-Tech Data Matchmaker.

Here is how their new method works, step-by-step:

1. The "Full-Body Scan" (Distributional Testing)

Instead of just checking the average (the "head"), they scan the entire body of the data. They use a mathematical tool called MMD (Maximum Mean Discrepancy).

  • Analogy: Imagine you are trying to match two fingerprints. The old method just checked if the ridges were the same height. The new method looks at the entire pattern, the swirls, the loops, and the tiny details. It asks: "Is the whole shape of this group of people the same as that group?"
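A minimal sketch of the standard (biased) MMD estimator with a Gaussian kernel shows the "full-body scan" idea: when two samples have the same shape, the estimate sits near zero; when the shapes differ, it grows. The kernel and bandwidth choices here are illustrative, not necessarily the paper's:

```python
import numpy as np

def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel evaluated pairwise between two 1-D samples."""
    diff = x[:, None] - y[None, :]
    return np.exp(-diff**2 / (2 * bandwidth**2))

def mmd_squared(x, y, bandwidth=1.0):
    """Biased estimate of the squared Maximum Mean Discrepancy."""
    k_xx = rbf_kernel(x, x, bandwidth).mean()
    k_yy = rbf_kernel(y, y, bandwidth).mean()
    k_xy = rbf_kernel(x, y, bandwidth).mean()
    return k_xx + k_yy - 2 * k_xy

rng = np.random.default_rng(0)
same = mmd_squared(rng.normal(0, 1, 200), rng.normal(0, 1, 200))
shifted = mmd_squared(rng.normal(0, 1, 200), rng.normal(1, 1, 200))
print(same < shifted)  # True — MMD is larger when the "shapes" differ
```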

2. The "Equivalence" Test (The Safety Margin)

This is the cleverest part. The old method asked: "Are they exactly identical?" (Which is impossible in real life).
The new method asks: "Are they close enough to be considered twins?"

  • They set a tolerance radius (let's call it θ).
  • If the difference between the old and new groups is smaller than this radius, they say, "Okay, these are close enough. We can mix them."
  • If the difference is larger, they say, "No way, they are too different. Keep them separate."
  • Why this matters: This keeps the "Type-I Error" (false alarms) under control. It guarantees that even if you mix the groups, you haven't introduced a bias large enough to make the medicine look effective when it isn't.
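The decision rule can be sketched as follows. Note the equivalence-test direction: you only pool when the estimated discrepancy is convincingly *below* the tolerance radius θ. The `margin` argument here is a hypothetical stand-in for the calibrated critical value, which the authors obtain via resampling, not a fixed number:

```python
def decide_pooling(mmd_estimate, theta, margin):
    """Equivalence-style decision: pool only when the discrepancy is
    clearly INSIDE the tolerance radius theta. (Illustrative interface;
    'margin' stands in for a resampling-calibrated critical value.)"""
    return "pool" if mmd_estimate + margin < theta else "keep separate"

print(decide_pooling(0.02, theta=0.10, margin=0.03))  # pool
print(decide_pooling(0.12, theta=0.10, margin=0.03))  # keep separate
```

Flipping the burden of proof this way is what controls false alarms: the default is "keep separate", and the data must demonstrate closeness to earn pooling.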

3. The "Partial" Safety Net (Bootstrap & Permutation)

Once they decide to mix the data, they need to run the final test to see if the medicine works. But because the groups were only "close enough" (not identical), standard math tricks don't work perfectly.

  • Analogy: Imagine you are weighing a package. Usually, you put it on a scale. But if the scale is slightly wobbly (because you mixed two slightly different groups), you can't trust the reading.
  • The Fix: The authors invented "Partial Bootstrap" and "Partial Permutation."
    • Imagine you have a bag of marbles. To check if your scale is accurate, you take out some marbles, weigh them, put them back, and do it 1,000 times to see how much the weight usually wobbles.
    • Their "Partial" method is smart: it simulates the wobble exactly as it would happen with the mixed groups, ensuring the final result is statistically valid, even if the groups weren't perfect twins.

Why This is a Big Deal

  1. It's Safer: It rigorously controls the risk of making a wrong conclusion (Type-I error).
  2. It's Smarter: It catches differences that simple averages miss (like the wild vs. steady student example).
  3. It's Powerful: By safely using historical data, researchers can run smaller, cheaper, and faster trials without sacrificing accuracy.

The Real-World Test

The authors tested this on the Prospera program in Mexico (a famous study on cash transfers for school attendance).

  • They took a small slice of the current data and tried to mix it with old data.
  • Result: Their new method found the program worked much more clearly (higher power) than the old methods, while still keeping the error rate low. It proved that you can safely "borrow" from the past to understand the future, as long as you use the right "matchmaking" rules.

In a nutshell: This paper gives scientists a new, super-secure way to combine old and new data. It's like having a strict but fair referee who says, "You can use the old team's stats, but only if they are truly similar enough, and we'll double-check the math to make sure no one cheats."