TEA-Time: Transporting Effects Across Time

This paper introduces the TEA-Time framework for extrapolating treatment effects across time periods. It proposes two identification strategies with doubly robust estimators, validates them through simulations, and applies them to Upworthy A/B tests, demonstrating a trade-off between precision and bias.

Harsh Parikh, Gabriel Levin-Konigsberg, Dominique Perrault-Joncas, Alexander Volfovsky

Published Tue, 10 Ma

Imagine you are a chef who just discovered a secret sauce that makes your burgers taste amazing. You tested it in July, and the results were fantastic. But now, it's December, and you want to know: Will this same sauce still make the burgers taste great during the holidays?

You can't just guess. You can't run a new experiment right now because you need to make a decision for the holiday menu today. You only have data from July.

This is the exact problem the paper "TEA-Time: Transporting Effects Across Time" tries to solve. The authors are statisticians who want to help businesses and scientists take a result from one time period and predict what would happen if they did the same thing at a different time.

Here is the breakdown of their solution, using simple analogies.

The Core Problem: Time Changes Everything

In the world of science and business, we often run "experiments" (like A/B tests).

  • The Old Way: We assume that if a job training program worked in 2020, it will work exactly the same way in 2024.
  • The Reality: Time changes things. A summer marketing campaign works differently than a winter one. A drug might work better in winter, when flu season peaks, but less well in summer.

The authors call this "Temporal Transportation." They want to "transport" the results of a past experiment to a future (or past) time where we didn't run the experiment.

The Big Idea: Using "Anchors"

Since we can't go back in time to run the exact same test, we need a bridge. The authors call this bridge an "Anchor."

Imagine you want to know how much a specific car (let's call it Car A) costs in 2024, but you only know its price in 2020. You can't just guess.

  • The Trick: You look at Car B and Car C. You know the price of Car B in 2020 and 2024. You also know the price of Car C in 2020 and 2024.
  • The Logic: If Car B doubled in price from 2020 to 2024, and Car C also doubled, it's likely that Car A doubled too. You use the other cars as "anchors" to figure out how the market changed over time, and then apply that change to Car A.

The paper proposes two specific ways to find these "anchors."
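The car analogy above can be sketched in a few lines of code. This is a toy illustration of the anchoring idea only, not the paper's estimator; all names and numbers are made up.

```python
# Toy "anchor" example: cars B and C are observed in both years,
# car A only in 2020. Use the anchors to estimate how the market
# changed, then apply that change to car A.
prices_2020 = {"car_A": 25_000, "car_B": 20_000, "car_C": 30_000}
prices_2024 = {"car_B": 40_000, "car_C": 60_000}  # car_A unknown

# Average price ratio across the anchors (cars seen in both years).
anchors = ["car_B", "car_C"]
ratios = [prices_2024[c] / prices_2020[c] for c in anchors]
market_change = sum(ratios) / len(ratios)  # both doubled, so 2.0

# "Transport" car A's 2020 price to 2024 via the anchor-based change.
car_A_2024_estimate = prices_2020["car_A"] * market_change
print(car_A_2024_estimate)  # 50000.0
```

The two strategies below differ in what plays the role of the anchors: a fully replicated trial, or a single arm shared across trials.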

Strategy 1: The "Exact Clone" Approach (Replicated Trials)

This is the most reliable but hardest method.

  • How it works: You need to find a situation where you tested the exact same thing at two different times.
    • Example: You tested "Sauce A vs. Sauce B" in July. You also tested "Sauce A vs. Sauce B" in December.
  • The Math: You compare the results. If the difference between Sauce A and B was huge in July but tiny in December, you know "Time" changed the effect. You use that ratio to adjust your main prediction.
  • Pros: Very accurate. It accounts for complex changes (like how the time between cooking and eating matters).
  • Cons: It's rare. Companies rarely run the exact same A/B test twice at different times.
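As a rough sketch (illustrative only, not the paper's doubly robust estimator), a replicated trial measured at both times reveals how time rescales effects, and that rescaling can be applied to a trial measured only at the source time. All numbers here are made up.

```python
# Replicated trial: "Sauce A vs. Sauce B", run in July and again in December.
effect_replicated_july = 0.08      # +8 pp click rate in July
effect_replicated_december = 0.02  # +2 pp click rate in December

# How "time" rescaled the replicated trial's effect.
time_adjustment = effect_replicated_december / effect_replicated_july  # 0.25

# Trial of interest: measured only in July.
effect_target_july = 0.12

# Transported estimate for December.
effect_target_december = effect_target_july * time_adjustment
print(round(effect_target_december, 3))  # 0.03
```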

Strategy 2: The "Common Thread" Approach (Common Arm)

This is the practical, "good enough" method that works more often.

  • How it works: You don't need the exact same test. You just need one common ingredient that appears in many different tests over time.
    • Example: You want to know how "Sauce A" performs in December. You don't have a December test for Sauce A. But, you do have a "Control Group" (no sauce) that was used in tests in July, August, September, and December.
  • The Logic: You assume the "Control Group" (the no-sauce burgers) behaves consistently over time. If the "No Sauce" group gets 10% more clicks in December than in July, you assume everything gets 10% more clicks in December. You use that "No Sauce" trend to adjust your prediction for "Sauce A."
  • Pros: Very easy to do. Most companies have a "Control" group running constantly.
  • Cons: It makes a strong assumption: that time affects everything equally. If "Sauce A" interacts weirdly with the holidays (e.g., people hate spicy food in winter), this method might give a biased answer.
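The common-arm logic can be sketched the same way. This is a minimal illustration of the "time affects all arms equally" assumption, not the paper's estimator; the numbers are invented.

```python
# Common arm: "no sauce" control, observed in both months.
control_july = 0.10      # 10% click rate in July
control_december = 0.11  # 11% click rate in December
drift = control_december / control_july  # 1.1: everything up 10%

# Treatment arm ("Sauce A"), observed only in July.
sauce_a_july = 0.15

# Transported December estimate under the equal-drift assumption,
# and the implied treatment effect in December.
sauce_a_december = sauce_a_july * drift
effect_december = sauce_a_december - control_december
print(round(sauce_a_december, 4), round(effect_december, 4))
```

If Sauce A interacts with the season (the violation flagged in the cons above), the drift learned from the control arm misses that interaction and the transported estimate is biased.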

The Trade-Off: Precision vs. Accuracy

The authors tested these methods using simulations and real data from Upworthy (a website that tests thousands of headlines).

  • The "Common Thread" (Strategy 2) gave very precise answers (tight confidence intervals), but sometimes they were wrong (biased). It was like a GPS that gives you a very confident route, but it's the wrong route because it didn't account for a specific road closure.
  • The "Exact Clone" (Strategy 1) was harder to apply and had wider margins of error (less precise), but it was more accurate: it tracked the real changes in the data better.

The "Secret Sauce" of the Paper

The authors didn't just say "use Strategy 1 or 2." They built a mathematical toolkit (called "Doubly Robust Estimators") that:

  1. Combines the best of both worlds: If your data is messy, the math automatically adjusts to be more robust.
  2. Gives you a warning system: If you try both strategies and they give very different answers, the math tells you: "Hey, something is weird. The effect of time might be changing in a complex way. Be careful!"
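The warning system amounts to a disagreement check between the two strategies. Here is one hypothetical way to operationalize it (the function name, the z-threshold, and the standard-error combination are my assumptions, not details from the paper):

```python
def falsification_check(est_replicated, se_replicated,
                        est_common, se_common, z=1.96):
    """Flag disagreement between the two transported estimates.

    A large standardized gap suggests the common-arm assumption
    ("time affects all arms equally") may be violated.
    """
    gap = est_replicated - est_common
    se_gap = (se_replicated**2 + se_common**2) ** 0.5  # independent SEs
    return abs(gap) > z * se_gap  # True = "be careful!"

# Example: estimates of 0.03 vs. 0.06 with standard errors of 0.01 each.
print(falsification_check(0.03, 0.01, 0.06, 0.01))  # True: gap is significant
```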

Why This Matters

In our fast-paced world, businesses run experiments every day.

  • A bank tests a new loan offer in January.
  • A streaming service tests a new thumbnail in March.
  • A hospital tests a new drug protocol in June.

They can't wait to run the test in December to see if it works in December. They need to know now. This paper gives them a principled, scientific way to say, "Based on what we saw in January, here is our best guess for December," while admitting the uncertainty and checking for hidden traps.

In short: It's a guide on how to time-travel your data without actually traveling through time, using other experiments as your compass.