Prediction decomposition for causal analysis

This paper proposes a prediction decomposition framework that identifies within-unit-across-time prediction accuracy as a superior structural proxy for counterfactual treatment effects, offering a new diagnostic metric and model-selection strategy to improve causal inference using machine learning predictions.

Original authors: Ofir Reich

Published 2026-04-14 · Author reviewed

This is an AI-generated explanation of the paper below. It is not written by the authors. For technical accuracy, refer to the original paper.

The Big Problem: "The Good Predictor is a Bad Detective"

Imagine you are a detective trying to solve a mystery: Did a new fertilizer make crops grow taller?

You have a huge field of corn, but you can't measure every single plant (it's too expensive). So, you measure a small sample of plants directly. Then, you train a super-smart AI (Machine Learning) to look at satellite photos and guess the height of the other plants based on the ones you measured.

The Trap:
The AI becomes a fantastic predictor of how tall a plant is. It looks at the soil, the weather history, and the location, and it says, "This plant is 5 feet tall, and that one is 4 feet tall." It gets an A+ on its test scores.

But here is the catch: The AI is terrible at being a detective.

If you give the AI the data to see if the fertilizer worked, it might say, "No change!" Why? Because the AI learned that plants in "Rich Soil" are naturally tall, and plants in "Poor Soil" are naturally short. It learned to predict static differences (who is tall vs. who is short). It didn't learn to predict dynamic changes (what happens when you add fertilizer).

So, even though the AI is great at guessing heights, it completely misses the effect of the fertilizer.

The Paper's Solution: Breaking the Prediction into Three Parts

The author, Ofir Reich, proposes a way to break down the AI's "brain" into three distinct ingredients. Think of the AI's prediction as a smoothie made of three fruits:

  1. The "Who You Are" Fruit (Between-Unit): This is the static stuff. Where you live, your family history, your soil type. The AI is usually very good at this. It knows that a person in a wealthy neighborhood spends more money than someone in a poor neighborhood.
  2. The "Natural Drift" Fruit (Within-Unit): This is how things change naturally over time. Maybe you spent more money this month because it's your birthday, or the corn grew a bit because of the rain. This is the "noise" of life.
  3. The "Magic Effect" Fruit (Counterfactual Treatment Effect): This is the specific change caused by the intervention (the fertilizer, the cash transfer, the medicine).

The Discovery:
The paper argues that overall accuracy (the whole smoothie) is a bad way to judge if the AI will work for your experiment.

  • An AI can be 99% accurate at predicting the smoothie's taste because it nailed the "Who You Are" fruit.
  • But if it has zero of the "Magic Effect" fruit, it will fail your experiment.

The Secret Ingredient: The "Time-Travel" Test

How do we know if the AI has the "Magic Effect" fruit without actually running the experiment on the whole population?

The author suggests a clever trick using Panel Data (data from the same people/plots at two different times, like before and after).

The Analogy: The "Before and After" Mirror
Imagine you are trying to teach a robot to understand how a car accelerates when you press the gas pedal.

  • Bad Approach: You show the robot a picture of a Ferrari and a picture of a Tractor. The robot learns: "Ferraris are fast, Tractors are slow." It gets the prediction right! But if you ask, "What happens if I press the gas on the Tractor?" it has no idea, because it only learned about the difference between the cars, not the action of the gas pedal.
  • The Paper's Approach: You show the robot the same Tractor at 1:00 PM and then at 1:05 PM.
    • If the robot predicts a different speed at 1:05 PM than at 1:00 PM, it must be paying attention to things that change over time (like the engine revving after you press the gas).
    • If the robot gives the same prediction both times — just saying "It's a Tractor, it's slow" — it fails the test, because it's only paying attention to what the vehicle is, not what's happening to it.

The Metric (The "Diff-vs-Diff" Slope):
The paper proposes a specific test:

  1. Take the people who did not get the treatment (the control group).
  2. Look at how their actual outcomes changed from Time 1 to Time 2.
  3. Look at how the AI's predicted outcomes changed from Time 1 to Time 2.
  4. The Test: Do the AI's predictions move in sync with the real changes?
  • If the AI predicts the changes well: It means the AI is sensitive to the "drift" of life. The author argues this is a strong sign it will also be sensitive to the "Magic Effect" (the treatment).
  • If the AI ignores the changes: It means the AI is just memorizing who is rich and who is poor. It will fail to detect the treatment effect.
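A minimal sketch of that test on simulated control-group data. The specific regression direction (predicted changes on actual changes, read as "what fraction of real change the model sees") and all the numbers are assumptions for illustration, not the paper's exact specification.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical control group: actual outcomes and ML predictions at two times.
n = 500
y1 = rng.normal(50, 10, n)                    # actual outcome at Time 1
dy = rng.normal(0, 4, n)                      # actual change ("drift") by Time 2
y2 = y1 + dy
pred1 = y1 + rng.normal(0, 2, n)              # predictions track levels well...
pred2 = y1 + 0.6 * dy + rng.normal(0, 2, n)   # ...but catch only ~60% of the change

actual_diff = y2 - y1
pred_diff = pred2 - pred1

# "Diff-vs-diff" slope: OLS of predicted changes on actual changes.
slope, _ = np.polyfit(actual_diff, pred_diff, 1)
print(f"diff-vs-diff slope: {slope:.2f}")  # near 0.6: model sees ~60% of real change
```

A slope near 1 would mean the predictions move in sync with real changes; a slope near 0 would mean the model is just memorizing levels.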

Why This Matters

In the past, researchers picked the "best" AI based on which model had the highest R-squared (overall accuracy).

  • The Paper says: Stop doing that! A high R-Squared often just means the AI is good at spotting "Rich vs. Poor" (Between-Unit).
  • The New Rule: Pick the AI that is best at predicting changes over time (Within-Unit), even if its overall accuracy is slightly lower.
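The old rule and the new rule can disagree, and a toy comparison shows how. Both models and all numbers here are made up: Model A is built to be great on levels and blind to changes, Model B the reverse.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical control-group panel for scoring two candidate models.
n = 800
unit = rng.normal(0, 3, n)                  # static unit differences
dy = rng.normal(0, 1, n)                    # natural change between periods
y1, y2 = unit, unit + dy

# Model A: excellent on levels, blind to changes.
a1 = unit + rng.normal(0, 0.3, n)
a2 = unit + rng.normal(0, 0.3, n)
# Model B: noisier on levels (a persistent per-unit bias), but tracks changes.
bias = rng.normal(0, 1.5, n)
b1 = unit + bias
b2 = unit + bias + 0.8 * dy + rng.normal(0, 0.3, n)

def r2(actual, fitted):
    return 1 - np.var(actual - fitted) / np.var(actual)

overall = {"A": r2(np.r_[y1, y2], np.r_[a1, a2]),
           "B": r2(np.r_[y1, y2], np.r_[b1, b2])}
within = {"A": r2(y2 - y1, a2 - a1),
          "B": r2(y2 - y1, b2 - b1)}

best_overall = max(overall, key=overall.get)  # the old rule's pick
best_within = max(within, key=within.get)     # the paper's pick
```

On this data the old rule picks Model A (higher overall R-squared) while the new rule picks Model B, the one that actually tracks changes over time.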

The "Magic" Fix (With a Warning)

If you find an AI that is good at predicting changes (high "Within-Unit" score), the paper suggests you can use it to fix your results.

  • If the AI captures only 80% of real changes, you can mathematically "stretch" your estimate to recover the true answer.
  • The Warning: This only works if you assume that "predicting natural changes" is very similar to "predicting treatment changes." The author thinks this is usually true (e.g., if a model knows how a person's spending changes when they get a bonus, it likely knows how it changes when they get a cash transfer), but it's an assumption that needs to be checked.
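The "stretch" is an attenuation-style correction: divide the naive estimate by the fraction of real change the model captures. A sketch with made-up numbers, valid only under the transfer assumption the warning above describes:

```python
# Hypothetical numbers for illustration only.
naive_effect = 1.6   # effect estimated from ML-predicted outcomes
slope = 0.8          # diff-vs-diff slope measured on the control group

# If the model sees only 80% of real changes, the naive estimate is
# attenuated by the same factor, so divide it back out.
corrected_effect = naive_effect / slope
print(corrected_effect)  # 2.0: the "stretched" estimate of the true effect
```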

Summary in One Sentence

Don't just ask your AI, "How well can you guess the answer?" Ask it, "How well can you guess the change?" Because if it can't guess the change, it definitely can't guess the effect of your experiment.
