Causal Identification from Counterfactual Data: Completeness and Bounding Results

This paper introduces the CTFIDU+ algorithm and proves it complete for identifying counterfactual queries from physically realizable Layer 3 data. It characterizes the fundamental limits of exact causal inference in this setting, and derives novel analytic bounds for non-identifiable quantities, which are shown empirically to be tighter when counterfactual data is available.

Arvind Raghavan, Elias Bareinboim

Published 2026-03-05

Imagine you are a detective trying to solve a crime, but you only have access to three types of clues:

  1. The Scene (Observation): You see what happened naturally. (e.g., "The suspect was wearing a red hat and ran away.")
  2. The Experiment (Intervention): You force a change and see what happens. (e.g., "We put a blue hat on the suspect and see if they still run away.")
  3. The "What If" (Counterfactual): You imagine a different reality. (e.g., "If the suspect had worn a blue hat instead of the red one, would they have run away?")

For decades, scientists believed Type 3 clues were impossible to get in the real world. You can't go back in time and change the past. So, they built a "Causal Hierarchy" (a ladder of knowledge) where you could climb from Type 1 to Type 2, but Type 3 was considered a mythical peak you could never reach.

This paper is about breaking that ceiling.

The Big Breakthrough: "Time-Traveling Cameras"

The authors (Arvind Raghavan and Elias Bareinboim) start with a surprising discovery from their previous work: You actually can get Type 3 data in the real world.

They call this "Counterfactual Realizability."

The Analogy: Imagine a traffic camera filming a speeding car.

  • Standard Experiment: You stop the car, paint it blue, and let it drive. This changes the car's actual color.
  • Counterfactual Experiment: You use a special "digital filter" on the video feed. You tell the AI analyzing the video, "Pretend this car is blue," but you don't actually change the car's paint or the driver's behavior. The AI sees a blue car, but the real world remains unchanged.

This allows you to collect data about "what would have happened if the car were blue" without actually changing the car. It's like having a camera that can see parallel universes.
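To make the "parallel universe" idea concrete, here is a minimal sketch of how a counterfactual quantity is computed when the full mechanism is known, via the classic three steps: abduction (condition on what you saw), action (change the treatment), prediction (rerun the outcome with the same hidden background factors). The structural model below is invented for illustration and is not from the paper.

```python
import random

# Toy structural causal model (hypothetical, for illustration only):
#   U1, U2 ~ independent Bernoulli(0.5)   (hidden background factors)
#   X = U1                                 (hat colour: 1 = red)
#   Y = X or U2                            (1 = suspect runs away)
rng = random.Random(0)
n = 200_000

def y_of(x, u2):
    return int(x or u2)

# Query: among units observed with X = 1 and Y = 1, how many would
# still have had Y = 1 in the world where X had been 0 instead?
num = den = 0
for _ in range(n):
    u1 = int(rng.random() < 0.5)
    u2 = int(rng.random() < 0.5)
    x, y = u1, y_of(u1, u2)
    if x == 1 and y == 1:        # abduction: keep only units matching the evidence
        den += 1
        num += y_of(0, u2)       # action + prediction: same unit, do(X = 0)
print(num / den)  # ≈ 0.5, since Y under do(X=0) reduces to U2, independent of X
```

The key move is that the same sampled `u2` is reused in the counterfactual world; only the treatment changes. That reuse of the unit's hidden state is exactly what makes the quantity Layer 3 rather than a plain experiment.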

The New Detective Tool: CTFIDU+

Now that we have these "Time-Traveling Cameras," the big question is: What new mysteries can we solve?

Previously, algorithms could only solve mysteries using Type 1 and Type 2 data. The authors built a new super-algorithm called CTFIDU+.

  • What it does: It takes a mix of regular data, experiment data, and this new "parallel universe" data to solve complex "What If" questions.
  • The Guarantee: They proved that this algorithm is complete. This means if a "What If" question can be solved using the data you have, CTFIDU+ will find the answer. If it says "I can't solve this," then it is truly impossible to solve with that data.

The "Glass Ceiling" of Knowledge

Here is the most fascinating part. The authors asked: "If we have these time-travel cameras, can we solve every 'What If' question?"

The answer is No.

They discovered a fundamental limit. Even with these amazing cameras, there is a "Glass Ceiling" (which they call Layer 2.5).

  • Below the ceiling: You can solve these problems by combining your data.
  • Above the ceiling: There are some "What If" questions that are fundamentally unknowable, even with perfect experiments.

The Metaphor: Imagine trying to figure out the exact recipe of a cake by tasting it.

  • If you can taste the cake (Observation) and bake a few test batches (Intervention), you can guess the recipe.
  • If you can also magically taste the cake while imagining you added extra sugar (Counterfactual), you can guess even better.
  • But: If the recipe depends on a secret ingredient that was never used in any version of the cake you've ever seen or imagined, you will never know it. No amount of time travel or magic tasting can reveal a secret that leaves no trace in reality.

The paper proves that some "What If" questions fall into this "secret ingredient" category. They are mathematically impossible to pin down exactly.
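The classical version of this impossibility is easy to exhibit with a few lines of enumeration (the paper's Layer 2.5 result is a stronger statement about realizable counterfactual data, but the flavour is the same). The two models below are made up for illustration: they give identical answers to every observation and every experiment, yet disagree completely about an individual-level "What If".

```python
# Two hypothetical SCMs that agree on every observation and experiment,
# yet disagree on a cross-world counterfactual.
#   Both: X ~ Bernoulli(0.5), U ~ Bernoulli(0.5), X independent of U.
#   Model A: Y = U          (the treatment never matters to any individual)
#   Model B: Y = X XOR U    (the treatment flips every individual's outcome)
models = {"A": lambda x, u: u, "B": lambda x, u: x ^ u}

results = {}
for name, f in models.items():
    # Layer 2: interventional distributions P(Y=1 | do(X=x)), averaging over U
    p_do = {x: sum(f(x, u) for u in (0, 1)) / 2 for x in (0, 1)}
    # Layer 3 (cross-world): P(Y would be 1 under X=1 AND 0 under X=0)
    p_ctf = sum(1 for u in (0, 1) if f(1, u) == 1 and f(0, u) == 0) / 2
    results[name] = (p_do, p_ctf)
    print(name, p_do, p_ctf)
# Both models give p_do = {0: 0.5, 1: 0.5}; p_ctf is 0.0 for A but 0.5 for B.
```

No amount of data sampled from either model's observations or experiments can distinguish the two, so the cross-world quantity is genuinely unknowable from that data: the "secret ingredient" leaves no trace.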

Why Does This Matter? (The "Tighter Bounds" Trick)

Even if we can't solve a mystery exactly, does this new data help? Yes!

The authors show that even for the "unsolvable" mysteries, using this new counterfactual data allows us to draw a much smaller circle around the answer.

The Analogy:

  • Old Way (No Counterfactual Data): "The suspect's speed was somewhere between 0 and 100 mph." (Useless!)
  • New Way (With Counterfactual Data): "The suspect's speed was somewhere between 45 and 55 mph." (Much more useful!)

They proved mathematically that adding this "parallel universe" data shrinks the range of possible answers, making our guesses much sharper, even if we can't get the single perfect number.
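Here is a toy numerical version of that shrinking-circle effect, with made-up numbers rather than the paper's actual bounds. With binary treatment and outcome, each unit has a "response type" (its outcome under X=0 and under X=1); experiments pin down only two margins, leaving the "complier" fraction inside an interval. One extra counterfactual-style constraint, here a hypothetical measured cap on the "defier" fraction, narrows it.

```python
# Work in integer "per-mille" units so the arithmetic is exact.
# q00, q01, q10, q11 = fractions of response types (Y under X=0, Y under X=1).
A, B = 700, 400   # 1000 * P(Y=1|do(X=1)), 1000 * P(Y=1|do(X=0))  (hypothetical)

def bounds(extra=lambda q00, q01, q10, q11: True):
    feasible = []
    for q11 in range(1001):            # sweep the single free parameter
        q01, q10 = A - q11, B - q11    # fixed by the two experimental margins
        q00 = 1000 - q01 - q10 - q11
        if min(q00, q01, q10, q11) >= 0 and extra(q00, q01, q10, q11):
            feasible.append(q01)       # the query: the "complier" fraction
    return min(feasible) / 1000, max(feasible) / 1000

print(bounds())  # experiments alone: (0.3, 0.6)
# A hypothetical counterfactual measurement caps the "defier" fraction at 5%:
print(bounds(lambda q00, q01, q10, q11: q10 <= 50))  # tightens to (0.3, 0.35)
```

The extra constraint carves away part of the feasible region without point-identifying the query, which is exactly the bounded-but-sharper regime the paper studies.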

Summary

  1. We can get "What If" data: We don't need to break physics; we just need to use clever experimental setups (like digital filters on video) to simulate alternate realities.
  2. We have a perfect tool: The CTFIDU+ algorithm can tell us exactly which "What If" questions are solvable with our data and which are not.
  3. There is a limit: Some questions are fundamentally impossible to answer exactly, no matter how much data we collect.
  4. But we can still improve: Even for the impossible questions, this new data helps us narrow down the answer significantly, turning a wild guess into a precise estimate.

This work changes the game for AI, fairness, and medicine, giving us a roadmap for what we can know about the past and future, and where we must simply accept uncertainty.
