Robust Counterfactual Inference in Markov Decision Processes

This paper proposes a novel, scalable, non-parametric approach that computes tight bounds on counterfactual transition probabilities across all causal models compatible with the observed data in a Markov Decision Process. These bounds enable the identification of robust counterfactual policies that optimize worst-case reward, without relying on prohibitively expensive optimization or on fixed causal assumptions.

Jessica Lally, Milad Kazemi, Nicola Paoletti

Published 2026-03-04

Imagine you are a doctor who just treated a patient. The patient took a specific medicine, and unfortunately, they didn't get better. You want to know: "What would have happened if I had given them a different medicine instead?"

This is called counterfactual inference: thinking about "what if" scenarios. In Artificial Intelligence (AI), systems that make decisions over time (like self-driving cars or medical treatment plans) are typically modeled as Markov Decision Processes (MDPs).

The problem with current AI methods is that they try to answer "what if" by guessing a single, specific story about how the world works. But the real world is messy. There are many different stories (causal models) that could explain why the patient got sick, and each story leads to a different "what if" answer. Relying on just one guess is risky, especially in life-or-death situations.

This paper introduces a new, smarter way to handle these "what if" questions. Here is the breakdown using simple analogies:

1. The Problem: The "One-Story" Trap

Imagine you are a detective trying to solve a crime. You have a witness who saw a car crash.

  • Old Method: The detective picks one theory (e.g., "The driver was texting") and builds the entire case on it. If that theory is wrong, the whole case collapses.
  • The Reality: There are many theories that fit the evidence (texting, speeding, a tire blowout). Each theory suggests a different outcome if the driver had acted differently.
  • The Risk: In safety-critical fields (like healthcare or aviation), guessing the wrong theory could lead to dangerous advice.

2. The Solution: The "Fog of War" Map

Instead of picking one story, the authors propose looking at all possible stories that fit the evidence at the same time.

Think of it like navigating through a thick fog.

  • Old Method: You draw a single, thin line on a map saying, "The road goes exactly here." If you step off that line, you might fall off a cliff.
  • New Method: You draw a wide, shaded corridor (an interval) on the map. You say, "The road is somewhere in this wide area." You don't know the exact path, but you know the boundaries.

This "corridor" is called an Interval Counterfactual MDP. It doesn't give you a single number for the future; it gives you a range (a best-case and a worst-case scenario) that covers every plausible reality.
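The "corridor" idea can be sketched as a data structure: instead of storing one transition probability per (state, action, next-state) triple, we store a lower and an upper bound on it. A minimal sketch (the class and field names are illustrative, not taken from the paper):

```python
from dataclasses import dataclass

@dataclass
class IntervalMDP:
    """An MDP whose transition probabilities are intervals, not points.

    lower[s][a][s2] and upper[s][a][s2] bound the probability of moving
    from state s to state s2 under action a, across all causal models
    consistent with the observed data.
    """
    n_states: int
    n_actions: int
    lower: list  # lower[s][a][s2], each value in [0, 1]
    upper: list  # upper[s][a][s2], each value in [0, 1]

    def is_valid(self) -> bool:
        """Check each interval set admits at least one true distribution:
        the lower bounds must sum to <= 1 and the upper bounds to >= 1
        for every (state, action) pair."""
        for s in range(self.n_states):
            for a in range(self.n_actions):
                if not (sum(self.lower[s][a]) <= 1.0 <= sum(self.upper[s][a])):
                    return False
        return True
```

The validity check matters: an interval map is only a meaningful "corridor" if at least one genuine probability distribution fits inside every set of bounds.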

3. The Secret Sauce: The "Magic Formula"

Usually, calculating the boundaries of all these possible worlds is like trying to solve a puzzle with a billion pieces. It takes supercomputers ages to figure it out.

The authors discovered a mathematical shortcut (a closed-form expression).

  • Analogy: Imagine trying to find the highest and lowest points in a mountain range.
    • Old way: You hire a team to climb every single hill and valley to measure them. (Takes forever).
    • New way: You realize the mountains follow a specific pattern. You use a simple formula to instantly calculate the highest peak and the deepest valley without climbing a single one.

This makes their method 4 to 251 times faster than previous methods, allowing it to be used on large, complex systems like hospital networks or aircraft control.
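The flavor of such a closed-form shortcut can be illustrated for a single transition with classical Fréchet inequalities. Suppose we observed outcome y under action a, and ask how likely outcome y' would have been under the alternative action a'. Without further assumptions on the causal model, the joint probability of both events is only known to lie in a range, which yields interval bounds in one line of arithmetic. This is an illustrative sketch of the idea, not the paper's exact formula:

```python
def counterfactual_bounds(p_y_a: float, p_yp_ap: float) -> tuple:
    """Fréchet-style bounds on P(outcome y' under a' | observed y under a).

    p_y_a   : observed probability P(y | a), must be > 0 (we condition on it)
    p_yp_ap : observed probability P(y' | a')

    Over all causal models matching these marginals, the joint
    P(Y_a = y, Y_a' = y') lies between max(0, p_y_a + p_yp_ap - 1)
    and min(p_y_a, p_yp_ap); dividing by P(y | a) conditions on
    what was actually observed.
    """
    joint_lo = max(0.0, p_y_a + p_yp_ap - 1.0)
    joint_hi = min(p_y_a, p_yp_ap)
    return joint_lo / p_y_a, joint_hi / p_y_a

# Example: treatment a failed, and failures happen with probability 0.7;
# the alternative a' succeeds with probability 0.6. Given the observed
# failure, how likely would success under a' have been?
lo, hi = counterfactual_bounds(0.7, 0.6)  # lo = 0.3/0.7, hi = 0.6/0.7
```

No enumeration of causal models is needed: the endpoints come straight from the observed probabilities, which is what makes a closed-form approach so much cheaper than search.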

4. The Result: The "Cautious Captain"

Once they have this "foggy corridor" map, they use a strategy called Pessimistic Value Iteration.

  • The Metaphor: Imagine a ship captain navigating a stormy sea where the map is blurry.
    • A reckless captain might sail straight for the treasure, hoping the map is right.
    • A robust captain (the one this paper creates) assumes the worst-case scenario: "What if the fog is thickest right here? What if the rocks are closer than they look?"
    • The captain charts a course that guarantees safety and success even in the worst possible version of the fog.
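The "cautious captain" strategy is standard robust dynamic programming over interval transition bounds: for each action, evaluate the worst transition distribution the intervals allow, then pick the action with the best worst case. The worst distribution can be found greedily by pushing as much probability mass as the intervals permit toward low-value successor states. A compact sketch under assumed data layouts (nested lists for bounds and rewards; none of these names come from the paper):

```python
def worst_case_expectation(lower, upper, values):
    """Among distributions within [lower, upper] bounds (summing to 1),
    find the one minimising expected value: fill low-value states first."""
    p = list(lower)                       # start from the lower bounds
    budget = 1.0 - sum(p)                 # probability mass left to assign
    for i in sorted(range(len(values)), key=lambda i: values[i]):
        extra = min(upper[i] - lower[i], budget)
        p[i] += extra
        budget -= extra
    return sum(p[i] * values[i] for i in range(len(values)))

def pessimistic_value_iteration(lower, upper, rewards, gamma=0.9, iters=200):
    """Robust value iteration on interval bounds:
    V(s) = max over actions of [ r(s, a) + gamma * worst-case E[V] ].

    lower/upper[s][a][s2] : interval transition bounds
    rewards[s][a]         : immediate reward for action a in state s
    """
    n_states, n_actions = len(rewards), len(rewards[0])
    V = [0.0] * n_states
    for _ in range(iters):
        V = [max(rewards[s][a] + gamma * worst_case_expectation(
                     lower[s][a], upper[s][a], V)
                 for a in range(n_actions))
             for s in range(n_states)]
    return V
```

The inner minimisation is what makes the captain "cautious": every action is scored against the least favourable reality the fog still allows, so the returned values are guarantees, not hopes.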

Why Does This Matter?

  • Safety: In healthcare, if an AI suggests a treatment change, we want to be sure it won't hurt the patient even if our understanding of the disease is slightly off. This method guarantees that.
  • Speed: Because it's so fast, it can be used in real-time systems, not just in slow research labs.
  • Trust: It admits what it doesn't know. Instead of pretending to have a single "truth," it gives an honest range of possibilities, which is much more useful for human decision-makers.

Summary

The paper teaches AI to stop guessing a single "what if" story and instead calculate the entire range of possible "what if" stories. It does this using a clever math trick that is incredibly fast, allowing the AI to make decisions that are safe and reliable, even when the future is uncertain. It's the difference between betting on a single horse and buying tickets for every horse in the race that has a chance of winning.
