Imagine you are a doctor trying to predict whether a patient will recover from a disease. You have a lot of historical data from past patients, but there's a catch: confounding.
In the real world, patients aren't randomly assigned treatments. Maybe sicker patients (the "confounder") are more likely to get a specific drug. If you just look at the data, you might think the drug is killing people, when in reality, the drug was just given to the sickest people.
This paper is about how to make reliable predictions about what would happen if we forced a specific treatment on a patient (an "intervention"), even when our historical data is messy and biased.
Here is the breakdown using simple analogies:
1. The Problem: The "Messy Kitchen" vs. The "Controlled Lab"
- The Messy Kitchen (Observational Data): Imagine you are watching a busy kitchen. You see that whenever the chef uses a specific knife (the treatment), the food burns (the outcome). But you notice the chef only uses that knife when the stove is broken (the confounder).
- If you just look at the data, you think: "Knives cause burning."
- But you want to know: "If I force the chef to use that knife on a working stove, will the food burn?"
- The Goal: You want to predict the outcome of a "what-if" scenario (forcing the treatment to a specific value) using data from the messy kitchen.
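The "messy kitchen" can be simulated in a few lines. This is an illustrative sketch, not the paper's setup: the variable names and probabilities are made up, but the structure shows how a harmless treatment can look harmful when a confounder drives both the treatment choice and the outcome.

```python
import random

random.seed(0)

# stove_broken = confounder, knife = treatment, burned = outcome.
data = []
for _ in range(10_000):
    stove_broken = random.random() < 0.3
    # The chef mostly grabs the special knife when the stove is broken:
    knife = random.random() < (0.9 if stove_broken else 0.1)
    # Burning depends ONLY on the stove; the knife is harmless:
    burned = random.random() < (0.8 if stove_broken else 0.05)
    data.append((stove_broken, knife, burned))

def burn_rate(rows):
    return sum(b for _, _, b in rows) / len(rows)

with_knife = [r for r in data if r[1]]
without_knife = [r for r in data if not r[1]]
# Naive comparison: the knife looks dangerous...
print(f"P(burned | knife)    = {burn_rate(with_knife):.2f}")
print(f"P(burned | no knife) = {burn_rate(without_knife):.2f}")
# ...but within each stove condition, the knife changes nothing:
for stove in (False, True):
    rows_k = [r for r in data if r[0] == stove and r[1]]
    rows_n = [r for r in data if r[0] == stove and not r[1]]
    print(f"stove_broken={stove}: "
          f"knife {burn_rate(rows_k):.2f} vs no knife {burn_rate(rows_n):.2f}")
```

The naive burn rates differ sharply, yet within each stove condition they match: the association between knife and burning is entirely due to the confounder.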
2. The Solution: "Conformal e-prediction" (The Magic Shield)
The authors introduce a mathematical tool called Conformal e-prediction. Think of this as a Magic Shield that protects you from making false claims.
- How it works: Instead of giving you a single guess (e.g., "The patient will recover"), it gives you a list of possibilities (a prediction region).
- The Guarantee: The shield has a special property: if you pick a "safety level" (a small tolerated error rate, such as 5%), the chance that the real outcome is missing from your list is mathematically guaranteed to be no larger than that level.
- The "e-variable": This is the core of their math. Think of an "e-variable" as a betting score.
- If a candidate outcome doesn't fit the data, its score goes up and it gets dropped from the list.
- If a candidate outcome fits the data, its score stays low and it stays on the list.
- The authors prove that, for the true outcome, this score is at most 1 on average (its expected value is bounded by 1). This means you can't "get rich" by betting against their method; it's statistically honest.
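The betting-score idea can be sketched concretely. The construction below is one standard way to build a conformal e-value (the new nonconformity score divided by the average of all scores, which has expected value at most 1 under exchangeability); the function names, the residual-based score, and the numbers are illustrative assumptions, not the paper's exact recipe.

```python
import random

def conformal_e(cal_scores, new_score):
    # One standard conformal e-value: the new score divided by the
    # average of all n+1 scores. Under exchangeability its expected
    # value for the TRUE outcome is at most 1 ("statistically honest").
    total = sum(cal_scores) + new_score
    return 0.0 if total == 0 else (len(cal_scores) + 1) * new_score / total

def prediction_region(candidates, cal_scores, score_fn, alpha):
    # Keep every candidate whose e-value stays below 1/alpha; Markov's
    # inequality then bounds the miss probability by alpha.
    return [y for y in candidates
            if conformal_e(cal_scores, score_fn(y)) < 1 / alpha]

# Toy usage (illustrative numbers): calibration outcomes near 3.0.
random.seed(1)
cal_y = [random.gauss(3.0, 0.1) for _ in range(200)]
mu = sum(cal_y) / len(cal_y)
score_fn = lambda y: abs(y - mu)          # nonconformity = distance from center
cal_scores = [score_fn(y) for y in cal_y]
candidates = [i / 100 for i in range(601)]  # grid from 0.00 to 6.00
region = prediction_region(candidates, cal_scores, score_fn, alpha=0.1)
print(f"region = [{min(region):.2f}, {max(region):.2f}]")
```

Candidates near the center of the calibration data get low scores and stay on the list; implausible candidates get large e-values and are dropped.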
3. The Two Scenarios
Scenario A: The "Fair Coin" World (IID Setting)
Imagine the past patients arrived like flips of the same fair coin: every patient was an independent draw from the same distribution.
- The Method: The authors show you how to count the data. You look at how many times a specific combination of "Stove Status + Knife + Burned Food" happened.
- The Trick: They use a clever counting formula (adding a tiny "+1" to every count) to smooth out the data. This ensures that even if you haven't seen a specific situation before, your math doesn't break.
- Result: You get a list of likely outcomes for your new patient. If the list is small, you are very confident. If the list is huge, you admit you don't know enough.
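The "+1" counting trick is classic Laplace smoothing. The sketch below uses a tiny toy dataset with illustrative category names; the paper's actual estimator and variables may differ, but the mechanics of the trick are the same.

```python
from collections import Counter

# Observed (confounder, treatment, outcome) triples -- toy data.
observations = [
    ("broken", "knife", "burned"),
    ("broken", "knife", "burned"),
    ("ok", "knife", "fine"),
    ("ok", "no_knife", "fine"),
]
outcomes = ["burned", "fine"]

counts = Counter(observations)

def smoothed_prob(outcome, treatment, confounder):
    # Add 1 to every cell ("Laplace smoothing") so that a combination
    # we have never seen still gets a small nonzero probability,
    # instead of producing an undefined 0/0.
    num = counts[(confounder, treatment, outcome)] + 1
    den = sum(counts[(confounder, treatment, y)] + 1 for y in outcomes)
    return num / den

print(smoothed_prob("burned", "knife", "broken"))     # seen twice: (2+1)/((2+1)+(0+1)) = 0.75
print(smoothed_prob("burned", "no_knife", "broken"))  # never seen: (0+1)/((0+1)+(0+1)) = 0.5
```

Notice the second call: the combination "broken stove, no knife" never appears in the data, yet the smoothed estimate returns a sensible 0.5 instead of crashing on empty counts.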
Scenario B: The "Smart Chef" World (Dependent Setting)
Now, imagine the chef isn't random. The chef is smart and learns from the past.
- The Problem: The chef looks at what happened yesterday and decides what knife to use today. The data is no longer "independent."
- The Innovation: The authors prove that even if the chef is a genius strategist, as long as the outcome (the burning food) follows the laws of physics (a stable mechanism), their "Magic Shield" still works.
- The "Y-Oblivious" Rule: This is a fancy term meaning: "The chef can look at past knives and stoves to decide the next knife, but the chef cannot look at the future burning food to decide the knife." As long as the chef doesn't have a crystal ball, the math holds up.
4. Why This Matters (The "Patient Death" Example)
The paper highlights a specific use case: Safety.
Imagine you are worried about a specific bad outcome, like "Patient Death."
- You don't care about predicting every possible outcome perfectly. You just want to be sure that "Death" is not on the list of likely outcomes.
- Using their method, you can compute a betting score for the outcome "Death." If that score provides strong enough evidence against "Death," it falls off the list, and you can say: "Based on this messy data, if we give this drug, death is not among the plausible outcomes."
- And the best part? You have a mathematical guarantee that you won't be wrong very often.
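The safety logic fits in two lines. This is a hedged sketch: the threshold rule (drop an outcome once its e-score reaches 1/alpha, i.e., a large score is strong betting evidence against that outcome) is the standard way e-values carve out prediction regions, and the function name and numbers are illustrative, not the paper's notation.

```python
ALPHA = 0.05  # tolerated error rate: wrong at most 5% of the time

def is_ruled_out(e_score, alpha=ALPHA):
    # A large e-score is strong betting evidence AGAINST this outcome,
    # so it falls outside the list of likely outcomes.
    return e_score >= 1 / alpha

print(is_ruled_out(40.0))  # True: "Death" is off the list
print(is_ruled_out(3.0))   # False: not enough evidence; stay cautious
```

The guarantee from section 2 is what makes this trustworthy: because the e-score for the true outcome averages at most 1, it rarely exceeds 1/alpha, so a truly possible "Death" is wrongly ruled out at most an alpha fraction of the time.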
Summary in One Sentence
This paper provides a new mathematical "safety net" that allows us to make trustworthy predictions about what would happen if we changed a variable (like a medical treatment), even when our historical data is biased by hidden factors or generated by a smart, adaptive system.
The Metaphor:
If traditional statistics is like guessing tomorrow's weather from a single cloudy day, Conformal e-prediction is like a raincoat with a warranty: you are guaranteed to stay dry almost all of the time, even though the forecast was built from a messy, biased history of rain.