Is Inference Conditional on Not Rejecting a Pre-test Less Reliable than Unconditional Inference?

This paper demonstrates that conducting inference only after failing to reject a pre-test for model conditions remains valid and typically conservative, even when the estimator and pre-test are asymptotically dependent, provided the underlying conditions hold.

Original authors: Clément de Chaisemartin, Xavier D'Haultfœuille

Published 2026-04-21 · Author reviewed

This is an AI-generated explanation of the paper below. It is not written by the authors. For technical accuracy, refer to the original paper.

The Big Picture: The "Pre-Check" Dilemma

Imagine you are a chef trying to bake a perfect cake (estimating a treatment effect). You have a secret recipe (your statistical model) that works perfectly only if your ingredients are fresh and measured correctly (your assumptions hold).

In the real world, you can't be 100% sure your ingredients are perfect. So, before you bake, you do a pre-check: you smell the flour and taste the eggs to see if they are fresh.

  • If they smell bad: You throw the recipe away and don't bake (you reject the study).
  • If they smell good: You proceed to bake and present the cake to the judges (you report your results).

The Question: Does this "smell test" (the pre-test) mess up the reliability of your cake? Specifically, if you only report the cake when the ingredients passed the smell test, is your cake actually more likely to be a disaster than if you had just baked it blindly?

Many statisticians have worried that this "pre-check" creates a hidden trap, making your results look better than they really are. This paper says: Not necessarily. In fact, you might be safer than you think.


Part 1: When the Ingredients Are Actually Fresh (The "Null" Hypothesis)

Let's assume your ingredients are actually fresh. The recipe is perfect.

The Old Fear:
People thought that because you only show the cake when the smell test passes, you might be "cherry-picking" the best-looking cakes. They worried that even if the ingredients were fresh, the act of smelling them first might make the final cake less reliable than advertised (or "under-cover": in statistical terms, the confidence interval could contain the true value less often than its stated level, even though the judges are told it's trustworthy).

The Paper's Discovery:
The authors prove that if your ingredients are truly fresh, your cake is actually safer than you think.

  • The Analogy: Imagine the "smell test" and the "baking process" are two friends holding hands. If the ingredients are fresh, and you only bake when the smell test says "Go," you are actually filtering out some of the random bad luck that could have happened during baking.
  • The Result: The paper shows that your inference is conservative. This means the confidence interval (the range of flavors the judges expect) covers the true value more often than its advertised level. You are being extra careful.
    • Real-world translation: If you run a pre-test and it passes, your confidence interval effectively becomes more cautious: a nominal 95% interval covers the truth more than 95% of the time. This is good! It means you are less likely to make a false claim. You aren't under-covering; you are over-covering (being safe).

The Catch: This safety net only works if the "smell test" and the "baking" aren't perfectly synchronized in a weird way. But in most standard cases, the safety net holds.
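The mechanism in Part 1 can be sketched with a small Monte Carlo. This is not the paper's code; the setup (a jointly normal estimator error and pre-test statistic with an assumed correlation of 0.7, both mean-zero because the model condition holds) is purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n_sims = 200_000
rho = 0.7  # assumed correlation between pre-test statistic and estimator

# Under the null, the model condition holds: both the estimator's
# standardized error and the pre-test statistic have mean zero.
cov = [[1.0, rho], [rho, 1.0]]
est_err, pretest = rng.multivariate_normal([0.0, 0.0], cov, size=n_sims).T

crit = 1.96  # two-sided 5% critical value
covered = np.abs(est_err) <= crit   # nominal 95% CI covers the truth
passed = np.abs(pretest) <= crit    # pre-test does not reject

uncond = covered.mean()             # coverage if you always report
cond = covered[passed].mean()       # coverage given the pre-test passed

print(f"unconditional coverage:          {uncond:.3f}")
print(f"coverage given pre-test passed:  {cond:.3f}")
```

With positively dependent jointly normal statistics, the events "CI covers" and "pre-test passes" are positively correlated, so the conditional coverage comes out above the nominal 95% — the over-coverage described above.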


Part 2: When the Ingredients Are Spoiled (The "Alternative" Hypothesis)

Now, let's assume your ingredients are actually spoiled (your assumption is wrong). Maybe the flour is moldy, but you didn't notice.

The Reality:
If the ingredients are bad, the cake will taste terrible no matter what. The statistical estimate is biased.

The Comparison:
The authors ask: If the ingredients are bad, is the cake worse if you did the smell test first, compared to just baking blindly?

  • Scenario A (No Pre-Test): You bake blindly. The cake is terrible: because the estimate is biased, the confidence interval might cover the true effect only, say, 20% of the time instead of the advertised 95% (low coverage).
  • Scenario B (Pre-Test): You do the smell test. It passes (you missed the mold). You bake. The cake is terrible.

The Surprise:
The paper finds that in many common situations (like Randomized Controlled Trials or Instrumental Variables), the cake is actually less terrible in Scenario B than in Scenario A.

  • The Analogy: Imagine the "smell test" is a filter. Even if it lets some bad flour through, it filters out the worst batches of bad flour. By only baking when the test passes, you are inadvertently selecting for the "least spoiled" ingredients.
  • The Result: The "Conditional Coverage" (how often the judges are right given you passed the test) is often higher than the "Unconditional Coverage" (how often they are right if you baked everything).
    • Real-world translation: Pre-testing doesn't just fail to hurt you; in some cases, it actually protects you from the worst errors when your assumptions are slightly wrong.

Part 3: The "Difference-in-Differences" (DID) Warning

The authors do have one specific warning for a popular method called Difference-in-Differences (DID), often used in economics to study policy changes.

  • The Analogy: In DID, the "smell test" checks if two groups were moving in parallel before a policy change. If they weren't, the test fails.
  • The Problem: In these specific studies, the relationship between the "smell" and the "baking" is tricky. The paper shows that in DID studies, if the trends are slightly different (spoiled ingredients), the pre-test might not filter out the worst cases as effectively as in other methods.
  • The Data: The authors looked at 12 famous DID studies. They found that while pre-testing didn't make things much worse, it didn't offer the same "super-protection" it did in other types of studies. The cake was still a bit risky, but not a disaster.
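The DID workflow described above — check pre-trends first, report the estimate only if the check passes — can be sketched on simulated data. Everything here (group sizes, period structure, a true effect of 1.0) is a hypothetical example, not the authors' data or code:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000  # simulated units per group and period
effect = 1.0
periods = [-2, -1, 0, 1]  # two pre-treatment and two post-treatment periods

def mean_outcome(group, period):
    """Average outcome of one simulated group in one period."""
    base = 0.5 if group == "treated" else 0.0  # a level gap is fine for DID
    hit = effect if (group == "treated" and period >= 0) else 0.0
    return (base + hit + rng.normal(0.0, 1.0, n)).mean()

y = {(g, t): mean_outcome(g, t) for g in ("treated", "control") for t in periods}

# Pre-test: placebo DID on the two pre-periods (the parallel-trends check).
placebo = (y["treated", -1] - y["treated", -2]) - (y["control", -1] - y["control", -2])

# Actual DID estimate: change from the last pre-period to the first post-period.
did = (y["treated", 0] - y["treated", -1]) - (y["control", 0] - y["control", -1])

se = np.sqrt(4 / n)  # std. error of a difference of four unit-variance means
if abs(placebo / se) <= 1.96:
    print(f"pre-test passed; DID estimate: {did:.2f}")
else:
    print("pre-trends differ; do not report the DID estimate")
```

The paper's warning applies precisely to this setup: when trends differ only slightly, the placebo test often still passes, and in DID the dependence between the placebo and the estimate does not deliver the same protective selection seen in the RCT and IV examples.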

The Takeaway: Should You Stop Pre-Testing?

No. The paper argues that the "Pre-Test" is a good thing, despite the fears of some statisticians.

  1. If you are right: Pre-testing makes your results more conservative (safer). You are less likely to claim a discovery that isn't there.
  2. If you are slightly wrong: Pre-testing often acts as a shield, filtering out the worst errors and keeping your results more reliable than if you had ignored the test entirely.
  3. The Cost: The only "cost" is that you might occasionally throw away a perfectly good cake because the smell test was too sensitive (a false rejection). But the paper suggests this cost is small compared to the benefit of avoiding bad cakes.

In Simple Terms:
Think of pre-testing like a security checkpoint at an airport.

  • Old View: "Checking everyone's bags slows things down and might make the flight less efficient."
  • This Paper's View: "Checking bags actually makes the flight safer. Even if the scanner isn't perfect, the people who pass the scan are statistically less likely to be carrying a bomb than the general population. And if the scanner is right, the flight is safer than if we didn't scan anyone at all."

The authors conclude that researchers should feel comfortable doing these pre-tests. They don't break the math; they often make the results more robust.
