Estimation and exclusion restrictions in clustered linear models

Imagine you are trying to figure out if a new fertilizer makes plants grow taller. You have a garden with many different flower beds (clusters). You want to know the effect of the fertilizer on a specific plant, but there's a catch: the plants in the same flower bed are connected. If one plant gets a boost, it might shade its neighbor, or they might share water through their roots. This is called interference.

In the world of statistics, this is a nightmare. Standard methods (like Ordinary Least Squares, or OLS) assume every plant is an island. When they aren't, the math gets messy, and your results can be wildly wrong.

This paper by Mikusheva, Sølvsten, and Jing is like a new, smarter gardening manual for these tricky situations. Here is the breakdown in simple terms:

1. The Problem: The "Bad Neighbor" Effect

Imagine you are trying to measure how much a specific plant (let's call it Plant A) grows because of fertilizer.

The Old Way (OLS): You look at Plant A and its neighbor, Plant B. But Plant B was also fertilized, and its roots are tangled with Plant A's. You can't tell if Plant A grew because of its own fertilizer or because Plant B's roots helped it.
The "Strict" Rule: To fix this, old methods demanded that no plant in the whole garden could ever influence another. This is like saying, "If you put a plant in a pot, it must be the only plant in the universe." In real life (like in villages or social networks), this is impossible. Neighbors always affect neighbors.

2. The Solution: The "Smart Exclusion" Rule

The authors say, "We don't need to assume no neighbors affect each other. We just need to assume that distant neighbors don't."

The Analogy: Imagine you are in a crowded room.
- The person standing right next to you (0 meters away) is shouting in your ear. You can't ignore them.
- The person 5 meters away is talking, but you can't hear them clearly.
- The person 20 meters away is whispering; they definitely aren't affecting your conversation.
The Paper's Trick: The researchers create a "map" (a matrix) of who influences whom. They say, "We will only trust data from people who are far enough away that they can't possibly be shouting in your ear." This is called an exclusion restriction.

3. The New Tool: The "Leave-Out" Instrument

Once they decide who is "safe" (distant) and who is "unsafe" (close), they build a new calculator.

The Old Calculator: Tries to use everyone's data to figure out the average. Because the "unsafe" neighbors are included, the math gets biased (like trying to weigh a bag of apples while someone is secretly adding rocks to the bag).
The New Calculator (The Internal Instrument):
- For every single plant, it asks: "Who in the garden is far enough away that they didn't mess with you?"
- It uses only those distant plants to figure out what the "normal" growth looks like.
- It essentially says: "I will predict what Plant A should have looked like based on Plant Z (who is far away), and then I'll compare that to what Plant A actually looked like."
- This is called a "Leave-Out" approach because, for every calculation, it leaves out the specific data points that are "contaminated" by interference.

4. Why This Matters: The "Weak Signal" Problem

Sometimes, the "safe" distant neighbors are so far away that they don't tell us much. It's like trying to guess the weather in New York by looking at the sky in London. The signal is weak.

The Risk: If the signal is too weak, standard statistical tests will lie to you. They might say, "We are 95% sure this fertilizer works!" when actually, the data is too fuzzy to tell.
The Fix: The authors add a "safety net" (confidence sets built from the Anderson-Rubin test, which stay valid even when the signal is weak). It's like a seatbelt that tightens when the car hits a bump. If the data is fuzzy, the safety net admits, "We aren't sure," and gives you a wider range of possibilities (a wider confidence set) instead of a false, precise answer.

5. The Real-World Test: The Kenya Cash Experiment

The authors tested this on a real experiment in rural Kenya.

The Setup: Some villages received cash transfers.
The Problem: If Village A gets money, they might buy goods from Village B. So, Village B's economy goes up too, even though they didn't get the cash. This is "spillover."
The Result:
- If you assume spillovers happen only within 1 kilometer, your estimate is very precise (narrow confidence interval).
- If you assume spillovers happen within 3 kilometers (a more cautious, realistic assumption), your estimate becomes less precise (wider interval).
- The Lesson: The paper shows that your answer depends heavily on how much "interference" you are willing to assume. Their new method lets you see exactly how much your answer changes based on your assumptions, rather than hiding the uncertainty.

Summary

Think of this paper as a detective's guide for messy neighborhoods.

Old detectives assumed everyone was innocent and unrelated. When they found out neighbors were conspiring, their cases fell apart.
These new detectives admit neighbors do conspire, but they assume the conspiracy stops at a certain distance.
They use a clever trick: to solve a crime in one house, they only look at evidence from houses far away that couldn't possibly be involved.
If the evidence from far away is too weak, they don't guess; they admit, "We need more time," and give a wider range of suspects.

This allows researchers to get honest answers even when the data is messy, connected, and full of "bad neighbors."

Here is a detailed technical summary of the paper "Estimation and exclusion restrictions in clustered linear models" by Anna Mikusheva, Mikkel Sølvsten, and Baiyun Jing.

1. Problem Statement

The paper addresses the estimation of structural parameters in linear regression models characterized by three specific complexities:

Clustered Data: Observations are grouped into disjoint clusters (e.g., panels, networks, spatial units) where independence holds across clusters but arbitrary dependence exists within them.
High-Dimensional Controls: The models include many control variables (e.g., multi-way fixed effects), often with the number of controls $K$ large relative to the sample size.
Intricate Exclusion Restrictions: The standard assumption of strict exogeneity (errors uncorrelated with all regressors in a cluster) is often implausible. Instead, researchers face "partial exogeneity," where errors are uncorrelated only with a specific subset of regressors (e.g., current and past values in panels, or distant units in spatial/network data).

The Core Challenge:
When strict exogeneity fails, Ordinary Least Squares (OLS) suffers from asymptotic bias (an extension of the Nickell bias). Furthermore, standard inference procedures fail because:

The OLS estimator becomes a ratio of stochastic quadratic forms, not just linear forms.
The numerator of the estimation error involves complex cross-cluster dependencies induced by high-dimensional controls (e.g., multi-way fixed effects), rendering standard cluster-robust variance estimators invalid.
Weak identification can occur if the available exclusion restrictions are too weak to provide sufficient identifying variation.

2. Methodology

A. Data Structure and Assumptions

The authors model the data as $y_\ell = x_\ell\beta + w_\ell'\delta + e_\ell$, where $x_\ell$ is the regressor of interest, $w_\ell$ is the vector of controls, $e_\ell$ is the error, and the $n$ observations are partitioned into $N$ clusters.

Exclusion Restriction Matrix ($E$): They introduce an $n \times n$ indicator matrix $E$ where $E_{\tilde{\ell}\ell} = 1$ if the researcher assumes $E[x_{\tilde{\ell}}e_\ell] = 0$ (here $E[\cdot]$ denotes expectation), and $0$ otherwise. This allows for flexible specification of partial exogeneity (e.g., weak exogeneity in panels, distance-based restrictions in spatial data).
Design-Based vs. Outcome-Based: The framework supports both outcome modeling (standard regression) and design-based modeling (randomized treatment assignment), showing that the bias arises similarly in both when interference exists.
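To make the exclusion matrix concrete, here is a minimal Python sketch of two ways it might be filled in. The function names, the balanced-panel layout, and the choice to set cross-cluster entries and the diagonal to 1 are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def panel_weak_exogeneity(n_units, n_periods):
    """Exclusion matrix for a balanced panel sorted by unit, then period.

    Rows index the regressor x, columns index the error e. Within a unit,
    the entry is 1 only when x is dated no later than e (weak exogeneity).
    Pairs from different units are set to 1, assuming independent clusters.
    """
    unit = np.repeat(np.arange(n_units), n_periods)
    period = np.tile(np.arange(n_periods), n_units)
    same_unit = unit[:, None] == unit[None, :]
    x_not_after_e = period[:, None] <= period[None, :]
    return np.where(same_unit, x_not_after_e, True).astype(int)


def distance_exclusion(coords, radius):
    """Exclusion matrix that only trusts pairs farther apart than `radius`."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    E = (dist > radius).astype(int)
    np.fill_diagonal(E, 1)  # assumption: a unit's own regressor is exogenous
    return E


E_panel = panel_weak_exogeneity(n_units=2, n_periods=3)
E_space = distance_exclusion([[0.0, 0.0], [0.5, 0.0], [4.0, 0.0]], radius=1.0)
print(E_panel[:3, :3])  # upper-triangular block for the first unit
print(E_space)
```

In the panel case each within-unit block is upper triangular (past and current regressors are paired with later errors), while in the spatial case a larger radius zeroes out more entries.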

B. The Estimator: Correctly Centered Internal Instrument

The authors propose a just-identified Internal Instrument IV estimator ( $\hat{\beta}_{A^*}$ ) that corrects the asymptotic bias of OLS.

Correct Centering: They define an estimator of the ratio form $C_1(x,y)/C_2(x)$ as "correctly centered" if $E[C_1(x,y)] = \beta E[C_2(x)]$. This is a weaker condition than unbiasedness but sufficient for consistency. They prove that standard OLS is not correctly centered under partial exogeneity.
Optimization Problem: The estimator is defined as $\hat{\beta}_A = \frac{x'Ay}{x'Ax}$, where $A$ is an $n \times n$ matrix. To ensure correct centering, $A$ must satisfy:
- Partialling-out Property (POP): $AM = A$ (where $M$ is the projection matrix orthogonal to controls $W$ ).
- Correct Centering (CC): $A_{\tilde{\ell}\ell} = 0$ whenever $E_{\tilde{\ell}\ell} = 0$ (no instrumenting with endogenous regressors).
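Optimal Matrix ($A^*$): Among all valid matrices (the set $\mathcal{A}$ of matrices satisfying POP and CC), they select the $A^*$ closest in Frobenius norm to $M$. Because $AM = A$ and $M$ is a symmetric projection, $\|A - I\|_F^2 = \|A - M\|_F^2 + \|M - I\|_F^2$, so this is the same as minimizing the distance to the identity: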
$A^* = \arg \min_{A \in \mathcal{A}} \|A - M\|_F$
This choice is motivated by asymptotic efficiency under homoskedasticity.
Leave-Out Interpretation: The solution $A^*$ admits a simple interpretation: for each observation $\tilde{\ell}$ , the controls are partialled out using only those observations whose errors are uncorrelated with $x_{\tilde{\ell}}$ . This results in an observation-specific "leave-out" projection. The estimator is then a standard IV regression using the original regressor as its own instrument on these transformed variables.
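The leave-out interpretation can be sketched in a few lines of Python. This is one reading of the description above, not the authors' implementation: row $\tilde{\ell}$ of the matrix is taken to be the corresponding row of the residual-maker computed only from the observations allowed by row $\tilde{\ell}$ of $E$. The function names and the simulated data are hypothetical, and the pseudo-inverse is an assumption made to tolerate rank-deficient subsamples.

```python
import numpy as np


def leave_out_matrix(W, E):
    """Build A row by row: row i partials out W using only S_i = {j : E[i, j] = 1}."""
    n = W.shape[0]
    A = np.zeros((n, n))
    for i in range(n):
        if E[i, i] == 0:
            continue  # x_i may not be paired with its own error: row stays zero
        S = np.flatnonzero(E[i])
        # Weights on the rows in S that form the fitted control value at i.
        A[i, S] = -W[i] @ np.linalg.pinv(W[S])
        A[i, i] += 1.0  # residual = own value minus leave-out fitted value
    return A


def internal_iv(x, y, W, E):
    """Ratio estimator x'Ay / x'Ax with the leave-out matrix A."""
    A = leave_out_matrix(W, E)
    return (x @ A @ y) / (x @ A @ x), A


# Simulated example (all values are made up for illustration).
rng = np.random.default_rng(0)
n = 12
pos = np.arange(n)
W = np.column_stack([np.ones(n), rng.normal(size=n)])
x = rng.normal(size=n)
E = (np.abs(pos[:, None] - pos[None, :]) > 2).astype(int)
np.fill_diagonal(E, 1)
y = 2.0 * x + W @ np.array([1.0, -0.5])  # noise-free, so beta = 2 is recoverable

beta_hat, A = internal_iv(x, y, W, E)
M = np.eye(n) - W @ np.linalg.pinv(W)  # full-sample residual-maker
print(beta_hat)
```

By construction every row of this matrix is orthogonal to the controls (so $AM = A$) and is zero wherever $E$ is zero. When $E$ is all ones the matrix reduces to $M$, and the ratio is the usual partialled-out OLS coefficient.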

C. Inference and Variance Estimation

Standard cluster-robust variance estimators fail because the numerator of the estimation error ($x'Ae$) is a non-trivial quadratic form involving cross-cluster terms when controls are high-dimensional.

New Central Limit Theorem (CLT): The authors derive a CLT for quadratic forms of clustered data. They show that asymptotic normality holds if the contribution of any single cluster to the total variance is negligible. This requires conditions on the operator norms of cluster covariance matrices relative to the sample size.
Jackknife Variance Estimator: They propose a jackknife variance estimator ($\hat{V}_{JK}$) based on leaving out entire clusters.
- It is shown to be conservative (overestimates variance) in general settings due to double-counting quadratic terms and improper centering.
- It becomes unbiased when the matrix $A^*$ is block-diagonal (i.e., no cross-cluster dependence in the transformation).
Weak Identification Robustness: To handle cases where the instrument is weak (due to many controls or weak exclusion restrictions), they propose using the Anderson-Rubin (AR) test. Inverting this test, that is, collecting every candidate value of $\beta$ that the test does not reject, yields confidence sets that remain valid regardless of identification strength.
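For readers unfamiliar with leaving out entire clusters, the following Python sketch shows the textbook delete-one-cluster jackknife. It is not the paper's $\hat{V}_{JK}$, whose exact formula this summary does not give; the callback `estimate_on`, the plain ratio estimator, and the simulated data are all hypothetical stand-ins.

```python
import numpy as np


def cluster_jackknife_variance(estimate_on, cluster_ids):
    """Textbook delete-one-cluster jackknife variance.

    estimate_on(keep) is a user-supplied callback that returns the estimate
    computed from the observations whose indices are listed in `keep`.
    """
    cluster_ids = np.asarray(cluster_ids)
    labels = np.unique(cluster_ids)
    G = len(labels)
    # Re-estimate once per cluster, each time dropping that whole cluster.
    leave_out = np.array(
        [estimate_on(np.flatnonzero(cluster_ids != g)) for g in labels]
    )
    return (G - 1) / G * np.sum((leave_out - leave_out.mean()) ** 2)


# Simulated example: 10 clusters of 4 observations each.
rng = np.random.default_rng(1)
clusters = np.repeat(np.arange(10), 4)
x = rng.normal(size=40)
y = 1.5 * x + rng.normal(size=40)


def ratio(keep):
    """Simple ratio estimator x'y / x'x on the retained observations."""
    return (x[keep] @ y[keep]) / (x[keep] @ x[keep])


v_jk = cluster_jackknife_variance(ratio, clusters)
print(np.sqrt(v_jk))  # jackknife standard error
```

Passing indices rather than data keeps the helper agnostic about the estimator: a callback could subset an exclusion matrix on both rows and columns before re-estimating.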

3. Key Contributions

Generalized Internal Instrument Framework: Extends dynamic panel methods (like Anderson-Hsiao and Arellano-Bond) to general clustered settings with arbitrary exclusion restrictions and high-dimensional controls.
Characterization of Bias: Provides a unified characterization of the asymptotic bias in clustered models (generalizing Nickell bias) and proves that OLS is inconsistent unless strict exogeneity holds.
Efficient Leave-Out Estimator: Proposes a computationally tractable, asymptotically efficient estimator that adapts to the specific exclusion structure of the application.
Valid Inference under Dependence: Develops a new CLT for clustered quadratic forms and a robust variance estimator (Jackknife) that accounts for cross-cluster dependence induced by high-dimensional fixed effects.
Identification-Robust Inference: Integrates weak identification robustness (AR tests) into the clustered IV framework.

4. Results

Theoretical:
- The proposed estimator $\hat{\beta}_{A^*}$ is consistent and asymptotically normal under mild conditions on cluster sizes and dependence structures.
- The variance of the estimator depends on the "effective sample size," captured by the trace of $A^*$ . Relaxing exclusion restrictions (allowing more endogeneity) reduces the trace, increasing standard errors.
- Standard cluster-robust errors are invalid when controls induce cross-cluster dependence (e.g., two-way fixed effects), whereas the proposed Jackknife estimator remains valid.
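The link between exclusion restrictions and the trace can be illustrated numerically. The sketch below assumes the leave-out reading described earlier, under which each diagonal entry of $A^*$ equals one minus the leverage of that observation within its own allowed subsample. The one-dimensional layout, the radii, and the function name are hypothetical.

```python
import numpy as np


def trace_of_leave_out_matrix(W, E):
    """Sum over i of (1 - leverage of i among the observations with E[i, j] = 1)."""
    total = 0.0
    for i in range(W.shape[0]):
        if E[i, i] == 0:
            continue  # row i of the matrix is zero, so it adds nothing
        S = np.flatnonzero(E[i])
        gram_pinv = np.linalg.pinv(W[S].T @ W[S])
        leverage = W[i] @ gram_pinv @ W[i]
        total += 1.0 - leverage
    return total


# Units placed on a line; a larger radius means fewer trusted pairs.
rng = np.random.default_rng(0)
n = 30
pos = np.arange(n, dtype=float)
W = np.column_stack([np.ones(n), rng.normal(size=n)])
dist = np.abs(pos[:, None] - pos[None, :])

traces = {}
for radius in [0, 1, 3, 6]:
    E = (dist > radius).astype(int)
    np.fill_diagonal(E, 1)  # assumption: own regressor is exogenous
    traces[radius] = trace_of_leave_out_matrix(W, E)
    print(radius, round(traces[radius], 2))
```

With radius 0 every pair is trusted and the trace equals $n$ minus the number of controls. Each increase in the radius shrinks the allowed subsamples, raises the leverages, and lowers the trace.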
Empirical Application (Kenya Fiscal Intervention):
- Applied to a large-scale cash transfer experiment in rural Kenya (Egger et al., 2022) with spatial interference.
- Findings: The point estimates for the direct treatment effect on consumption were relatively stable across different distance cutoffs (1 km to 3 km). However, precision was highly sensitive to the exclusion restrictions.
- As the assumed radius of interference increased (relaxing exogeneity), the effective sample size decreased, leading to significantly wider confidence intervals.
- The structure of $A^*$ was found to be far from block-diagonal, confirming that standard cluster-robust errors would be inappropriate without the proposed corrections.

5. Significance

This paper provides a critical toolkit for modern empirical research where data is rarely independent and strict exogeneity is often untenable.

Practical Utility: It offers a computationally feasible solution for researchers dealing with complex dependence structures (networks, spatial spillovers, dynamic panels) and high-dimensional fixed effects, which are common in development economics and political science.
Methodological Rigor: It resolves the tension between using flexible controls (which induce bias in standard estimators) and maintaining valid inference. It demonstrates that "leave-out" strategies are not just for bias correction but are essential for valid variance estimation in these settings.
Policy Relevance: By illustrating how inference precision degrades as assumptions about interference are relaxed, the paper guides researchers in making transparent trade-offs between model robustness and statistical power.

In summary, the paper bridges the gap between dynamic panel literature and general clustered data analysis, providing a unified, robust, and efficient framework for causal inference in the presence of complex dependence and partial exogeneity.
