Representativeness and Efficiency in Overidentified IV

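Imagine you are a detective trying to figure out the true effect of a new medicine. You can't just give it to everyone and see what happens (that would be unethical or impossible), so you have to rely on "clues" (called instruments): things that nudge some people toward taking the medicine for reasons unrelated to their health. Comparing people who were nudged with people who weren't lets you work out what the medicine does.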

In this paper, the authors, Chun Pang Chow and Hiroyuki Kasahara, tackle a problem that arises when you have more than one clue (multiple instruments) and the medicine works differently for different people (heterogeneous treatment effects).

Here is the story of their discovery, explained simply.

1. The Problem: The "Efficient" Detective Goes Wrong

Statisticians have long relied on a favorite tool called GMM (Generalized Method of Moments). Think of GMM as a super-smart calculator that combines all your clues into one single answer as precisely as possible. It's "efficient," meaning it gives you the tightest possible answer with the least amount of noise.

But here's the catch:
When the medicine works differently for different people (some get cured, some get a headache, some feel nothing), this super-smart calculator starts making weird choices to get its "efficient" answer.

The "Heterogeneity Penalty": To minimize noise, the calculator starts downplaying the clues that are "messy" (where the medicine's effect varies a lot from person to person).
The "Negative Weight" Trap: Even worse, in its hunt for precision, the calculator sometimes decides to subtract the evidence from certain clues. It's like a detective saying, "Because this witness saw a red car, I'm going to subtract 10 points from the suspect's guilt score."

The Result: The final answer the calculator gives you isn't just "precise"; it's actually answering a different question than the one you asked. It might mostly reflect the people for whom the medicine's effect is most stable, while downplaying the people for whom it varies most (who may be the ones it helps most). And once some clues are subtracted, the answer may not describe any real group of people at all.

2. The Impossible Dream

The authors prove a frustrating mathematical truth: within the standard GMM toolbox, you can't have it all.
You cannot have a GMM method that is:

Efficient (gives the most precise answer), AND
Interpretable (tells you exactly what group of people the answer applies to, without subtracting evidence).

If you force the calculator to be efficient, it distorts the answer. If you force it to answer your specific question, it becomes less efficient. It's a trade-off.

3. The Solution: "Representative Targeting" (RT)

The authors introduce a new tool called Representative Targeting (RT).

The Analogy: The Jury vs. The Super-Computer

Old GMM (The Super-Computer): Tries to solve a giant equation where all clues must fit one single, perfect story. If a clue doesn't fit perfectly, it gets twisted or discarded to make the math work.
New RT (The Jury): Treats every clue as a separate witness.
1. It asks each clue (instrument) for its own specific answer (a "Wald estimator").
2. It then asks the researcher: "Who do you want to represent? Do you want an average of everyone? Do you want to focus on the poor? The rich?"
3. The researcher picks the weights (e.g., "Give everyone an equal vote").
4. RT simply averages the answers from the witnesses based on those weights.
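The four jury steps can be sketched in Python. This is a minimal illustration rather than the authors' code: it assumes every instrument is a single 0/1 variable and uses NumPy, and the helper names (`wald_estimate`, `rt_point_estimate`) and the simulated data are hypothetical.

```python
import numpy as np


def wald_estimate(y, d, z):
    """Wald ratio Cov(z, y) / Cov(z, d) for one binary instrument z."""
    # For a 0/1 instrument this equals (mean y | z=1 - mean y | z=0)
    # divided by (mean d | z=1 - mean d | z=0).
    return np.cov(z, y)[0, 1] / np.cov(z, d)[0, 1]


def rt_point_estimate(y, d, Z, omega):
    """Average the per-instrument Wald ratios using researcher-chosen weights."""
    omega = np.asarray(omega, dtype=float)
    if np.any(omega < 0) or not np.isclose(omega.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to one")
    walds = np.array([wald_estimate(y, d, Z[:, l]) for l in range(Z.shape[1])])
    return float(omega @ walds), walds


# Hypothetical data: two independent coin-flip instruments, each moving a different group.
rng = np.random.default_rng(0)
n = 200_000
z = rng.integers(0, 2, size=(n, 2))
kind = rng.choice(4, size=n, p=[0.3, 0.3, 0.2, 0.2])  # 0: moved by z1, 1: moved by z2, 2: always, 3: never
d = np.zeros(n)
d[kind == 0] = z[kind == 0, 0]
d[kind == 1] = z[kind == 1, 1]
d[kind == 2] = 1.0
effect = np.array([2.0, 6.0, 4.0, 4.0])[kind]          # true effect for each unit
y = rng.normal(size=n) + d * effect

estimate, walds = rt_point_estimate(y, d, z, [0.5, 0.5])
print(walds, estimate)  # Wald ratios near 2 and 6; equal-weight RT near 4
```

Because each witness is asked separately, the equal-weight answer lands midway between the two Wald ratios; choosing `[0.8, 0.2]` instead would move it toward the first instrument's group (roughly 0.8 × 2 + 0.2 × 6 = 2.8).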

Why is this better?

No Negative Weights: It never subtracts evidence. It's like a jury that only adds votes, never subtracts them.
Causal Clarity: If you tell it, "I want the average effect for everyone," it gives you exactly that. No hidden tricks.
Still Efficient: Surprisingly, even though it's a simple average, it turns out to be the most precise way to answer that specific question. No method can pin down that same question with less noise, and RT gets there without the "distortion."

4. Real-World Examples from the Paper

Example A: The Classroom Experiment (STAR)

The Setup: A famous experiment in which kids were randomly assigned to small or regular-sized classes within 78 different schools.
The Old Way (GMM): The super-smart calculator looked at the schools where the effect of small classes varied a lot and downplayed them to get a cleaner number. It concluded small classes helped by 6.5 points.
The New Way (RT): The standard two-stage least squares (2SLS) method put the benefit at 8.8 points. When the authors asked RT for a simple average across all schools, it landed close to that figure rather than to GMM's 6.5, because it didn't let the "noisy" schools be downplayed.
The Lesson: The "efficient" method was understating the benefit of small classes, because the schools it downplayed happened to be the ones where small classes helped most.

Example B: Patent Examiners

The Setup: Patent examiners are like judges for inventions. Some are strict, some are lenient. The authors used this to see whether being granted a patent changes how many later citations (forward citations) an invention receives.
The Old Way (GMM): The calculator got so confused by the differences between examiners that it gave a negative weight to the most lenient examiners. It concluded patents added 5.5 citations.
The New Way (RT): By using their new method to target a specific policy question ("What happens if we make all examiners slightly more lenient?"), they found the answer was actually 11.75 citations.
The Lesson: The old method's answer (5.5) was less than half of RT's policy-targeted answer (11.75): mathematically "efficient," but causally wrong.

The Big Takeaway

When you have multiple ways to measure a cause-and-effect relationship, don't just let the computer pick the "best" math. The "best" math might be answering a question you didn't ask.

The authors' new method, Representative Targeting, lets the researcher say, "I want to know the effect on this specific group," and guarantees that the answer is both honest (no negative weights, under a condition on the instruments that holds in many common designs) and as precise as possible for that question. It turns the "estimator" (the tool) back into a servant of the "estimand" (the question), rather than letting the tool decide the question for you.

1. Problem Statement

The paper addresses a fundamental conflict in econometrics when using Instrumental Variables (IV) with heterogeneous treatment effects and multiple instruments (overidentified models).

The "Estimator Determines Estimand" Phenomenon: In classical linear models, the choice of estimator (e.g., OLS vs. GLS) affects precision but not the parameter being estimated. However, in IV models with heterogeneous treatment effects, the Generalized Method of Moments (GMM) weighting matrix dictates which subpopulation's treatment effect is estimated.
The Trade-off: Researchers face a dilemma between statistical efficiency and causal interpretability:
- Efficient GMM (EGMM): Minimizes variance but embeds a "heterogeneity penalty." It downweights instruments with high treatment effect dispersion and, crucially, frequently assigns negative weights to certain Wald estimators. This results in an estimand that is a weighted average of Local Average Treatment Effects (LATEs) where some weights are negative, rendering the result causally uninterpretable (it does not represent a valid average treatment effect for any subpopulation).
- Researcher-Specified Targets: Researchers often desire specific weights (e.g., equal weighting or weights proportional to compliance share) to answer specific policy questions.
The Impossibility Result: The authors prove that within the standard GMM class, it is impossible to simultaneously achieve the semiparametric efficiency bound and deliver researcher-specified weights (unless all instrument-specific Wald estimands are identical, or the target weights happen to coincide with EGMM's own implied weights). Any GMM estimator targeting a specific set of weights must rely on a misspecified "common residual," leading to suboptimal variance.

2. Methodology and Framework

A. Theoretical Framework

The authors extend the Imbens and Angrist (1994) LATE framework to $L$ binary instruments.

Compliance Types: Individuals are categorized into compliance types according to their potential treatment status under each of the $2^L$ possible combinations of instrument values.
Wald Decomposition: Each instrument-specific Wald estimator is decomposed into a weighted sum of type-specific LATEs.
The Negative Weight Problem: When instruments are correlated, the weights in the GMM estimand can become negative.
Positive Regression Dependence (PRD): The authors introduce PRD (Lehmann, 1966) as a sufficient condition on the joint distribution of instruments guaranteeing that each instrument-specific Wald estimand places non-negative weights on the type-specific LATEs. PRD holds in many quasi-experimental designs (e.g., independent randomization and cumulative threshold designs such as judge or examiner leniency).
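The Wald decomposition can be checked on a toy population. The sketch below assumes two independent binary instruments, instruments independent of compliance types, and treatment rules that never switch anyone out of treatment when an instrument turns on; the type names, shares, and effects are made up. It computes the Wald ratio for $Z_1$ directly from conditional means and again as a weighted sum of type-specific effects, and the two agree.

```python
# Hypothetical population with two independent binary instruments (Z1, Z2).
# Each compliance type has a population share, a treatment rule D(z1, z2),
# and a type-specific treatment effect.
TYPES = {
    "never":   (0.30, lambda z1, z2: 0,           0.0),
    "z1_only": (0.20, lambda z1, z2: z1,          2.0),
    "z2_only": (0.20, lambda z1, z2: z2,          6.0),
    "either":  (0.15, lambda z1, z2: max(z1, z2), 3.0),
    "always":  (0.15, lambda z1, z2: 1,           5.0),
}
P_Z2 = {0: 0.6, 1: 0.4}  # distribution of Z2, independent of Z1 and of types


def type_weights_z1():
    """Weights that the Z1 Wald ratio places on each type-specific effect."""
    raw = {
        # share of the type times the probability it switches when Z1 goes 0 -> 1
        name: share * sum(p * (rule(1, z2) - rule(0, z2)) for z2, p in P_Z2.items())
        for name, (share, rule, _) in TYPES.items()
    }
    first_stage = sum(raw.values())  # E[D | Z1=1] - E[D | Z1=0]
    return {name: w / first_stage for name, w in raw.items()}


def wald_z1_direct():
    """Z1 Wald ratio from conditional means; the common E[Y(0)] cancels out."""
    def mean(z1, scale_by_effect):
        # E[D * effect | Z1=z1] if scale_by_effect, else E[D | Z1=z1]
        return sum(
            p * share * rule(z1, z2) * (effect if scale_by_effect else 1.0)
            for z2, p in P_Z2.items()
            for share, rule, effect in TYPES.values()
        )
    return (mean(1, True) - mean(0, True)) / (mean(1, False) - mean(0, False))


weights = type_weights_z1()
decomposed = sum(weights[name] * TYPES[name][2] for name in TYPES)
print(weights)                       # positive weight only on "z1_only" and "either"
print(wald_z1_direct(), decomposed)  # both about 2.3103 (= 0.67 / 0.29)
```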

B. The GMM Analysis

Heterogeneity Penalty: The paper characterizes the EGMM weighting matrix. It shows that EGMM minimizes variance by downweighting instruments where the "common residual" (fitting a single $\beta$ to all instruments) is large. This occurs when treatment effects are highly dispersed for the compliers of that instrument.
Impossibility Theorem: The authors prove that for any target weight vector $\omega$ that differs from EGMM's own implied weights ($\omega \neq \lambda_{EGMM}$), the weighting matrix required to achieve the efficiency bound ($\Omega^{-1}$) produces implied weights different from $\omega$. Conversely, forcing the implied weights to equal $\omega$ results in a variance strictly higher than the efficiency bound.
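To see how a weighting matrix turns into implied weights, suppose (our simplification, not necessarily the paper's exact setup) that the moment conditions are the covariance moments $\mathrm{Cov}(Z_l, Y - \beta D) = 0$ with the intercept partialled out. Writing $\pi_l = \mathrm{Cov}(Z_l, D)$ and $\beta_l$ for the Wald estimand of instrument $l$, the GMM solution with weighting matrix $W$ is $\beta(W) = \sum_l \lambda_l \beta_l$ with $\lambda_l = \pi_l (W\pi)_l / (\pi' W \pi)$. These weights always sum to one, but some can turn negative when $W$ has large negative off-diagonal entries, as can happen for $\Omega^{-1}$ when the moments are strongly positively correlated. The first stages, Wald values, and $\Omega$ below are hypothetical.

```python
import numpy as np


def implied_weights(pi, W):
    """Weights lambda_l such that beta_GMM = sum_l lambda_l * beta_l (covariance moments).

    pi: first-stage covariances Cov(Z_l, D), one per instrument.
    W:  symmetric positive definite GMM weighting matrix.
    """
    pi = np.asarray(pi, dtype=float)
    Wpi = np.asarray(W, dtype=float) @ pi
    return pi * Wpi / (pi @ Wpi)  # sums to one by construction


def gmm_beta(pi, wald, W):
    """Closed-form minimiser of g(b)' W g(b), where g(b) = Cov(Z, Y) - b * Cov(Z, D)."""
    pi, wald = np.asarray(pi, float), np.asarray(wald, float)
    c_y = pi * wald  # reduced-form covariances Cov(Z_l, Y) = pi_l * beta_l
    return (pi @ W @ c_y) / (pi @ W @ pi)


pi = np.array([0.40, 0.15])    # hypothetical first stages
wald = np.array([2.0, 8.0])    # hypothetical instrument-specific Wald estimands
Omega = np.array([[1.0, 0.9],
                  [0.9, 1.0]]) # strongly correlated moments (hypothetical)
W = np.linalg.inv(Omega)       # stands in for the EGMM weighting matrix

lam = implied_weights(pi, W)
beta_gmm = gmm_beta(pi, wald, W)
print(lam, lam.sum())          # second weight is negative; weights still sum to 1
print(beta_gmm)                # below min(wald) = 2.0
```

With these numbers the implied weights are roughly 1.42 and -0.42, and the GMM value is about -0.54, below both Wald estimands, echoing the patent-examiner result reported in the empirical section.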

C. The Proposed Solution: Representative Targeting (RT)

To resolve the trade-off, the authors propose the Representative Targeting (RT) estimator.

Mechanism: Instead of fitting a single common residual to all moment conditions (as GMM does), RT computes each instrument-specific Wald ratio separately and then averages them using researcher-specified weights ($\omega$).
Causal Validity: Under PRD, RT ensures that the final estimand is a convex combination (non-negative weights) of type-specific treatment effects, guaranteeing causal interpretability.
Efficiency: RT achieves the semiparametric efficiency bound for its specific targeted estimand. It uses instrument-specific residuals rather than a common residual, avoiding the "misspecification" penalty inherent in GMM.
Variance Frontier: The authors define a "Variance Frontier" representing the minimum possible variance for a given estimand. RT sits on this frontier for its specific target weights, whereas GMM estimators (even those targeting the same weights) sit above the frontier due to the common-residual constraint.
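A sketch of the RT recipe with a plug-in standard error, assuming binary instruments and fixed, researcher-chosen weights. The variance uses a standard influence-function (delta-method) argument in which each Wald ratio gets its own residual $e_l = (Y - \bar Y) - \hat\beta_l (D - \bar D)$, mirroring the instrument-specific residuals described above; we have not checked it against the paper's exact variance formula, and `rt_estimate` is a hypothetical name.

```python
import numpy as np


def rt_estimate(y, d, Z, omega):
    """Weighted average of Wald ratios plus a plug-in standard error (sketch).

    Each column of Z is a binary instrument; omega holds fixed, non-negative
    weights summing to one. The influence function for instrument l uses its
    own residual e_l = (y - mean y) - beta_l * (d - mean d).
    """
    y, d = np.asarray(y, float), np.asarray(d, float)
    Z, omega = np.asarray(Z, float), np.asarray(omega, float)
    n, L = Z.shape
    yc, dc = y - y.mean(), d - d.mean()
    betas = np.empty(L)
    psi = np.zeros(n)                            # influence function of the RT estimate
    for l in range(L):
        zc = Z[:, l] - Z[:, l].mean()
        first_stage = zc @ dc / n                # sample Cov(Z_l, D)
        betas[l] = (zc @ yc / n) / first_stage   # Wald ratio for instrument l
        e_l = yc - betas[l] * dc                 # instrument-specific residual
        psi += omega[l] * zc * e_l / first_stage
    estimate = float(omega @ betas)
    se = float(np.sqrt(np.mean(psi ** 2) / n))
    return estimate, se, betas


# Hypothetical data: Z1 moves one group (effect 1), Z2 another (effect 5).
rng = np.random.default_rng(1)
n = 100_000
Z = rng.integers(0, 2, size=(n, 2))
kind = rng.choice(3, size=n, p=[0.4, 0.4, 0.2])  # 0: moved by Z1, 1: moved by Z2, 2: never treated
d = np.where(kind == 0, Z[:, 0], np.where(kind == 1, Z[:, 1], 0))
y = rng.normal(size=n) + d * np.where(kind == 0, 1.0, 5.0)

estimate, se, betas = rt_estimate(y, d, Z, [0.5, 0.5])
print(betas, estimate, se)  # Wald ratios near 1 and 5, estimate near 3
```

Each instrument's residual is centred on its own Wald ratio, so a clean instrument is not charged for the dispersion picked up by the others, which is the intuition behind RT avoiding the common-residual penalty.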

D. Marginal Treatment Effect (MTE) Representation

The paper connects the discrete instrument framework to the continuous MTE framework (Heckman and Vytlacil, 2005).

Weight Functions: GMM and RT can be viewed as integrating the MTE curve against different weight functions.
Policy-Relevant Treatment Effect (PRTE): The authors show how RT can target the PRTE (the effect of a specific policy change) by finding the weight vector $\omega$ that minimizes the $L_2$ distance between the RT weight function and the policy weight function. This provides a point estimate that is the "closest feasible approximation" to the PRTE.
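The PRTE-targeting step amounts to a constrained least-squares problem once the weight functions are evaluated on a grid over $u \in [0, 1]$. The sketch below assumes $\omega$ is restricted to non-negative weights summing to one, takes the Wald-specific MTE weight functions as given columns of a matrix `H`, and solves by projected gradient descent with a Euclidean projection onto the simplex. The interval-shaped weight functions in the example are illustrative stand-ins, not the paper's, and the function names are hypothetical.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = k[u - (css - 1.0) / k > 0][-1]
    theta = (css[rho - 1] - 1.0) / rho
    return np.maximum(v - theta, 0.0)


def closest_rt_weights(H, target, iters=5000):
    """Simplex weights omega minimising the grid L2 distance ||H @ omega - target||.

    H: (grid points x L) matrix; column l is the MTE weight function of Wald estimand l.
    target: policy (PRTE) weight function evaluated on the same grid.
    """
    L = H.shape[1]
    omega = np.full(L, 1.0 / L)
    step = 1.0 / np.linalg.norm(H, 2) ** 2  # 1 / Lipschitz constant of the gradient
    for _ in range(iters):
        grad = H.T @ (H @ omega - target)
        omega = project_to_simplex(omega - step * grad)
    return omega


u = np.linspace(0.0, 1.0, 1001)


def uniform_on(a, b):
    """Hypothetical weight function: uniform density on [a, b]."""
    return ((u >= a) & (u <= b)) / (b - a)


H = np.column_stack([uniform_on(0.1, 0.4), uniform_on(0.4, 0.7), uniform_on(0.7, 0.9)])
target = uniform_on(0.2, 0.8)  # hypothetical PRTE weight function
omega = closest_rt_weights(H, target)
gap = np.sqrt(np.mean((H @ omega - target) ** 2))
print(omega.round(3), gap)
```

The printed `gap` is only a rough grid analogue of the identification gap: when the policy weight function is exactly a mixture of the columns of `H`, it shrinks toward zero.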

3. Key Contributions

Characterization of the Heterogeneity Penalty: The paper provides a precise mathematical characterization of how EGMM distorts weights. It demonstrates that EGMM actively penalizes instruments with high treatment effect dispersion, often leading to negative weights and estimands that fall outside the convex hull of valid LATEs.
Impossibility Theorem for GMM: It formally proves that the GMM class cannot simultaneously satisfy efficiency and researcher-specified weighting constraints under heterogeneity.
Development of the RT Estimator: RT is introduced as a constructive solution that leaves the GMM class. It offers a semiparametrically efficient estimator for any weighted average of Wald estimands while guaranteeing non-negative weights under PRD.
Variance Frontier Analysis: The paper defines the absolute minimum variance achievable for a given estimand, showing that RT achieves this bound, while constrained GMM estimators incur a "weight-composition cost."
PRTE Targeting: It operationalizes the targeting of Policy-Relevant Treatment Effects in discrete instrument settings, providing a variance-optimal point estimate that complements existing partial identification bounds.

4. Empirical Results

The authors apply their methods to two distinct datasets to demonstrate the magnitude of the problem and the solution.

A. Tennessee STAR Class-Size Experiment

Setup: 78 schools with independent randomization of class sizes ($L = 78$ instruments).
Findings:
- There is substantial heterogeneity in school-specific treatment effects (ranging from -76 to +73 points).
- The J-statistic decisively rejects the null of homogeneous effects.
- 2SLS Estimate: 8.84 points.
- EGMM Estimate: 6.55 points (about 26% below 2SLS).
- Mechanism: EGMM downweights schools with high treatment effect dispersion (which happen to be the schools with the largest effects), shifting the estimand toward schools with moderate effects.
- RT Performance: The Equal-Weight ATE (EW-ATE) and Complier-Share-Weighted ATE (CSW-ATE) via RT recover estimates closer to 2SLS but with valid causal interpretation and lower variance than the constrained GMM alternative.

B. Patent Examiner Leniency Design

Setup: Analysis of patent applications where examiners are grouped by leniency ($L = 6$ cumulative instruments).
Findings:
- 2SLS Estimate: 10.58 forward citations.
- EGMM Estimate: 5.51 citations (nearly half of 2SLS).
- Mechanism: EGMM assigns 86% of its weight to the lowest leniency threshold and negative weights to the highest thresholds ($G \ge 5$, $G \ge 6$). This pulls the estimate below every individual Wald estimate, making it causally uninterpretable.
- RT Performance:
  - CSW-ATE: 12.87 citations.
  - EW-ATE: 13.75 citations.
  - PRTE-Targeted RT: 11.75 citations.
- The RT estimands are all within the range of individual Wald estimates and represent valid weighted averages. The PRTE-targeted RT provides a policy-relevant estimate with a negligible identification gap (< 0.03 citations).

5. Significance and Implications

Re-evaluating Standard Practice: The paper suggests that standard Efficient GMM (EGMM) and 2SLS estimates in overidentified settings with heterogeneous effects may target parameters with no clear causal interpretation, because of negative weighting and, for EGMM, the heterogeneity penalty.
Methodological Shift: It advocates for moving away from the "common residual" GMM architecture when the goal is causal interpretation. The "simple" weighted average of Wald estimators (RT) is shown to be the semiparametrically optimal approach for targeted parameters.
Policy Relevance: By enabling researchers to target specific estimands (like the PRTE) with known variance properties and guaranteed non-negative weights, the paper provides a robust tool for policy evaluation in quasi-experimental settings.
Diagnostic Tool: The J-test is re-interpreted not just as a test of instrument validity, but as a diagnostic for treatment effect heterogeneity. Rejection implies that different instruments are identifying different subpopulations, necessitating a choice of weighting scheme rather than a default to EGMM.
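To see the J-test working as a heterogeneity diagnostic, here is a textbook two-step efficient GMM estimator with Hansen's J statistic on simulated data. The data-generating process is hypothetical: each unit is moved by exactly one of three independent binary instruments, and effects are either common or differ by instrument, so every instrument is valid in both cases. With $L = 3$ instruments and one treatment, J is compared with a chi-squared distribution with 2 degrees of freedom, whose 5% critical value is about 5.99.

```python
import numpy as np


def egmm_with_j(y, d, Z):
    """Two-step efficient GMM for y = a + b*d + e with instruments (1, Z_1..Z_L).

    Returns (b_hat, J, df), where J is Hansen's overidentification statistic
    and df = L - 1 is its chi-squared degrees of freedom.
    """
    n = len(y)
    X = np.column_stack([np.ones(n), d])   # regressors: intercept, treatment
    Zf = np.column_stack([np.ones(n), Z])  # instruments: intercept, Z_1..Z_L
    ZX, Zy = Zf.T @ X, Zf.T @ y

    def gmm(W):
        # Minimiser of (Zy - ZX theta)' W (Zy - ZX theta).
        return np.linalg.solve(ZX.T @ W @ ZX, ZX.T @ W @ Zy)

    theta1 = gmm(np.linalg.inv(Zf.T @ Zf / n))  # step 1: 2SLS
    g_i = Zf * (y - X @ theta1)[:, None]        # per-observation moments
    W = np.linalg.inv(g_i.T @ g_i / n)          # step 2: inverse moment covariance
    theta2 = gmm(W)                             # efficient GMM estimate
    gbar = Zf.T @ (y - X @ theta2) / n
    J = float(n * gbar @ W @ gbar)
    return float(theta2[1]), J, Zf.shape[1] - X.shape[1]


def simulate(effects, n=50_000, seed=0):
    """Hypothetical DGP: each unit complies with exactly one of three instruments."""
    rng = np.random.default_rng(seed)
    Z = rng.integers(0, 2, size=(n, 3)).astype(float)
    group = rng.integers(0, 3, size=n)  # the instrument that moves this unit
    d = Z[np.arange(n), group]          # treated iff its own instrument is on
    y = rng.normal(size=n) + d * np.asarray(effects, float)[group]
    return y, d, Z


b_hom, J_hom, df = egmm_with_j(*simulate([5.0, 5.0, 5.0]))
b_het, J_het, _ = egmm_with_j(*simulate([0.0, 5.0, 10.0]))
print(df, J_hom, J_het)
```

Under homogeneous effects J stays in the usual chi-squared range, while instrument-specific effects (0, 5, 10) push it far past 5.99 even though no instrument is invalid, which is the reinterpretation described above.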

In summary, Chow and Kasahara demonstrate that efficiency and representativeness are mutually exclusive in standard GMM under heterogeneity. They resolve this by proposing Representative Targeting (RT), a method that steps outside the GMM framework to deliver both causal interpretability and semiparametric efficiency for researcher-specified targets.