Imagine you are a detective trying to figure out if a new training program actually helps people get better jobs. You have two groups of people: those who took the training (the Treatment Group) and those who didn't (the Control Group).
The classic way to solve this mystery is called Difference-in-Differences (DID). You look at how much the treatment group's income changed, subtract how much the control group's income changed, and the difference is the "magic" of the training.
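The subtraction itself is just arithmetic. Here is a minimal sketch with made-up toy incomes (not from any real study):

```python
# Hypothetical pre/post average incomes (toy numbers, purely illustrative).
treated_pre, treated_post = 10_000.0, 14_000.0   # training group
control_pre, control_post = 10_500.0, 12_000.0   # comparison group

# Each group's change over time.
treated_change = treated_post - treated_pre      # 4000: training effect + background trend
control_change = control_post - control_pre      # 1500: background trend alone

# Difference-in-Differences: subtract out the shared trend.
did_estimate = treated_change - control_change
print(did_estimate)  # 2500.0 attributed to the training
```

The control group's change stands in for what would have happened to the treated group anyway; whatever is left over is credited to the program.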
But here's the catch: The two groups might be different to begin with. Maybe the treatment group was already more motivated, or had better education. If you don't account for these differences, your conclusion will be wrong.
This paper introduces two major upgrades to this detective work: a better way to balance the teams and a better way to choose your clues.
1. The "Double-Insurance" Strategy (Covariate Balancing)
The Problem:
Usually, to fix the "different groups" problem, statisticians use a tool called a Propensity Score. Think of this as a "matchmaking algorithm" that tries to pair up people in the treatment group with similar people in the control group based on their background (age, education, etc.).
However, this algorithm is only as good as the recipe you give it. If you guess the recipe wrong (model misspecification), the pairs won't match, and your detective work fails.
The Solution (Covariate Balancing for DID - CBD):
The authors propose a new method called CBD. Instead of just guessing the recipe, they force the two groups to be perfectly balanced on specific statistical "moments" (summaries such as the averages and spreads of the background variables).
- The Analogy: Imagine you are weighing two groups of people on a scale.
- Old Method: You try to guess the exact weight of everyone to make the scale balance. If you guess wrong, the scale tips.
- New Method (CBD): You don't guess. You physically add or remove weights until the scale literally balances, regardless of what you thought the weights were.
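The "physically balance the scale" idea can be sketched as a calibration problem: start from uniform weights on the control group and shift them just enough that the weighted covariate means exactly match the treated group's. This minimal linear-calibration sketch only illustrates moment balancing, it is not the paper's CBD estimator, and its weights can go slightly negative (methods like entropy balancing keep them positive):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy covariates (age, years of education) for control and treated units.
X_control = rng.normal([35.0, 12.0], [8.0, 2.0], size=(200, 2))
X_treated = rng.normal([30.0, 13.0], [6.0, 2.0], size=(50, 2))

# Target: weighted control means must equal treated means ("the scale balances").
target = X_treated.mean(axis=0)

# Minimum-adjustment calibration: start from uniform weights and shift them
# just enough to satisfy the moment constraints exactly.
n = len(X_control)
A = np.column_stack([np.ones(n), X_control])   # constraints: sum(w) = 1, weighted means = target
b = np.concatenate([[1.0], target])
w0 = np.full(n, 1.0 / n)
w = w0 + A @ np.linalg.solve(A.T @ A, b - A.T @ w0)

print(np.allclose(w @ X_control, target))  # True: the means balance exactly
```

No matter what the "true weights" were, the constraint is satisfied by construction, which is exactly the point of the scale analogy.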
The "Double-Robust" Superpower:
The paper proves this new method has "double insurance." It will give you the correct answer if EITHER of these is true:
- Your "matchmaking recipe" (propensity score) is perfect.
- OR your model of how income would have changed without the training (the outcome regression) is perfect.
You only need one of them to be right to get the right answer. That's why it's called Doubly Robust. It's like having two different maps to find a treasure; if one is wrong, the other still leads you to the gold.
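One way to see the "double insurance" is a toy simulation: fit the outcome model correctly but hand the estimator a deliberately wrong propensity score, and the effect estimate still lands on the truth. This AIPW-style sketch is illustrative only, not the paper's CBD estimator:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Toy data: x drives both training take-up and income growth; true effect = 2.
x = rng.normal(size=n)
true_ps = 1.0 / (1.0 + np.exp(-x))                # true propensity score
d = rng.random(n) < true_ps                        # True = took the training
dy = 1.0 + 0.5 * x + 2.0 * d + rng.normal(scale=0.5, size=n)  # income change

# Map 1 (correct): OLS of income change on x, fit on controls only.
Xc = np.column_stack([np.ones((~d).sum()), x[~d]])
beta = np.linalg.lstsq(Xc, dy[~d], rcond=None)[0]
mu0 = beta[0] + beta[1] * x                        # predicted no-training change

# Map 2 (deliberately WRONG): pretend everyone had a 50/50 chance of training.
ps = np.full(n, 0.5)
w = ps[~d] / (1.0 - ps[~d])
w /= w.sum()

# Doubly robust effect on the treated: treated residuals minus reweighted control residuals.
att = (dy[d] - mu0[d]).mean() - w @ (dy[~d] - mu0[~d])
print(round(att, 2))  # close to 2.0 despite the wrong propensity model
```

Because the correct outcome model soaks up the residuals, the broken propensity weights have nothing left to bias; swap the roles (right weights, wrong outcome model) and the same rescue happens in reverse.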
2. The "Goldilocks" Clue Selector (Model Selection)
The Problem:
Once you have your data, you have to decide which clues (covariates) to use. Should you look at age? Education? Marital status? Hair color?
- If you use too few, you miss important details.
- If you use too many, you get confused by noise (like trying to solve a murder by checking if the victim liked blue socks).
Statisticians usually use a tool called AIC (Akaike Information Criterion) to pick the right number of clues. It's like a rule of thumb: "Add a penalty for every extra clue you use."
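AIC's rule of thumb is a concrete formula: 2 × (number of parameters) − 2 × (log-likelihood), and you keep the model with the lower score. A toy comparison on made-up data shows the bookkeeping:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200

# Toy data: y depends on clue x1 only; x2 is a useless extra clue.
x1, x2 = rng.normal(size=(2, n))
y = 1.0 + 2.0 * x1 + rng.normal(size=n)

def aic(X, y):
    """AIC for a Gaussian linear model: 2 * (parameter count) - 2 * log-likelihood."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    sigma2 = resid @ resid / n
    k = X.shape[1] + 1  # regression coefficients plus the noise variance
    log_lik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    return 2 * k - 2 * log_lik

small = aic(np.column_stack([np.ones(n), x1]), y)
big = aic(np.column_stack([np.ones(n), x1, x2]), y)
# Lower is better: the useless clue's penalty usually outweighs its tiny fit gain.
print(round(small, 1), round(big, 1))
```

The extra clue can never hurt the raw fit, so without the "+2 per parameter" penalty the bigger model would always win; the penalty is what makes AIC a rule of thumb rather than a rubber stamp.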
The Problem with the Old Rule:
The old rule (AIC) assumes a very specific, simple world. But in this "Difference-in-Differences" world, the math is messy because of the weighting we talked about earlier. The old rule is like using a ruler to measure a curved road; it doesn't fit, and it leads you to pick too many useless clues.
The Solution (New Information Criterion):
The authors invented a new ruler specifically for this curved road.
- They derived a new formula that calculates the "penalty" for adding a clue.
- The Surprise: This new penalty is much larger than AIC's, making the new rule much stricter.
- The Result: The new method is much better at ignoring useless clues (like hair color) and focusing only on the ones that actually matter (like education).
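The paper's exact penalty formula is not reproduced here, but the effect of a stricter penalty can be shown with a generic criterion of the form −2 × (log-likelihood) + penalty × (number of clues). In this toy sketch, only the first clue matters and four are noise; raising the penalty screens the noise out:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(3)
n = 200

# Toy selection problem: y depends on the first clue only; the rest are noise.
X = rng.normal(size=(n, 5))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(size=n)

def neg2_loglik(cols):
    """-2 * Gaussian log-likelihood of an OLS fit using the chosen clue columns."""
    Z = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    resid = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    sigma2 = resid @ resid / n
    return n * (np.log(2 * np.pi * sigma2) + 1)

def best_subset(penalty):
    """Pick the clue subset minimizing -2*loglik + penalty * (number of clues)."""
    subsets = [c for r in range(6) for c in combinations(range(5), r)]
    return min(subsets, key=lambda c: neg2_loglik(c) + penalty * len(c))

print(best_subset(penalty=2))    # AIC-style penalty: may still keep noise clues
print(best_subset(penalty=20))   # a much stricter penalty
```

With the stricter penalty the useless clues almost always drop out, while the real one survives: removing it costs far more fit than the penalty could ever save.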
3. The Real-World Test
The authors tested their ideas on a famous dataset called LaLonde, which tracks a real job training program from the 1970s.
- The Old Way: Picked almost every single clue available, resulting in a messy, confusing model.
- The New Way: Picked a much smaller, cleaner set of clues.
While we don't know the "true" answer in real life, the fact that the two methods produced such different results suggests the old way was likely overcomplicating things.
Summary: Why This Matters
Think of this paper as upgrading the toolkit for social scientists and economists:
- Better Balance: They gave us a way to balance treatment and control groups that doesn't break if our initial guesses are slightly off (Double Robustness).
- Better Selection: They gave us a smarter way to pick which variables to use, preventing us from getting lost in a forest of unnecessary data.
In short, they made the "Difference-in-Differences" detective work more reliable, more accurate, and less likely to be fooled by bad data or bad guesses.