Imagine you are a detective trying to figure out what makes people buy a specific brand of coffee. You have a list of clues (covariates) like income, age, and how far they live from the store. You want to know: Which clues actually matter, and in what direction? (e.g., Does higher income make them more likely to buy, or less?)
In the world of statistics, this is called a Binary Choice Model. The outcome is simple: Buy (1) or Don't Buy (0).
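To make this concrete, here is a minimal sketch of a binary choice model in plain numpy (the covariate names, coefficients, and data are all hypothetical illustrations, not from the paper). We simulate coffee purchases driven by a logistic-noise model, then fit logistic regression by gradient ascent on the log-likelihood:

```python
import numpy as np

# Hypothetical coffee-purchase data (illustrative numbers only).
rng = np.random.default_rng(0)
n = 5000
income = rng.normal(0, 1, n)    # standardized income
age = rng.normal(0, 1, n)       # standardized age
distance = rng.normal(0, 1, n)  # standardized distance to store

# Assumed "truth": income helps, distance hurts, age is neutral.
index = 1.0 * income + 0.0 * age - 0.8 * distance
p_buy = 1 / (1 + np.exp(-index))                 # logistic link
y = (rng.uniform(size=n) < p_buy).astype(float)  # buy = 1, don't buy = 0

# Fit logistic regression by plain gradient ascent on the mean log-likelihood.
X = np.column_stack([income, age, distance])
beta = np.zeros(3)
for _ in range(2000):
    p = 1 / (1 + np.exp(-X @ beta))
    beta += 0.1 * X.T @ (y - p) / n  # gradient step

print(np.round(beta, 2))  # signs come out +, ~0, -
```

In practice you would use a library fitter, but the hand-rolled loop makes the mechanics visible: the model only ever sees the 0/1 outcomes and the clues.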
The Problem: The "Wrong" Map
To solve this, statisticians often use a tool called Logistic Regression. Think of this tool as a specific type of map. It assumes that the "noise" or randomness in people's decisions follows a very specific, bell-shaped curve (called a logistic distribution).
But here's the catch: Real life is messy. People's decisions might not follow that perfect curve. They might be influenced by weird factors, or the noise might look more like a flat line or a jagged mountain.
If you use the "Logistic Map" on a world that doesn't fit the map, the tool is, in math-speak, inconsistent: even with unlimited data, the numbers it spits out can converge to the wrong values. It might tell you that income reverses its effect (saying rich people buy less when they actually buy more) or that the effect is zero when it's actually huge.
The Previous Theory: "Maybe it's just scaled down?"
Back in 1983, a smart economist named Ruud suggested a hopeful idea. He said, "Even if the map is wrong, maybe the direction of the clues is still right?"
He proposed that if you use this wrong map, you might not get the exact number for how much income matters, but you might get a number that is just a scaled version of the truth.
- The Truth: Income's coefficient is 5.
- The Wrong Map: Your estimate comes out as 2.5.
As long as the estimate is positive (2.5 is still positive), the direction is correct. You know income helps. You just don't know how much it helps exactly.
However, Ruud left a gap. He didn't prove that this "scaled version" actually exists. He didn't prove that the number wouldn't accidentally turn out to be zero (meaning the clue doesn't matter at all) or negative (meaning the clue works in the opposite direction). Without that proof, you can't trust the tool.
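Ruud's scaling idea can be illustrated with a quick simulation (my own sketch, not from the paper): generate data whose noise is normal rather than logistic (a "probit world"), fit a logistic regression anyway, and check whether the estimates come out as roughly a common positive multiple of the truth.

```python
import numpy as np

# Illustrative misspecification experiment: true noise is normal,
# but we fit the "wrong map" (logistic regression) anyway.
rng = np.random.default_rng(1)
n = 20000
X = rng.normal(size=(n, 2))        # jointly normal covariates
beta_true = np.array([1.0, -0.5])  # assumed true slopes

# Probit data-generating process: buy if index + normal noise > 0.
y = (X @ beta_true + rng.normal(size=n) > 0).astype(float)

# Fit logistic regression by gradient ascent on the mean log-likelihood.
beta_hat = np.zeros(2)
for _ in range(3000):
    p = 1 / (1 + np.exp(-X @ beta_hat))
    beta_hat += 0.1 * X.T @ (y - p) / n

# Signs survive; magnitudes are inflated by roughly a common factor.
print(np.round(beta_hat, 2), np.round(beta_hat / beta_true, 2))
```

In this run both ratios land near each other and stay positive, which is exactly the "scaled version of the truth" behavior. The simulation shows it happening in one friendly setting; it does not prove it can't fail, which is the gap the paper closes.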
The New Paper: Closing the Gap
This paper by Chang, Park, and Yan acts as the final piece of the puzzle. They say: "We can prove that the direction is safe, provided two specific conditions are met."
They prove that even if the underlying "noise" isn't logistic, the Logistic Regression tool will still give you the correct direction for your clues, as long as:
- The "Index" Rule: The randomness depends on the combination of your clues, not on each clue individually. (Imagine the noise depends on the total score of a player, not just their height or speed separately).
- The "Straight Line" Rule: The average relationship between your clues and that total score is a straight line. (If you plot the data, the average trend looks like a straight line, not a squiggly curve).
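The "Straight Line" rule can be checked numerically in a setting where it is known to hold: jointly normal covariates (a textbook example; this sketch is my illustration, not the paper's). For independent standard normal covariates, the average of each covariate given the index is exactly a straight line in the index, with a slope we can compute in closed form and compare against a binned empirical estimate:

```python
import numpy as np

# Check linearity of E[X1 | index] for independent standard normal covariates.
rng = np.random.default_rng(2)
n = 200_000
X = rng.normal(size=(n, 2))
beta = np.array([1.0, -0.5])
index = X @ beta

# Closed form for this case: E[X1 | index] = (beta1 / |beta|^2) * index.
slope_theory = beta[0] / (beta @ beta)

# Empirical conditional mean of X1, estimated by binning the index.
bins = np.linspace(-2, 2, 9)
mids, means = [], []
for lo, hi in zip(bins[:-1], bins[1:]):
    mask = (index >= lo) & (index < hi)
    mids.append((lo + hi) / 2)
    means.append(X[mask, 0].mean())
slope_fit = np.polyfit(mids, means, 1)[0]

print(round(slope_theory, 3), round(slope_fit, 3))
```

The binned means trace a straight line whose slope matches the theoretical value, so normal (and more generally elliptical) covariates satisfy the rule; covariate designs that break this linearity are the ones the theorem does not cover.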
The Analogy: The Distorted Lens
Imagine you are looking at a sculpture through a funhouse mirror (the Logistic Regression tool).
- The mirror distorts the size of the sculpture. A 6-foot statue might look 3 feet tall.
- The Fear: What if the mirror flips the statue upside down? Or squashes it so it looks like a flat line? Then you can't tell what the statue is.
- The Paper's Discovery: The authors prove that if the sculpture is built in a certain way (the "Index" and "Straight Line" rules), the funhouse mirror will never flip it upside down. It will only stretch or shrink it.
- The Result: You can still tell which way the statue is facing (the slope consistency). You know the arm is pointing up, even if the mirror makes the arm look shorter.
Why Does This Matter?
This is huge for Machine Learning and Data Science.
- Machine Learning loves Logistic Regression because it's fast, simple, and easy to code.
- Reality: Machine Learning models often face messy data where the "perfect" statistical assumptions aren't met.
- The Takeaway: This paper gives us a theoretical "green light." It says, "Hey, you can keep using Logistic Regression on messy data. Even if you don't get the exact magnitude of the effect, you can trust that the sign (positive or negative) and the relative importance of your variables are correct."
So, if your model says "Age is a positive factor," you can be confident that older people are more likely to buy, even if the model doesn't tell you the exact percentage increase. The direction is reliable.