Imagine you are a chef trying to recreate a specific, perfect soup recipe (let's call it the "Target Flavor") that only exists when the temperature is exactly 70°C.
In the real world, you can't perfectly control the temperature to be exactly 70°C. You can only get close. So, you decide to taste a few bowls of soup that were made at 69.5°C, 69.8°C, 70.1°C, and 70.2°C. You mix these tastes together to guess what the 70°C soup would taste like.
This is essentially what Induced Order Statistics (IOS) are. In statistics, instead of soup and temperature, we have data points (like income and education) and a specific value we care about (like a specific age or a policy cutoff). We look at the data points closest to that value to guess what the "perfect" data at that value looks like.
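The soup-tasting idea can be sketched in a few lines of Python. This is a toy illustration, not the paper's setup: the function `f`, the noise level, the sample size, and the choice of 50 neighbors are all invented for the example. We sort the data by how close each x is to the target point and average the y-values that "come along with" the nearest x's.

```python
import random

random.seed(0)

# Hypothetical data: (x, y) pairs, where x plays the role of "temperature"
# and y the role of "flavor". True relationship: y = f(x) + noise.
def f(x):
    return 2.0 * x - 100.0  # f(70) = 40 is the "Target Flavor" we want

n = 5000
data = [(x, f(x) + random.gauss(0, 1.0))
        for x in (random.uniform(60, 80) for _ in range(n))]

# Induced order statistics: order the points by how close x is to the
# target, then look at the y-values "induced" by the k nearest x's.
target = 70.0
k = 50
nearest = sorted(data, key=lambda p: abs(p[0] - target))[:k]
estimate = sum(y for _, y in nearest) / k

print(round(estimate, 2))  # should land close to f(70) = 40
```

Averaging over 50 nearby bowls smooths out the noise in any single taste, at the cost of including temperatures that are not exactly 70°C; that tension is exactly what the next sections are about.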
The Problem: How Close is "Close Enough"?
The paper by Bugni, Canay, and Kim asks a critical question: How many of these "closest" neighbors do we need to use to get a good guess, and how does the "smoothness" of our data affect this?
In the past, statisticians had a very strict rulebook (like the one by Falk et al., 2010). They said, "To get a perfect guess, the data must be incredibly smooth, like silk, and you can't be near the edge of the table."
Why was this a problem?
- Real data isn't silk: Real-world data is often bumpy, jagged, or messy.
- The Edge Problem: Many important statistical problems happen at the "edge" of the data. For example, in a Regression Discontinuity Design (RDD)—a method used to see if a policy works—you only care about people just above and just below a specific cutoff (like a test score of 50). The cutoff is the "edge" of the table. The old rules said, "You can't analyze the edge!" which made them useless for these popular methods.
The New Solution: A More Flexible Toolkit
The authors of this paper built a new, more flexible toolkit. They didn't demand that the data be perfect silk; they just asked that it be "reasonably smooth" (mathematically, they use a condition called Quadratic Mean Differentiability).
Here is what they discovered, using some simple analogies:
1. The "Neighbor Count" Rule (The k vs. n Trade-off)
Imagine you have a huge crowd of people (call it n) and you want to guess the average height of people standing exactly at a specific spot. You decide to look at the k people standing closest to that spot.
- Too few neighbors (k is small): Your guess is shaky because you don't have enough data.
- Too many neighbors (k is huge): You start including people who are actually far away from the spot, and their data "pollutes" your guess.
The authors figured out the Goldilocks Zone. They gave a precise formula for how big k can grow as your total crowd n gets bigger.
- The Rule: If your data is in 1 dimension (like a line), the paper gives an exact ceiling on how fast k can grow with n. If you pick more neighbors than that ceiling allows, your guess starts to get worse.
- Why it matters: Previous methods often assumed k stayed small and fixed. This paper tells us we can actually use more data points as our sample size grows, making our estimates more accurate, provided we follow their new growth rule.
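The trade-off above can be seen in a small simulation. This is an illustrative sketch, not the paper's formula: the curved function, the noise level, and the specific k values (2, 50, 1900) are made up for the demo. It just shows that a moderate k beats both extremes in mean squared error, because a tiny k is noisy and a huge k drags in faraway, biased neighbors.

```python
import random

random.seed(1)

# Toy target: estimate f(70) = 0 from noisy (x, y) pairs.
def f(x):
    return (x - 70.0) ** 2  # curved, so faraway neighbors drag the guess upward

def knn_estimate(k, n):
    """Average the y-values of the k points whose x is nearest to 70."""
    data = [(x, f(x) + random.gauss(0, 1.0))
            for x in (random.uniform(60, 80) for _ in range(n))]
    nearest = sorted(data, key=lambda p: abs(p[0] - 70.0))[:k]
    return sum(y for _, y in nearest) / k

def mse(k, n=2000, reps=200):
    """Mean squared error of the estimate of f(70) = 0 over repeated samples."""
    return sum(knn_estimate(k, n) ** 2 for _ in range(reps)) / reps

too_few, goldilocks, too_many = mse(2), mse(50), mse(1900)
print(too_few, goldilocks, too_many)  # the middle value should be smallest
```

In repeated runs, k = 2 suffers from noise, k = 1900 suffers from "pollution" by distant points, and the middle choice wins; the paper's contribution is characterizing how that winning middle choice is allowed to grow with n.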
2. The "Smoothness" Meter
The paper explains that the "smoothness" of your data determines how fast your guess gets better.
- Smooth Data (Silk): If the data changes gently, your guess improves quickly.
- Rough Data (Sandpaper): If the data is jagged, your guess improves slowly.
- The Edge: The authors showed that even at the "edge" of the data (like the 50-point cutoff), you can still get a good guess, as long as the data doesn't change too violently right at the edge.
3. Two Ways to Measure "Badness"
The authors used two different rulers to measure how far off their guess is:
- The Total Variation Ruler: This measures if the entire shape of the distribution is wrong.
- The Hellinger Ruler: This measures the "distance" between the guess and the truth through the square roots of the densities, which makes it sensitive to errors in a different way.
They found that the two rulers can disagree: the same approximation can look better on one ruler than on the other. Their math shows exactly when and why this happens, which helps statisticians choose the right ruler for their specific job.
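For two discrete distributions, both rulers are easy to compute. A minimal sketch (the two example distributions p and q are arbitrary, chosen only to illustrate the definitions); the final inequality is the standard relationship between the two distances, which bounds how far apart the rulers can ever get:

```python
import math

def total_variation(p, q):
    # TV distance: half the L1 distance between the probability vectors.
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

def hellinger(p, q):
    # Hellinger distance: scaled L2 distance between the square-root vectors.
    return math.sqrt(0.5 * sum((math.sqrt(pi) - math.sqrt(qi)) ** 2
                               for pi, qi in zip(p, q)))

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

tv = total_variation(p, q)
h = hellinger(p, q)
print(tv, h)

# The two rulers always agree up to a factor:  H^2 <= TV <= sqrt(2) * H
assert h ** 2 <= tv <= math.sqrt(2) * h
```

Within those bounds, though, one distance can shrink faster than the other as the sample grows, which is the kind of disagreement the paper pins down.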
Real-World Impact: Why Should You Care?
This paper isn't just about soup; it fixes the engine for several important tools used in economics and policy:
- Policy Testing (Regression Discontinuity): When governments change a law at a specific cutoff (e.g., "If you score 60, you get a scholarship"), researchers use the people just above and below 60 to see if the law works. This paper tells them exactly how many people to include in their study to get a valid result, even if the data is messy.
- Machine Learning (k-Nearest Neighbors): This is a common algorithm that predicts outcomes based on similar past cases. The paper helps tune this algorithm so it doesn't overfit (look at too many irrelevant neighbors) or underfit (look at too few).
- Robust Optimization: When making decisions under uncertainty (like investing money), this paper helps ensure that the "worst-case scenarios" you prepare for are based on reliable data approximations.
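The RDD idea can be sketched in the same nearest-neighbor style: average the k closest outcomes on each side of the cutoff and take the difference. This is a toy simulation, not the paper's estimator or real RDD practice: the score distribution, the true jump of 5, the noise, and k = 100 are all invented for illustration.

```python
import random

random.seed(2)

# Hypothetical scholarship example: students scoring >= 60 get a
# scholarship that raises a later outcome by a true "jump" of 5.
def outcome(score):
    base = 0.1 * score
    jump = 5.0 if score >= 60 else 0.0
    return base + jump + random.gauss(0, 1.0)

students = [(s, outcome(s)) for s in (random.uniform(40, 80) for _ in range(4000))]

cutoff, k = 60.0, 100
# k nearest students on each side of the cutoff (the "edge" of the data).
above = sorted((p for p in students if p[0] >= cutoff), key=lambda p: p[0])[:k]
below = sorted((p for p in students if p[0] < cutoff), key=lambda p: -p[0])[:k]

jump_estimate = (sum(y for _, y in above) / k) - (sum(y for _, y in below) / k)
print(round(jump_estimate, 2))  # should land close to the true jump of 5
```

Each side of this comparison is an estimate made at an edge, which is precisely the case the older rules ruled out and the new results cover.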
The Bottom Line
Before this paper, statisticians had to be very careful and often limited in how they analyzed data near specific points, especially at the edges. They had to assume the data was perfectly smooth.
This paper says: "You don't need perfect data. You just need reasonable data. And if you follow our new rules on how many neighbors to pick, you can get accurate, reliable results even at the very edges of your data."
It's like upgrading from a rigid, fragile ruler to a flexible, stretchy tape measure that works on smooth surfaces, bumpy surfaces, and even the very edges of the table.