Second order asymptotics for the number of times an estimator is more than epsilon from its target value

This paper investigates second-order asymptotics for the number of times a strongly consistent estimator deviates from its target by more than $\varepsilon$, introducing a concept of "asymptotic relative deficiency" to distinguish between estimators with identical first-order efficiency and demonstrating that specific finite-sample corrections (such as dividing by $n - 1/3$ for the normal variance) minimize the expected number of such errors.

Nils Lid Hjort, Grete Fenstad

Published Wed, 11 Ma

Imagine you are a coach training a team of runners (estimators) to find a hidden treasure (the true parameter, $\theta$) in a vast, foggy field.

Your goal isn't just to see who finds the treasure eventually (that's what standard statistics usually checks). Your goal is to count how many times each runner steps outside a small, safe circle of radius $\epsilon$ around the treasure before they finally settle down.

Let's call this count the "Miss Count" ($Q_\epsilon$).

The Problem: The Tie

In the world of statistics, we often have two runners who are equally good. They both eventually find the treasure, and if you look at their long-term average speed, they are identical. Standard statistics says, "Great, they are tied. Pick either one."

But the authors of this paper, Nils Lid Hjort and Grete Fenstad, ask: "Wait a minute. If they are tied in speed, who stumbles more often while running?"

They want to know: Between two equally fast runners, which one makes fewer mistakes (steps outside the safe circle) along the way?

The First Order vs. The Second Order

  • First Order (The Old Way): This looks at the runners' average speed. If Runner A and Runner B both average 10 mph, the old method says they are equal.
  • Second Order (The New Way): This looks at the friction. Even if they have the same average speed, maybe Runner A stumbles a lot but recovers quickly, while Runner B glides smoothly. The paper develops a new way to measure this "stumbling" to break the tie.

The Analogy: The "Miss Count"

Imagine the treasure is a bullseye.

  • $\epsilon$ (Epsilon): This is the size of the bullseye. It's very small.
  • $Q_\epsilon$: This is the total number of times a runner's foot lands outside that bullseye as they run their race (as the sample size $n$ grows).

The paper proves that if you shrink the bullseye ($\epsilon$) to be microscopic, the total number of misses ($Q_\epsilon$) becomes huge. However, if you multiply the misses by the size of the bullseye squared ($\epsilon^2 \times Q_\epsilon$), you get a stable number. This number tells you how "wobbly" the runner is.
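This stabilization is easy to see numerically. Below is a small Monte Carlo sketch (not from the paper; the setup is an assumption) using the simplest possible runner, the sample mean of standard normal data with true target 0. A classical first-order result of this kind says the scaled expected miss count $\epsilon^2 \times E[Q_\epsilon]$ settles near the variance $\sigma^2$.

```python
import numpy as np

# Monte Carlo sketch (illustrative, not the paper's derivation): for the
# sample mean of N(0,1) data, count how often |mean_n - 0| > eps over
# n = 1..max_n, then check that eps^2 * (average miss count) is stable.
rng = np.random.default_rng(42)
eps, max_n, reps = 0.3, 3000, 400

counts = []
for _ in range(reps):
    x = rng.standard_normal(max_n)
    means = np.cumsum(x) / np.arange(1, max_n + 1)  # running sample mean
    counts.append(np.sum(np.abs(means) > eps))      # miss count Q_eps
scaled = eps ** 2 * np.mean(counts)
print(f"eps^2 * E[Q_eps] is roughly {scaled:.2f}")  # near sigma^2 = 1
```

Shrinking `eps` further (and growing `max_n` to match) keeps the scaled value near 1 even as the raw miss count explodes, which is exactly the stabilization described above.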

The Big Discovery: The "Perfect" Denominator

The authors apply this "Miss Count" theory to some classic statistics problems. They found that the formulas we use in textbooks aren't always the best at minimizing these "misses."

Here are their surprising findings, translated into everyday terms:

1. The Variance Problem (Measuring Spread)
When calculating how spread out a set of numbers is (variance), we usually divide by $N$ (the total count) or $N-1$.

  • The Old Belief: $N-1$ is the "unbiased" choice. $N$ is the "maximum likelihood" choice.
  • The Paper's Verdict: Neither is the best at minimizing "misses."
  • The Winner: You should divide by $N - 1/3$.
    • Analogy: Imagine you are baking a cake. The recipe says "add 1 cup of flour." But if you want the cake to be perfectly stable (fewest errors), you actually need to add a tiny bit less than 1 cup. The paper says the "magic number" is $1/3$ of a cup less than the standard correction.
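To make the three divisors concrete, here is a minimal sketch (the sample size and data are illustrative, not from the paper) computing the same sum of squared deviations with each divisor. The miss-count-optimal divisor sits strictly between the other two.

```python
import numpy as np

# Illustrative sketch: the three competing divisors for the normal variance.
# The paper's claim: dividing by n - 1/3 minimizes the expected number of
# times the estimate lands more than eps away from the true variance.
rng = np.random.default_rng(0)
n = 50
x = rng.normal(loc=0.0, scale=2.0, size=n)  # true variance = 4

ss = np.sum((x - x.mean()) ** 2)            # sum of squared deviations
var_ml = ss / n                             # maximum likelihood
var_unbiased = ss / (n - 1)                 # classic unbiased
var_miss_opt = ss / (n - 1 / 3)             # the paper's miss-count optimum

print(var_ml, var_miss_opt, var_unbiased)   # always ordered smallest to largest
```

Since the sum of squares is positive, the $n - 1/3$ estimate always lies between the maximum likelihood and unbiased versions; the paper's point is that this in-between compromise is the one that stumbles least often.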

2. The Exponential Mean
When measuring the average time until an event happens (like a lightbulb burning out):

  • The Winner: A specific adjustment where you divide by $N + 1/3$ (conceptually).
  • The Result: The standard "Maximum Likelihood" method (which divides by $N$) actually makes $1/9$ more errors than the optimized method.
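A tiny sketch of that adjustment (the scale and sample size are made up for illustration):

```python
import numpy as np

# Illustrative sketch: estimating the mean waiting time of an exponential.
# Maximum likelihood divides the total by n; the miss-count-optimal
# adjustment quoted above divides by n + 1/3 instead.
rng = np.random.default_rng(1)
n = 40
waits = rng.exponential(scale=5.0, size=n)  # true mean waiting time = 5

mean_ml = waits.sum() / n                   # maximum likelihood
mean_adj = waits.sum() / (n + 1 / 3)        # n + 1/3 adjustment
print(mean_ml, mean_adj)                    # adjusted is always slightly smaller
```

The adjustment shrinks the estimate by a hair, and per the paper that hair is worth roughly a $1/9$ reduction in long-run misses.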

3. The Squared Mean
When estimating the square of an average (like estimating the power of a signal):

  • The Winner: A specific adjustment where you add a small correction term.
  • The Result: The standard method underestimates the error, while the optimized method makes the fewest "misses."
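The text does not spell out the exact correction, so as an illustration only, here is the classical bias correction for a squared mean: subtract the estimated variance of the sample mean, $s^2/n$, from the squared sample mean. The paper's miss-count-optimal correction is of this general form but its constant may differ.

```python
import numpy as np

# Illustrative sketch: estimating mu^2 (e.g. signal power). The naive
# plug-in estimate xbar^2 is biased upward by sigma^2 / n; subtracting
# s^2 / n is the classical correction. The paper's miss-count-optimal
# correction term may use a different constant.
rng = np.random.default_rng(2)
n = 30
x = rng.normal(loc=3.0, scale=1.0, size=n)  # true mu^2 = 9

xbar = x.mean()
s2 = x.var(ddof=1)                          # sample variance
naive = xbar ** 2                           # plug-in estimate
corrected = xbar ** 2 - s2 / n              # classical bias correction
print(naive, corrected)
```

The corrected estimate is always a little smaller than the naive one, illustrating the "small correction term" the winner adds.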

Why Does This Matter?

You might ask, "Who cares about $1/3$ of a denominator? It's a tiny difference!"

The authors argue that in the real world, we often have to choose between two methods that look identical on paper. This "Second Order" analysis is the tie-breaker. It tells us:

  • "Method A and Method B are both good."
  • "But Method A will make you step outside the safety zone slightly more often than Method B."
  • "Therefore, if you want the smoothest ride with the fewest stumbles, pick Method B."

The "Brownian Motion" Connection

The paper gets a bit technical at the end, mentioning "Brownian motion" (the random jitter of particles in a fluid).

  • The Metaphor: Imagine the runners aren't just running on a track, but are actually tiny particles jittering in a fluid. The "Miss Count" is related to how much time these particles spend touching the walls of their container.
  • The authors show that the difference between two estimators behaves like the difference in time two jittery particles spend near the walls. This connects their statistical findings to deep physics-like laws of randomness.

Summary

This paper is about fine-tuning.
Just as a master chef knows that a pinch of salt makes a dish perfect, while a standard recipe might be "good enough," these statisticians found the "pinch of salt" (the $-1/3$ adjustment) that makes statistical estimators make the fewest possible mistakes as they get more data.

They didn't just find a new way to run the race; they found the exact stride length that prevents you from tripping over your own feet.