Approximations for the number of maxima and near-maxima in independent data

This paper derives explicit total variation error bounds for approximating the number of maxima and near-maxima in independent samples using logarithmic, Poisson, and negative binomial distributions, supported by new applications of Stein's method and illustrated through geometric, Gumbel, and uniform examples.

Fraser Daly

Published 2026-03-06

Imagine you are hosting a massive talent show with n contestants. You want to know two things:

  1. The "Record Breakers": How many people tied for the absolute best score? (Maybe 3 people got 100/100).
  2. The "Near Misses": How many people scored very close to the best score? (Maybe 10 people got between 98 and 100).

This paper is a mathematical detective story about predicting these numbers. The author, Fraser Daly, wants to answer a simple question: "If I have a huge crowd of random data, can I use a simple, well-known math formula to guess how many 'winners' or 'near-winners' there will be, and how wrong might my guess be?"

Here is the breakdown of the paper's journey, using everyday analogies.


1. The Two Scenarios: Discrete vs. Continuous

The paper splits the problem into two worlds, like two different types of games.

World A: The "Ladder" Game (Discrete Data)

Imagine a ladder where you can only stand on specific rungs (1, 2, 3, 4...). You can't stand between rungs.

  • The Problem: If you have 1,000 people climbing this ladder, how many are standing on the very top rung?
  • The Surprise: In the past, mathematicians knew that if the people are climbing a "Geometric Ladder" (where it's harder to climb higher), the number of people on top behaves strangely. Sometimes it looks like a Logarithmic pattern (a specific curve), and sometimes it looks like a Poisson pattern (like counting raindrops).
  • The Paper's Contribution: Previous studies said, "Hey, it looks like a Logarithmic distribution!" But they didn't say how close the guess was.
    • Daly's Fix: He used a standard "ruler" called Total Variation Distance to measure how far off the guess can be, and derived explicit error bounds with it. He proved that for certain types of ladders, the "Logarithmic" guess is remarkably accurate, and he gave a formula for the maximum possible error.
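The "Ladder" game is easy to play on a computer. The sketch below (function names are my own, not the paper's) runs the talent show many times with geometric scores and records how often exactly k people tie for the top rung; the paper's contribution is proving how close this empirical distribution is to a logarithmic distribution.

```python
import random
from collections import Counter

def sample_geometric(rng, p):
    """Rung reached on a 'geometric ladder': number of coin flips
    until the first success, so the support is 1, 2, 3, ..."""
    k = 1
    while rng.random() > p:
        k += 1
    return k

def ties_for_max_distribution(n, p, trials=5000, seed=0):
    """Run the talent show `trials` times with n geometric scores and
    record how often exactly k contestants tie for the top rung."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        scores = [sample_geometric(rng, p) for _ in range(n)]
        top = max(scores)
        counts[scores.count(top)] += 1
    return {k: v / trials for k, v in sorted(counts.items())}
```

The output is an empirical probability for each tie-count k = 1, 2, 3, ...; the paper tells you which logarithmic distribution to compare it against and bounds the gap.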

World B: The "Ramp" Game (Continuous Data)

Now imagine a smooth ramp. People can stand anywhere (1.5, 1.5001, 1.5000002...).

  • The Problem: Instead of asking who is exactly on top (which is impossible because no two people will land on the exact same millimeter), we ask: "How many people are within 1 inch of the top?"
  • The Paper's Contribution: He shows that the number of people in this "1-inch zone" can be predicted using a Negative Binomial distribution (which counts the number of failures before a fixed number of successes). Again, he didn't just say "it works"; he built a ruler to measure the error.
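The "1-inch zone" count is just as easy to simulate. This sketch (my own illustration, using exponential scores as a stand-in for a generic continuous distribution and an arbitrary window width) counts how many scores land within `window` of the best one:

```python
import random

def near_max_distribution(n, window, trials=5000, seed=0):
    """Run the show `trials` times with n continuous (exponential)
    scores and record how often exactly k people land within
    `window` of the top score."""
    rng = random.Random(seed)
    freq = {}
    for _ in range(trials):
        scores = [rng.expovariate(1.0) for _ in range(n)]
        top = max(scores)
        k = sum(1 for s in scores if s > top - window)
        freq[k] = freq.get(k, 0) + 1
    return {k: v / trials for k, v in sorted(freq.items())}
```

The count is always at least 1 (the winner is within any window of themselves); the paper's theorems say how well a negative binomial distribution matches this count and by how much it can miss.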

2. The Magic Tool: Stein's Method (The "Shadow Puppet" Trick)

How did the author measure the error? He used a technique called Stein's Method.

The Analogy:
Imagine you are trying to guess the shape of a mysterious object in a dark room. You have a perfect "Shadow Puppet" of what the object should look like (the Logarithmic or Negative Binomial distribution).

  • The Old Way: You shine a light and squint. "Yeah, it looks kind of like the shadow." (This is what previous papers did).
  • Stein's Method: You set up a specific test. You ask the mysterious object: "If I poke you here, how do you react?"
    • If the object is truly the shadow puppet, it reacts in a very specific, predictable way.
    • If the object is slightly different, it reacts differently.
    • By measuring the difference in reaction, you can calculate the exact distance between your mystery object and the perfect shadow.

In this paper, the author had to invent a new set of rules for the "Logarithmic Shadow Puppet" because nobody had ever used this specific tool for that shape before. It was like inventing a new language to describe a new type of animal.
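The "poke and see how it reacts" idea can be made concrete with the classical Poisson version of Stein's identity (the paper's novelty is building the analogous identity for the logarithmic distribution; the Poisson one shown here is just the textbook example). A distribution is Poisson(λ) exactly when E[λ·f(X+1) − X·f(X)] = 0 for all bounded test functions f, so this expectation is the "reaction" we measure:

```python
import math

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)

def stein_discrepancy(pmf, lam, f, support):
    """E[lam*f(X+1) - X*f(X)] under the given pmf; this is zero
    (up to truncation of the sum) exactly when pmf is Poisson(lam)."""
    return sum(pmf(k) * (lam * f(k + 1) - k * f(k)) for k in support)

lam = 2.0
f = lambda k: 1.0 / (k + 1)      # any bounded test function works
support = range(60)              # truncate the infinite sum

# The true "shadow puppet" reacts as predicted: discrepancy ~ 0.
poisson_gap = stein_discrepancy(lambda k: poisson_pmf(k, lam), lam, f, support)

# A different shape (geometric on {0,1,2,...}) reacts differently.
geom_gap = stein_discrepancy(lambda k: 0.5 ** (k + 1), lam, f, support)
```

The size of the nonzero reaction is what Stein's method converts into a distance between the mystery distribution and the target.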


3. The "Tie-Breaker" Logic (Size-Biasing)

One of the clever tricks in the paper involves a concept called Size-Biasing.

The Analogy:
Imagine you are looking at a bag of marbles.

  • Normal View: You pick a marble at random.
  • Size-Biased View: Each marble is picked with probability proportional to its size, so the bigger the marble, the more likely you are to grab it. If you pick a marble this way, you are more likely to pick the "winners" (the ones at the top of the ladder).

The author realized that to understand the "winners" (the people tied for first place), it's actually easier to look at the "winners" from the perspective of the Size-Biased view. It's like looking at the problem through a magnifying glass that highlights the top performers. This allowed him to connect the messy real-world data to the clean mathematical formulas.
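The marble analogy can be checked in a few lines. Under size-biasing, a standard identity says the average of a size-biased draw is E[X²]/E[X] rather than the plain mean E[X], which is why the view "tilts" toward the big values. This sketch (my own, not from the paper) verifies that by simulation:

```python
import random

def size_biased_mean(sizes, trials=100_000, seed=42):
    """Draw marbles with probability proportional to size and
    return the average size drawn."""
    rng = random.Random(seed)
    picks = rng.choices(sizes, weights=sizes, k=trials)
    return sum(picks) / trials

sizes = [1, 2, 3, 10]
plain_mean = sum(sizes) / len(sizes)            # ordinary view: 4.0
theory = sum(s * s for s in sizes) / sum(sizes) # E[X^2]/E[X] = 7.125
sb_mean = size_biased_mean(sizes)               # simulation, close to 7.125
```

The size-biased mean (about 7.1) is pulled well above the plain mean (4.0) by the big marble, which is exactly the "magnifying glass on the top performers" effect the author exploits.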


4. The Results: "Good Enough" vs. "Perfect"

The paper gives us Upper Bounds. Think of this as a "Worst-Case Scenario" guarantee.

  • The Guarantee: "If you use our formula to guess the number of winners, your guess will be off by at most X amount."
  • The Reality Check: The author tested this with computer simulations (like running the talent show 10 million times in a computer).
    • The Result: The "Worst-Case" guarantee (the Upper Bound) was actually a bit too scary. The real error was usually much smaller than the formula predicted.
    • Example: The formula might say, "Your guess could be off by 10 people." But in reality, the guess was usually only off by 0.0001 people.
    • Why? The math is conservative to be safe. But even with this "scary" safety margin, the paper proves the approximations are solid.
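The "guarantee vs. reality" gap is a familiar phenomenon in approximation theory, and we can illustrate it with a classical analogue rather than the paper's own bound: Le Cam's theorem guarantees that the total variation distance between a Binomial(n, p) and a Poisson(np) is at most np², yet the actual distance is typically far smaller. Total variation distance is just half the L1 gap between the two probability tables:

```python
import math

def tv_distance(p, q, support):
    """Total variation distance: half the L1 gap between two pmfs."""
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in support)

n, prob = 100, 0.02
lam = n * prob
binom = {k: math.comb(n, k) * prob**k * (1 - prob)**(n - k)
         for k in range(n + 1)}
poiss = {k: math.exp(-lam) * lam**k / math.factorial(k)
         for k in range(n + 1)}

gap = tv_distance(binom, poiss, range(n + 1))  # the real error
bound = n * prob * prob                        # Le Cam's worst-case guarantee
```

Here the guarantee is 0.04 but the true distance is much smaller, mirroring the paper's simulation finding that its upper bounds are safely conservative.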

5. Why Does This Matter? (The "So What?")

You might ask, "Who cares about counting people tied for first place?"

  1. Sports & Competitions: If you run a tournament, knowing how many people are likely to tie for the gold medal helps you plan tie-breaker rounds.
  2. Reliability Engineering: Imagine a bridge with 1,000 bolts. If the bridge fails, it's because the weakest bolt failed. But if you want to know how many bolts are "close to failing" (near the maximum stress), this math helps engineers design safer structures.
  3. Computer Algorithms: When computers sort huge lists of data, they often look for the "maximum" value. Knowing how many items are "tied" for the max helps optimize the code.

Summary in One Sentence

This paper builds a precise mathematical ruler to measure how well we can predict the number of "top performers" and "near-performers" in a random group, proving that simple formulas work surprisingly well, even if we have to invent some new math tools to prove it.