Extreme Geometric Quantiles Under Minimal Assumptions, with a Connection to Tukey Depth

Imagine you are trying to describe the "shape" of a massive, invisible cloud of data points floating in space. In statistics, we often want to find the "center" of this cloud (like the average) or the "edges" (the outliers).

For a long time, statisticians have had two main ways to do this:

The "One-Dimensional" Way: Looking at the data from one angle at a time (like squinting at a cloud from the side).
The "Geometric" Way: Looking at the whole 3D (or multi-dimensional) shape at once.

This paper is about the Geometric Way, specifically the very outer edges of the cloud: the "extreme" points. The authors ask: how fast do these extreme points move away from the center as we look further and further out? And, crucially, can we answer this without assuming the cloud has well-defined summary statistics (moments, such as a mean or a variance)?

Here is the breakdown of their findings using simple analogies.

1. The Two Main Tools: The Compass and the Flashlight

To understand the paper, you need to know the two tools they are comparing:

Geometric Quantiles (The Compass): Imagine you are standing in the center of a foggy field. You want to find the "edge" of the fog in a specific direction. You walk until you hit the point where a certain percentage of the fog is behind you. This is a Geometric Quantile. It's a way to map the shape of the data in every direction simultaneously.
Tukey Depth (The Flashlight): Imagine shining a flashlight from a point in the fog. The "depth" of that point is determined by how much light gets blocked by the fog in the worst direction. If you are in the center, the light is blocked from all sides (high depth). If you are on the edge, the light can escape easily in one direction (low depth). This is Tukey Depth.
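
To make the flashlight picture concrete, here is a minimal sketch of an empirical Tukey (halfspace) depth in two dimensions. It is an illustration only, not code from the paper: the function name `tukey_depth_2d`, the grid of 360 directions, and the toy sample are assumptions of this sketch, and scanning a finite grid of directions only approximates the true minimum over all halfspaces.

```python
import math

def tukey_depth_2d(point, data, n_directions=360):
    """Approximate empirical halfspace (Tukey) depth of `point` in a 2D sample.

    For each direction on a grid of angles, compute the fraction of data points
    in the closed halfspace through `point` facing that direction, and keep the
    smallest fraction found.
    """
    px, py = point
    n = len(data)
    depth = 1.0
    for k in range(n_directions):
        angle = 2.0 * math.pi * k / n_directions
        ux, uy = math.cos(angle), math.sin(angle)
        # Number of sample points x with u . (x - point) >= 0
        count = sum(1 for (x, y) in data if ux * (x - px) + uy * (y - py) >= 0.0)
        depth = min(depth, count / n)
    return depth

# Toy sample (hypothetical): a 5 x 5 grid of points centred at the origin.
sample = [(float(i), float(j)) for i in range(-2, 3) for j in range(-2, 3)]

central = tukey_depth_2d((0.0, 0.0), sample)  # deep inside the cloud
outer = tukey_depth_2d((5.0, 0.0), sample)    # outside the cloud
```

A point in the middle of the sample gets a depth above one half here, while a point outside the sample gets depth zero, because some halfspace through it contains no data at all.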

The Big Discovery: The authors found a secret handshake between these two tools. They proved that if you know the "depth" of a point (how central it is), you can mathematically guarantee how far out the "Geometric Quantile" must be.

2. The Problem: Heavy Tails and Missing Rules

Usually, to predict how far out the edge of a cloud is, statisticians assume the cloud follows nice, predictable rules (like a Bell Curve). They assume the data doesn't have "heavy tails"—meaning there aren't too many extreme outliers.

But in the real world (finance, climate change, internet traffic), data often has heavy tails. It's like a cloud that has long, thin tendrils stretching out infinitely. In these cases, the usual math rules (like "average" or "variance") break down or don't exist.

The authors asked: Can we still predict how far out the edge is, even if we don't know the rules and the cloud has crazy, heavy tails?

3. The Solution: Bounds Without Rules

The authors developed two "safety fences" (bounds) for how far out these extreme points can be, without needing to know the specific rules of the cloud.

The Upper Bound (The "Don't Go Too Far" Fence)

They proved that no matter how wild the data is, the extreme points cannot grow faster than a certain speed.

Analogy: Imagine a balloon being inflated. Even if the rubber is weird and stretchy, there is a limit to how fast it can expand before it pops. The authors found a formula for that maximum expansion speed, even if the balloon is made of "heavy-tail" rubber.

The Lower Bound (The "Must Go This Far" Fence)

This is the most exciting part. They proved that the extreme points must be at least a certain distance away.

The Magic Connection: They showed that this minimum distance is directly linked to the Tukey Depth (the flashlight tool).
The Metaphor: Think of the Tukey Depth as a "safety zone" in the middle of the cloud. The authors proved that the "Geometric Quantile" (the edge marker) cannot hide inside this safety zone. It must be outside it.
Why it matters: This allows them to translate a complex 3D problem into a simple 1D problem. Instead of calculating the edge of a 3D cloud, they can just look at the "edge" of the cloud if you squint at it from one specific angle (a univariate quantile).

4. The "Curse of Dimensionality"

The paper also notes a funny quirk of high-dimensional space. As the number of dimensions increases (going from 3D to 100D), the "safety zone" in the middle gets smaller and smaller relative to the whole space.

Analogy: In a 3D room, the center feels cozy. In a 100-dimensional room, the "center" is so tiny that almost everything feels like it's on the edge. The authors' math shows how a geometric constant in their bound shrinks as dimensions grow, making the "lower bound" more conservative (looser, though still valid) in high dimensions.

5. What Happens When We Do Know the Rules?

The paper also looks at the "nice" case where the data has finite averages (moments).

The Refinement: When the data is well-behaved, the authors' general "safety fences" can be tightened. They showed that if you know the data has a "skew" (it leans to one side), you can predict the edge's position even more precisely. It's like knowing the wind is blowing from the left, so you know the cloud will stretch further to the right.

Summary: Why Should You Care?

This paper is a toolkit for statisticians dealing with messy, real-world data that doesn't follow textbook rules.

It's Robust: It works even when the data is crazy (heavy-tailed) and standard math fails.
It Connects the Dots: It bridges the gap between two different ways of measuring data (Geometric Quantiles and Tukey Depth), showing they are two sides of the same coin.
It's Practical: By linking complex multi-dimensional shapes to simple one-dimensional lines, it makes it easier to calculate and understand where the "extreme" outliers of a dataset actually are.

In short, the authors built a universal map that tells us how far the "edge of the world" is, whether the world is a gentle hill or a jagged mountain range, without needing to know the geology of the terrain first.

Here is a detailed technical summary of the paper "Extreme Geometric Quantiles Under Minimal Assumptions, with a Connection to Tukey Depth" by Singha, Kratz, and Vadlamani.

1. Problem Statement

The paper addresses the extremal behavior of geometric (spatial) quantiles in multivariate settings ($\mathbb{R}^d$, $d > 1$). While geometric quantiles are well-established for characterizing the center of a distribution, their behavior in the tails (as the quantile level $\alpha \to 1$) is less understood, particularly under minimal assumptions.

Key challenges identified:

Existing literature often relies on strong moment conditions (e.g., finite second or third moments) or specific structural assumptions like Multivariate Regular Variation (MRV).
There is a lack of general bounds for the growth rate of the norm of extreme geometric quantiles ( $\|q_X(\alpha u)\|$ ) that apply to distributions with heavy tails (potentially lacking first or second moments).
The relationship between geometric quantiles and other fundamental multivariate concepts, specifically Tukey (halfspace) depth, requires further clarification in the extreme regime.

2. Methodology

The authors employ a hybrid approach combining probabilistic inequalities and geometric analysis:

Optimization Characterization: They utilize the defining property of geometric quantiles as the solution to a convex optimization problem involving the expected norm difference:
$q_X(\alpha u) = \arg \min_{q \in \mathbb{R}^d} \left( \mathbb{E}[\|X - q\| - \|X\|] - \langle \alpha u, q \rangle \right)$
Wherever the objective is differentiable (that is, when $q$ is not an atom of $X$), this is equivalent to the first-order condition $\alpha u = -\mathbb{E}\left[ \frac{X-q}{\|X-q\|} \right]$.
Geometric Approach for Lower Bounds: Instead of relying on moment expansions, the authors derive lower bounds by establishing a set inclusion between the region of geometric quantiles and the Tukey depth central regions. This involves analyzing the angular probability mass within cones and using properties of the unit sphere $S^{d-1}$ .
Asymptotic Expansion: For distributions satisfying integrability conditions (finite moments), the authors extend the first-order asymptotic expansion results from previous literature (specifically [7]) to third-order expansions. This allows them to isolate the effects of skewness and tail asymmetry.
Minimal Assumptions: The core methodology is designed to function without assuming the existence of moments, making it applicable to heavy-tailed distributions (e.g., Cauchy, Pareto with $\beta \le 1$ ).
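
The optimization characterization above can be tried on a sample. The sketch below solves the empirical first-order condition with a Weiszfeld-type fixed-point iteration; that solver choice is an assumption of this example, not the method used in the paper, and the names `geometric_quantile` and `first_order_residual`, the Gaussian toy sample, and the tolerances are all hypothetical.

```python
import math
import random

def geometric_quantile(data, alpha, u, max_iter=5000, tol=1e-10):
    """Empirical geometric quantile q(alpha * u) of a sample in R^d.

    Weiszfeld-type fixed-point iteration on the empirical first-order condition
        (1/n) * sum_i (x_i - q) / ||x_i - q|| + alpha * u = 0.
    """
    n, d = len(data), len(data[0])
    # Start from the sample mean.
    q = [sum(x[j] for x in data) / n for j in range(d)]
    for _ in range(max_iter):
        weight_sum = 0.0
        weighted = [0.0] * d
        for x in data:
            dist = math.sqrt(sum((x[j] - q[j]) ** 2 for j in range(d)))
            w = 1.0 / max(dist, 1e-12)  # guard against q landing on a data point
            weight_sum += w
            for j in range(d):
                weighted[j] += w * x[j]
        new_q = [(weighted[j] + n * alpha * u[j]) / weight_sum for j in range(d)]
        shift = math.sqrt(sum((new_q[j] - q[j]) ** 2 for j in range(d)))
        q = new_q
        if shift < tol:
            break
    return q

def first_order_residual(data, q, alpha, u):
    """Norm of (1/n) * sum_i (x_i - q)/||x_i - q|| + alpha * u; near zero at the quantile."""
    n, d = len(data), len(q)
    g = [alpha * u[j] for j in range(d)]
    for x in data:
        dist = max(math.sqrt(sum((x[j] - q[j]) ** 2 for j in range(d))), 1e-12)
        for j in range(d):
            g[j] += (x[j] - q[j]) / (n * dist)
    return math.sqrt(sum(c * c for c in g))

random.seed(0)
sample = [(random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)) for _ in range(100)]
u = (1.0, 0.0)
q_low = geometric_quantile(sample, 0.2, u)
q_mid = geometric_quantile(sample, 0.5, u)
```

Raising $\alpha$ with $u$ fixed pushes the quantile further along $u$, and `first_order_residual` gives a quick check that the returned point approximately satisfies the first-order condition. For $\alpha$ very close to 1 this simple iteration can converge slowly, so a Newton-type solver may be preferable there.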

3. Key Contributions and Results

A. Moment-Free Bounds (General Case)

The paper establishes new upper and lower bounds for the norm of extreme geometric quantiles $\|q_X(\alpha u)\|$ as $\alpha \to 1$, valid without any moment assumptions on the distribution of $X$.

Upper Bound:
- Derived using triangle inequalities and survival probabilities.
- If $\mathbb{E}\|X\| < \infty$ , the bound is $\|q_X(\alpha u)\| \le \frac{2\mathbb{E}\|X\|}{1-\alpha}$ , yielding a rate of $O((1-\alpha)^{-1})$ .
- For heavy-tailed distributions where $P(\|X\| > c) \le A c^{-\beta}$ , the bound scales as $O((1-\alpha)^{-(1+1/\beta)})$ .
- Significance: This bound is sharp for distributions lacking moments higher than the first but is generally conservative compared to MRV-specific rates.
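
A short sketch of how a bound of the finite-mean form can arise from the first-order condition $\alpha u = -\mathbb{E}\left[ \frac{X-q}{\|X-q\|} \right]$. This is a reconstruction from standard inequalities, assuming $q = q_X(\alpha u) \neq 0$ and $P(X = q) = 0$; it is not necessarily the authors' own argument. Write $a = q/\|q\|$ and $b = (q - X)/\|q - X\|$, so that $\mathbb{E}[b] = \alpha u$. By Cauchy-Schwarz,
$\mathbb{E}\langle a, b \rangle = \alpha \langle a, u \rangle \le \alpha, \quad \text{hence} \quad 1 - \alpha \le \mathbb{E}\left[ 1 - \langle a, b \rangle \right].$
Since $a$ and $b$ are unit vectors, $1 - \langle a, b \rangle = \tfrac{1}{2}\|a - b\|^2 \le \|a - b\|$ (because $\|a - b\| \le 2$), and the triangle inequality gives
$\|a - b\| \le \frac{\|q - (q - X)\|}{\|q\|} + \frac{\left| \|q - X\| - \|q\| \right|}{\|q\|} \le \frac{2\|X\|}{\|q\|}.$
Taking expectations and combining the two displays,
$1 - \alpha \le \frac{2\,\mathbb{E}\|X\|}{\|q\|}, \quad \text{that is,} \quad \|q_X(\alpha u)\| \le \frac{2\,\mathbb{E}\|X\|}{1-\alpha}.$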
Lower Bound (Novel Contribution):
- The authors prove a set inclusion: The Tukey depth region at level $1 - \frac{\alpha^2}{M_\gamma}$ is contained within the set of geometric quantiles with index $\beta \le \alpha$.
- Theorem 3.3: $\{x : HD(x|X) \ge 1 - \frac{\alpha^2}{M_\gamma}\} \subseteq \{q_X(\beta u) : 0 \le \beta \le \alpha\}$ .
- Here, $M_\gamma$ is a geometric constant depending on the dimension $d$ and the minimal angular probability mass in a cone.
- Theorem 3.7: This inclusion leads to a lower bound expressed via univariate quantiles of projected data:
  $\|q_X(\alpha u) - \theta\| \ge \min_{\|u\|=1} \left| Q_{u^\top X}\left(1 - \frac{1-\alpha^2}{M_\gamma}\right) - u^\top \theta \right|$
  where $\theta$ is the halfspace median.
- Significance: This is the first result linking the growth of multivariate geometric quantiles directly to univariate quantiles and Tukey depth without moment assumptions.

B. Connection to Tukey Depth

The paper highlights a novel, rigorous connection between two distinct notions of multivariate quantiles:

Geometric Quantiles: Defined via an optimization of expected distances.
Tukey Depth: Defined via the minimum probability of a halfspace containing a point.
The authors show that extreme geometric quantiles cannot grow slower than the boundary of the corresponding Tukey depth region. This unifies the "center-outward" perspective of depth with the "norm-minimization" perspective of geometric quantiles.

C. Higher-Order Asymptotics (Integrable Case)

Under the assumption $\mathbb{E}\|X\|^3 < \infty$ , the authors refine the asymptotic expansion of the geometric quantile norm:
$\|q_X(\alpha u)\| \left[ \|q_X(\alpha u)\|^2(1-\alpha) - \frac{1}{2}(\text{tr}(\Sigma) - \langle \Sigma u, u \rangle) \right] \to C(u)$

The first-order term depends only on the covariance matrix $\Sigma$ (anisotropy).
The third-order term (the limit $C(u)$ ) captures skewness and tail asymmetry, specifically involving terms like $\mathbb{E}[\langle X, u \rangle \|X - \langle X, u \rangle u\|^2]$ .
This result distinguishes distributions that share the same covariance matrix but differ in higher-order tail behavior.
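
The leading-order behaviour implied by this expansion can be computed directly. Since $\|q_X(\alpha u)\|$ grows without bound as $\alpha \to 1$ while the product above stays bounded, the bracket must tend to zero, which suggests $\|q_X(\alpha u)\| \approx \sqrt{\frac{\text{tr}(\Sigma) - \langle \Sigma u, u \rangle}{2(1-\alpha)}}$. The sketch below evaluates that approximation; the function name `first_order_quantile_norm` and the two covariance matrices are hypothetical, and the result is only a large-$\alpha$ approximation that ignores the correction carried by $C(u)$.

```python
import math

def first_order_quantile_norm(cov, u, alpha):
    """Leading-order approximation of ||q_X(alpha * u)|| as alpha -> 1.

    Solves ||q||^2 * (1 - alpha) = 0.5 * (tr(cov) - <cov u, u>) for ||q||.
    `cov` is a d x d covariance matrix (nested lists), `u` a unit vector.
    """
    d = len(u)
    trace = sum(cov[i][i] for i in range(d))
    quad = sum(u[i] * cov[i][j] * u[j] for i in range(d) for j in range(d))
    return math.sqrt((trace - quad) / (2.0 * (1.0 - alpha)))

# Illustrative covariance matrices (assumptions of this sketch).
identity = [[1.0, 0.0], [0.0, 1.0]]
stretched = [[4.0, 0.0], [0.0, 1.0]]  # four times more variance along the first axis

r_iso = first_order_quantile_norm(identity, (1.0, 0.0), 0.99)
r_along = first_order_quantile_norm(stretched, (1.0, 0.0), 0.99)
r_across = first_order_quantile_norm(stretched, (0.0, 1.0), 0.99)
```

In this approximation only the variance orthogonal to $u$ enters, so for the stretched matrix the quantile in the low-variance direction (`r_across`) comes out larger than the one along the high-variance axis (`r_along`).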

4. Significance and Implications

Robustness to Heavy Tails: The derived bounds are applicable to a broad class of distributions, including those with infinite variance or mean, which are common in finance, physics, and network data.
Theoretical Unification: By linking geometric quantiles to Tukey depth and univariate quantiles, the paper provides a more cohesive theoretical framework for multivariate extremes. It demonstrates that the "extremal direction" of a geometric quantile is constrained by the "deepest" univariate projections.
Dimensionality Insights: The analysis of the geometric constant $M_\gamma$ reveals a "curse of dimensionality," where the bound becomes more conservative as $d \to \infty$ due to the concentration of measure on the sphere.
Practical Utility: The lower bound provides a computable benchmark for outlier detection and risk assessment in high dimensions, relying on univariate quantile estimation which is often more robust and easier to compute than full multivariate optimization.

Conclusion

This work significantly advances the understanding of multivariate quantiles by removing reliance on moment conditions. It establishes that the extremal growth of geometric quantiles is fundamentally governed by the geometry of the underlying distribution's support and its relationship to Tukey depth. The results offer both general, robust bounds for heavy-tailed data and refined asymptotic expansions for lighter-tailed data, bridging the gap between geometric and depth-based statistical tools.

