Bilateral Trade Under Heavy-Tailed Valuations: Minimax Regret with Infinite Variance

Imagine you are a matchmaker running a bustling marketplace. Every day, a buyer and a seller walk in, each with a secret number in their head representing how much they value an item. Your job is to suggest a price.

If your price is too low, the seller walks away, and you miss a sale.
If your price is too high, the buyer walks away, and you miss a sale.
If your price is just right (between their two secret numbers), the deal happens, and the trade creates a "gain" (the difference between their two values).

Your goal is to learn the "true value" of the item based on clues (like the weather, the time of day, or the buyer's mood) so you can set the best possible price every time. "Regret" is the total gain you give up, summed over all days, compared with a matchmaker who always knew the true value.

The Problem: The "Wild" Market

In most previous studies, economists assumed that these secret values were "well-behaved." They assumed that while the numbers might wiggle around, they wouldn't go crazy. Mathematically, this meant the "variance" (how wildly they jump) was finite.

But in the real world (think stock markets, insurance claims, or rare real estate deals), values can be wild. Sometimes a single event causes a price to jump 100 times higher than usual. These are called Heavy-Tailed distributions. In these markets, the average value can still be finite, but the "variance" is actually infinite.

Previous algorithms for matchmakers relied on calculating a simple average to learn the true value. But if the data has infinite variance, a simple average becomes unreliable: a single crazy outlier can throw it far off, and it settles down much more slowly than the earlier methods counted on.

The Paper's Solution: A New Way to Learn

This paper, by Hangyi Zhao, solves the problem of how to learn in these "wild" markets. Here is the breakdown of its three big ideas, explained simply:

1. The "Self-Bounding" Safety Net

The author first proved a surprising fact: even if the market is crazy, small pricing mistakes cost you very little.

Imagine you are trying to hit a bullseye. In a normal market, if you miss by 1 inch, you lose a little money. If you miss by 2 inches, you lose up to 4 times as much. This is called a squared relationship.
The author proved that even in this wild, heavy-tailed market, if you guess the price wrong, your expected "Regret" (lost money) is still at most proportional to the square of your error.

Why this matters: It means you don't need to know the exact value to do well. You just need to get close. This turns a hard "guessing the exact number" problem into an easier "estimating the average" problem.

2. The "Truncated Mean" Filter

Since you can't rely on a normal average (because one crazy outlier ruins it), the author uses a clever trick: The Truncated Mean.

Imagine you are listening to a choir. One singer is screaming at the top of their lungs (an outlier).

Old Method: You try to calculate the average pitch of the whole choir. The screaming singer makes the average sound terrible.
New Method: You decide, "I will ignore anyone singing louder than a certain volume." You cut off the extreme screams. Then, you average the remaining voices.

In the paper, they use a mathematical version of this "cutting off" (truncation). They ignore the most extreme price jumps when calculating the average. This allows them to find the true center of the market even when the data is chaotic.

3. The "Epoch" Strategy

Instead of updating the guess after every single sale, the matchmaker works in Epochs (rounds of learning).

Round 1: Guess randomly.
Round 2: Look at the data from Round 1, use the "Truncated Mean" to filter out the noise, and make a better guess for Round 2.
Round 3: Use data from Round 2 to refine the guess for Round 3.

Each round is twice as long as the one before, so every new guess is built from more data than the last. With this doubling schedule, the paper proves that the matchmaker gets smarter quickly, even with infinite variance.

The Results: How Fast Can You Learn?

The paper calculates the exact speed at which you can learn in this wild market.

If the market is "normal" (finite variance): You learn at a standard, fast pace.
If the market is "wild" (infinite variance): You learn slower, but not impossibly slow. The paper gives a precise formula showing that as the market gets "wilder" (heavier tails), your learning speed slows down, but it never stops.

They also proved that you cannot do better than this speed, up to small logarithmic factors. No matter how clever your algorithm is, if the market is this wild, you cannot learn meaningfully faster than their method allows.

The Big Picture Analogy

Think of navigating a ship in a storm.

Old Theory: The waves are small and predictable. You can use a standard compass (Average) to steer.
This Paper: The waves are tsunamis. One giant wave can knock your compass off.
The Solution: The author built a storm-proof compass. It ignores the giant, freak waves (Truncated Mean) and focuses on the general direction of the smaller waves. The paper proves that even in a hurricane, you can still steer your ship to the destination, and it calculates how long the journey will take.

In short: This paper teaches us how to make smart decisions in chaotic, unpredictable markets by ignoring the extreme outliers and focusing on the reliable patterns, proving that even in the wildest financial storms, we can still find a path to efficiency.

Here is a detailed technical summary of the paper "Bilateral Trade Under Heavy-Tailed Valuations: Minimax Regret with Infinite Variance" by Hangyi Zhao.

1. Problem Statement

The paper addresses the problem of Contextual Online Bilateral Trade under full feedback when trader valuations exhibit heavy-tailed distributions with infinite variance.

Setting: A broker interacts with a buyer and a seller over $T$ rounds. In each round $t$, a public context vector $x_t \in [0, 1]^d$ is revealed. The buyer's valuation $V_t$ and seller's valuation $W_t$ are generated as:
$V_t = m(x_t) + \xi_t, \quad W_t = m(x_t) + \zeta_t$
where $m(\cdot)$ is an unknown market value function, and $\xi_t, \zeta_t$ are independent noise terms.
Mechanism: The broker sets a price $P_t$. Trade occurs if $P_t$ lies between $V_t$ and $W_t$, and the gain from trade is $g(P_t, V_t, W_t) = |V_t - W_t| \cdot \mathbb{1}\{\min(V_t, W_t) \le P_t \le \max(V_t, W_t)\}$.
Objective: Minimize the regret $R_T = \sum_{t=1}^{T} E[g(m(x_t), V_t, W_t) - g(P_t, V_t, W_t)]$, the cumulative difference between the optimal expected gain (obtained by pricing at the true $m(x_t)$) and the expected gain achieved by the broker's prices.
Key Challenge: Previous literature (e.g., Bachoc et al., ICML 2025) assumed finite variance ($E[\xi^2] < \infty$), which allows the use of Ordinary Least Squares (OLS). This paper considers the regime where the noise has a bounded density but infinite variance, satisfying only a finite $p$-th moment for some $p \in (1, 2)$ (e.g., Student's $t$-distribution with $1 < \nu \le 2$ degrees of freedom, which has finite moments of every order below $\nu$). OLS loses its concentration guarantees in this regime, and the goal is to determine the minimax optimal regret rate.
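The setting above can be made concrete with a short simulation sketch. It assumes the brokerage convention implied by "trade occurs if $P_t$ lies between $V_t$ and $W_t$" (so $g$ pays $|V_t - W_t|$ whenever the price separates the two valuations). The names `gain_from_trade`, `realized_regret`, and `student_t`, the linear coefficients `phi`, and the choice $\nu = 1.5$ are hypothetical illustrations. Because $R_T$ is defined with expectations, the function below computes only a single-sample (realized) version of it.

```python
import random

def gain_from_trade(price, v, w):
    """g(P, V, W): the gap |V - W| if the price lies between the valuations, else 0."""
    low, high = min(v, w), max(v, w)
    return high - low if low <= price <= high else 0.0

def realized_regret(prices, market_values, valuations):
    """Single-sample version of R_T: sum of g(m(x_t), V_t, W_t) - g(P_t, V_t, W_t)."""
    return sum(
        gain_from_trade(m, v, w) - gain_from_trade(p, v, w)
        for p, m, (v, w) in zip(prices, market_values, valuations)
    )

def student_t(nu, rng):
    """One Student's t draw, Z / sqrt(chi2_nu / nu); the variance is infinite for 1 < nu <= 2."""
    chi2 = rng.gammavariate(nu / 2, 2.0)  # chi-square with nu degrees of freedom
    return rng.gauss(0.0, 1.0) / (chi2 / nu) ** 0.5

# Hypothetical parametric market: m(x) = x . phi with d = 2.
rng = random.Random(1)
phi = [0.5, -0.2]
T = 1000
contexts = [[rng.random() for _ in phi] for _ in range(T)]
market_values = [sum(xi * fi for xi, fi in zip(x, phi)) for x in contexts]
valuations = [(m + student_t(1.5, rng), m + student_t(1.5, rng)) for m in market_values]

naive_prices = [0.0] * T  # a broker that ignores the context and always posts 0
print(realized_regret(naive_prices, market_values, valuations))
```

Individual rounds can contribute negative terms (a bad price can get lucky), so the realized sum is noisy; with infinite-variance noise it can be very noisy, which is exactly the difficulty described above.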

2. Methodology

The paper bridges a structural gap between the self-bounding property of bilateral trade and robust statistical estimation under heavy tails.

A. Generalized Self-Bounding Property (Lemma 3.1)

A critical structural insight is the extension of the self-bounding property to real-valued valuations with infinite variance.

Result: Under the assumption of bounded noise densities (and only $E[|\xi|] < \infty$), the expected regret of pricing at $\pi$ instead of the true mean $m$ is at most a constant $L$ times the squared estimation error:
$E[g(m, V, W) - g(\pi, V, W)] \leq L |m - \pi|^2$
Significance: This reduces the problem of controlling regret to the problem of robust mean estimation. Even with infinite variance, the regret is bounded by a quadratic function of the error in estimating the mean, provided the density is bounded.
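A quick numerical check makes the quadratic bound tangible for one heavy-tailed case. This sketch adds assumptions beyond the lemma: $\xi$ and $\zeta$ are i.i.d. standard Student's $t$ with $\nu = 1.5$ (finite mean, infinite variance), and $g$ is the gain $|V - W|$ paid when the price lies between the valuations. Under i.i.d. noise with density $f$, differentiating the expected gain gives $\frac{d}{dp} E[g(p, V, W)] = 2 f(p - m)(m - p)$, so $E[g(m, V, W) - g(\pi, V, W)]$ equals $2\int f(u - m)\,|m - u|\,du$ over the interval between $\pi$ and $m$, which is at most $\sup f \cdot |m - \pi|^2$. The code evaluates that integral numerically. In this special case $L = \sup f$, which need not be the constant used in Lemma 3.1; the function names are hypothetical.

```python
import math

def student_t_pdf(x, nu):
    """Density of the standard Student's t-distribution with nu degrees of freedom."""
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return c * (1 + x * x / nu) ** (-(nu + 1) / 2)

def regret_of_price(m, price, nu, steps=10_000):
    """E[g(m,V,W)] - E[g(price,V,W)] when V, W are i.i.d. m + t_nu noise.

    Integrates 2 f(u - m) |m - u| between price and m with the midpoint rule.
    """
    a, b = sorted((price, m))
    h = (b - a) / steps
    total = 0.0
    for i in range(steps):
        u = a + (i + 0.5) * h
        total += 2 * student_t_pdf(u - m, nu) * abs(m - u)
    return total * h

nu = 1.5                    # 1 < nu < 2: finite mean, infinite variance
L = student_t_pdf(0.0, nu)  # the t density peaks at 0, so this is its supremum
for err in (0.01, 0.1, 0.5, 2.0):
    r = regret_of_price(0.0, err, nu)
    print(f"|m - pi| = {err:<5}  regret = {r:.6f}  L*err^2 = {L * err ** 2:.6f}")
```

For small errors the regret sits just under $L|m - \pi|^2$, and for large errors it falls noticeably below it, because the density has thinned out far from $m$. Nothing in the calculation needs a finite variance: only the finite mean (used in the derivative) and the bounded density (used in the final inequality) enter.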

B. Algorithmic Approach: Epoch-Based with Truncated Means

To handle the infinite variance, the paper designs an epoch-based algorithm utilizing truncated-mean estimators (following Bubeck et al., 2013).

Epoch Partitioning: The $T$ rounds are divided into geometrically growing epochs ($k=1, \dots, K$), so there are only $K = O(\log T)$ of them.
Estimation: In epoch $k$ , the algorithm uses data from epoch $k-1$ to estimate the parameter (either the linear coefficient $\phi$ in the parametric case or the function value $m(c_j)$ in cell $j$ for the nonparametric case).
Truncation: Instead of standard averaging, the algorithm computes the coordinate-wise truncated mean of score vectors (or observations): observations whose magnitude exceeds a threshold $\tau$ are discarded. This keeps concentration bounds valid even when the variance is infinite, provided the $p$-th moment is finite.
Prediction: The broker sets the price $P_t$ based on the robust estimate derived from the previous epoch.
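The estimation and prediction steps above can be sketched for the simplest possible market, one with no context (a single constant value $m$), which is a hypothetical simplification of the paper's setting. The truncated mean follows the truncated empirical mean of Bubeck et al. (2013): it assumes a known bound $u \ge E|X|^p$ and keeps the $t$-th observation only when its magnitude is at most $(u\,t/\log(1/\delta))^{1/p}$. That index-dependent threshold stands in for the single threshold $\tau$ described above, whose exact form is not given here. The names `truncated_mean`, `coordinatewise_truncated_mean`, `epoch_lengths`, and `run_epochs`, the zero-mean noise assumption, and the parameter values in the usage lines are all illustrative.

```python
import math
import random

def truncated_mean(xs, p, u, delta):
    """Truncated empirical mean in the style of Bubeck et al. (2013).

    Assumes E|X|^p <= u for some p in (1, 2]. The t-th sample (1-indexed) is
    kept only if |x_t| <= (u * t / log(1/delta)) ** (1/p); discarded samples
    count as 0, and the sum is still divided by the full sample size.
    """
    log_term = math.log(1.0 / delta)
    kept = (x for t, x in enumerate(xs, start=1)
            if abs(x) <= (u * t / log_term) ** (1.0 / p))
    return sum(kept) / len(xs)

def coordinatewise_truncated_mean(vectors, p, u, delta):
    """Apply the truncated mean separately to each coordinate of a list of vectors."""
    return [truncated_mean(list(col), p, u, delta) for col in zip(*vectors)]

def epoch_lengths(T):
    """Doubling epochs 1, 2, 4, ..., with the last one cut so the lengths sum to T."""
    lengths, size = [], 1
    while sum(lengths) < T:
        lengths.append(min(size, T - sum(lengths)))
        size *= 2
    return lengths

def run_epochs(T, draw_pair, p, u, delta, initial_price=0.0):
    """Context-free epoch scheme: every round of epoch k posts the estimate from epoch k-1.

    Under full feedback both V_t and W_t are observed; with zero-mean noise
    (an assumption of this sketch) both have mean m, so all of them are pooled.
    """
    price, prices = initial_price, []
    for n in epoch_lengths(T):
        observations = []
        for _ in range(n):
            v, w = draw_pair()
            prices.append(price)
            observations.extend((v, w))
        price = truncated_mean(observations, p, u, delta)
    return prices

def student_t(nu, rng):
    """One Student's t draw, Z / sqrt(chi2_nu / nu), with chi2_nu ~ Gamma(nu/2, scale 2)."""
    return rng.gauss(0.0, 1.0) / math.sqrt(rng.gammavariate(nu / 2, 2.0) / nu)

# Usage: constant market value m = 2 with Student's t noise (nu = 1.5, infinite variance).
rng = random.Random(0)
m_true = 2.0

def draw():
    return m_true + student_t(1.5, rng), m_true + student_t(1.5, rng)

# p = 1.25 < nu, so E|V|^p is finite; u = 10.0 is an assumed (loose) bound on it.
prices = run_epochs(4095, draw, p=1.25, u=10.0, delta=0.01)
print(len(epoch_lengths(4095)), prices[-1])
```

The price is frozen within each epoch and refreshed only from the previous epoch's data, so each estimate comes from a fresh batch of samples. With doubling epochs there are only about $\log_2 T$ refreshes, which is one natural source of the $\log T$ overhead mentioned among the open questions.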

C. Lower Bound Construction

To prove optimality, the paper establishes matching lower bounds using Assouad's method combined with a smoothed moment-matching construction.

They construct a set of hard instances using "bump" functions.
Crucially, they smooth the discrete distributions used in standard heavy-tailed lower bounds so that they satisfy the bounded density assumption without significantly altering the $p$-th moment constraints or the Kullback-Leibler (KL) divergence.

3. Key Contributions

Extension of Self-Bounding Property: Proved that the quadratic regret bound holds for real-valued valuations under bounded density alone, removing the requirement for finite variance.
Tight Regret Rates for Heavy Tails: Designed algorithms achieving tight regret rates for both parametric and nonparametric settings under $p$-th moment constraints ($p \in (1, 2)$).
Minimax Optimality: Established matching lower bounds, proving that the derived rates are optimal up to logarithmic factors.
Interpolation of Rates: Characterized how the regret rate interpolates between the classical nonparametric rate (at $p=2$) and the trivial linear rate (as $p \to 1^+$).

4. Main Results

The paper provides the minimax regret rates for the problem, exact up to logarithmic factors. Let $d$ be the dimension, $\beta$ be the Hölder smoothness of $m$, and $p \in (1, 2)$ be the moment parameter.

A. Parametric Case ($m(x) = x^\top \phi$)

Regret Rate: $\tilde{O}\left( T^{(2-p)/p} \right)$
Analysis:
- When $p=2$ (finite variance), the exponent is $0$, so the bound is polylogarithmic, consistent with the classical $O(\log T)$ rate.
- As $p \to 1^+$, the rate approaches $\tilde{O}(T)$, reflecting the difficulty of learning with extremely heavy tails.
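A back-of-the-envelope calculation shows where the exponent $(2-p)/p$ comes from. It is a heuristic, not the paper's proof: it assumes doubling epochs $n_k = 2^{k-1}$, a truncated-mean error of order $\varepsilon_n \approx (\log(1/\delta)/n)^{(p-1)/p}$ after $n$ samples (the one-dimensional rate from Bubeck et al., 2013), and the self-bounding property, while ignoring the dimension, constants, and logarithmic factors.

```latex
R_T \;\lesssim\; \sum_{k=1}^{K} n_{k+1}\, L\, \varepsilon_{n_k}^{2}
    \;\approx\; \sum_{k=1}^{K} 2^{k}\left(\frac{\log(1/\delta)}{2^{k-1}}\right)^{2(p-1)/p}
    \;\asymp\; \sum_{k=1}^{K} 2^{k\,\frac{2-p}{p}},
\qquad \text{since } 1 - \frac{2(p-1)}{p} = \frac{2-p}{p}.
```

With $K \approx \log_2 T$, for $p < 2$ the geometric sum is dominated by its last term, $2^{K(2-p)/p} \approx T^{(2-p)/p}$; at $p = 2$ every term is of constant order and the sum is of order $K \approx \log T$.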

B. Nonparametric Case ($m$ is $\beta$-Hölder)

Regret Rate: $\tilde{O}\left( T^{1 - \frac{2\beta(p-1)}{\beta p + d(p-1)}} \right)$
Analysis:
- When $p=2$, the exponent becomes $\frac{d}{2\beta + d}$, matching the classical finite-variance rate: $T$ rounds times Stone's optimal squared-error rate $T^{-2\beta/(2\beta+d)}$ for nonparametric regression.
- As $p \to 1^+$ , the exponent approaches 1, leading to linear regret.
- The rate explicitly depends on the interplay between the smoothness $\beta$ , dimension $d$ , and the tail heaviness $p$ .
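The two exponents are easy to tabulate. The sketch below simply transcribes the formulas stated above (the function names are hypothetical) and checks the limiting cases described in the bullet points; $\beta = 1$, $d = 2$ is an arbitrary example.

```python
import math

def parametric_exponent(p):
    """Exponent a in the parametric rate T^a, with a = (2 - p) / p."""
    return (2 - p) / p

def nonparametric_exponent(p, beta, d):
    """Exponent a in the nonparametric rate T^a, a = 1 - 2*beta*(p-1) / (beta*p + d*(p-1))."""
    return 1 - 2 * beta * (p - 1) / (beta * p + d * (p - 1))

# Limiting case from the text: p = 2 gives 0 (parametric) and d / (2*beta + d) (nonparametric).
beta, d = 1.0, 2
assert parametric_exponent(2.0) == 0.0
assert math.isclose(nonparametric_exponent(2.0, beta, d), d / (2 * beta + d))

for p in (1.01, 1.25, 1.5, 1.75, 2.0):
    print(f"p = {p:4.2f}   parametric: {parametric_exponent(p):.3f}   "
          f"nonparametric (beta=1, d=2): {nonparametric_exponent(p, beta, d):.3f}")
```

Both exponents fall as $p$ grows, i.e. lighter tails give slower-growing regret, and both tend to $1$ (linear regret) as $p \to 1^+$.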

5. Significance and Implications

Robustness in Financial Markets: The results are highly relevant for applications like financial markets, insurance, and real estate, where valuation data often follows heavy-tailed distributions (e.g., Student's $t$) with infinite variance. The paper proves that efficient learning is still possible, albeit at a slower rate than in the finite-variance regime.
Theoretical Gap Closure: It closes the gap between the structural properties of bilateral trade (self-bounding) and the statistical limitations of heavy-tailed estimation. It demonstrates that the "squared error" nature of bilateral trade regret does not require finite variance, only bounded density and a first moment.
Optimality: By providing matching lower bounds, the paper settles the question of what is theoretically achievable in this setting, showing that the proposed epoch-based truncated-mean approach is essentially optimal.

6. Open Questions

The author identifies several directions for future work:

Can the $\log T$ overhead from the epoch-based approach be eliminated by a fully online robust estimator?
Do specific tail shapes (e.g., sub-Gaussian) improve the constant factors in the regret bound?
Can the results be extended to heteroskedastic settings where the noise scale depends on the context $x_t$?