Sample Complexity Bounds for Robust Mean Estimation with Mean-Shift Contamination

Imagine you are trying to find the exact center of a massive, invisible cloud of marbles. This is the classic problem of Mean Estimation: given a bunch of data points, where is the "average" location?

In a perfect world, all the marbles are honest. But in the real world, a mischievous Adversary (let's call him "The Saboteur") is trying to trick you.

The Old Problem: The "Anything Goes" Saboteur

In the past, statisticians studied a scenario where The Saboteur could replace 10% of your honest marbles with completely fake marbles from a different universe. He could swap a marble from the center of the cloud with a giant boulder from Mars.

The Result: You could never find the true center. No matter how many marbles you looked at, the boulders would drag your average off course. The error would always stay stuck at a certain level. It was impossible to get a perfect answer.

The New Problem: The "Shifted" Saboteur

This paper studies a slightly different, more realistic version of the game. Here, The Saboteur is still allowed to swap 10% of your marbles, but he has a rule: he can only swap them with marbles that are shifted versions of your original cloud.

The Analogy: Imagine your cloud is a flock of birds flying in formation. The Saboteur can't replace them with rocks. He can only replace some birds with other birds that are flying in the same formation, but just shifted slightly to the left or right.
The Good News: Because the "fake" birds still look like the "real" birds (just moved), it is possible to figure out the true center, provided you have enough data and the right tools.

The Big Question

Previous research showed this works for very simple shapes, like a perfect bell curve (Gaussian) or a sharp spike (Laplace). But what about weird, complex shapes?

What if your cloud is a donut? A flat pancake? A jagged star?
Does the Saboteur have an advantage if the shape is weird?
How many samples (marbles) do you need to beat him?

The Solution: The "Fourier Flashlight"

The authors of this paper solved this mystery using a mathematical tool called Fourier Analysis. To explain this simply, let's use a Flashlight analogy.

Imagine the distribution of your data (the cloud of marbles) has a hidden "fingerprint" called a Characteristic Function.

The Saboteur's Trick: The Saboteur tries to hide the true center by shifting the fake data. In the "fingerprint" world, this looks like he is trying to cancel out the signal of the true center at specific frequencies.
The Flashlight (Frequency Witness): The authors realized that for almost any shape, there are specific "frequencies" (like specific colors of light in a rainbow) where the Saboteur cannot hide the truth.
- If you shine a "flashlight" at these specific frequencies, the fake data (the shifted marbles) will look very different from the real data.
- The paper defines a quantity called $\delta$ (delta). Think of $\delta$ as the brightness of the flashlight at the one spot where the Saboteur is weakest.
- The Rule: If the flashlight is bright enough ($\delta$ is large), you can find the center easily. If the flashlight is dim or off ($\delta$ is zero), the Saboteur wins, and you can't find the center.

The Main Findings

The paper provides two major results:

The Algorithm (How to Win): They built a recipe (an algorithm) that scans through these "frequencies." It looks for the spot where the Saboteur's shift is most obvious.
- The Cost: The number of samples you need depends on how bright that flashlight is. If the fingerprint stays bright at high frequencies (as it does for a Laplace or uniform distribution), you need relatively few samples. If it fades very quickly (as it does for a Gaussian), you need many more samples, but you can still do it.
- The Magic: They proved that as long as the shape isn't "band-limited" (meaning it doesn't have a hard cutoff where the signal completely disappears), you can always find a way to estimate the mean.
The Lower Bound (The Limit): They also proved that no method can do fundamentally better than their recipe. If the flashlight is dim, you cannot distinguish the true center from a fake one without a large number of samples, no matter how smart your computer is: the number of samples any algorithm needs grows polynomially as the flashlight gets dimmer. This sets a hard limit on how fast any algorithm can solve the problem.

Why This Matters

Real-World Data: Real-world data is rarely a perfect bell curve. It's often messy, skewed, or has weird shapes. This paper tells us exactly how to handle that messiness when bad actors try to poison the data.
Security: This is relevant to AI safety. If attackers inject "shifted" data to poison what a system learns, this research tells us how much data we need to recover the true center anyway.
The "Sinc" Surprise: They showed that for some tricky distributions (like the "sinc" function, which looks like a wave), the Saboteur can actually win, because the fingerprint cuts off completely beyond a certain frequency. This gives us a clear "stop sign" for when consistent estimation is impossible.

In a Nutshell

The paper is like a guidebook for finding the center of a cloud of marbles when a trickster is trying to move some of them.

Old View: "If the trickster can move them anywhere, we lose."
New View: "If the trickster can only move them in a specific pattern, we can win, but the difficulty depends on the shape of the cloud."
The Tool: We use a "Fourier Flashlight" to find the one angle where the trickster's moves are obvious. The paper tells us exactly how bright that flashlight needs to be and how many marbles we need to count to win the game.

1. Problem Definition

The paper addresses the fundamental problem of robust mean estimation under a specific data contamination model known as Mean-Shift Contamination.

Context: In standard robust statistics (Huber's contamination model), an adversary can replace a fraction $\alpha$ of clean samples with samples from an arbitrary distribution $Q$. This model has inherent information-theoretic limitations: consistent estimation (error $\to 0$ as sample size $\to \infty$) is impossible for many distributions, because arbitrary outliers can make base distributions with different means produce the same contaminated distribution.
The Mean-Shift Model: This paper focuses on a more structured contamination model where the adversary is restricted. Instead of arbitrary outliers, the adversary replaces clean samples with samples drawn from the base distribution $D$ shifted by an arbitrary vector $z$.
- Formal Definition: A sample $x$ is drawn from the contaminated distribution $D^{(\alpha)}_\mu$ as follows:
  1. With probability $1-\alpha$: $x = \mu + y$, where $y \sim D$ (clean sample).
  2. With probability $\alpha$: $x = z + y$, where $z \sim Q$ (adversarial shift) and $y \sim D$.
- Goal: Estimate the true mean $\mu$ of the base distribution $D$ given $n$ i.i.d. samples from $D^{(\alpha)}_\mu$.
Open Question: While consistent estimation was known to be possible for specific cases (Gaussian and Laplace distributions), the sample complexity for general base distributions was an open problem. The paper aims to characterize the sample complexity for arbitrary distributions $D$.
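The contamination model defined above can be simulated directly. The following is a minimal sketch, not code from the paper: the function name `sample_mean_shift`, the sampler arguments, and the specific choices $D = N(0, I_d)$ and $Q$ = a point mass at $(5, \dots, 5)$ are all assumptions made for illustration.

```python
import numpy as np

def sample_mean_shift(n, mu, alpha, base_sampler, shift_sampler, rng):
    """Draw n samples: mu + y w.p. 1 - alpha, z + y w.p. alpha, with y ~ D, z ~ Q."""
    mu = np.atleast_1d(np.asarray(mu, dtype=float))
    d = mu.shape[0]
    y = base_sampler(rng, n, d)          # noise y ~ D, shape (n, d)
    z = shift_sampler(rng, n, d)         # adversarial shifts z ~ Q, shape (n, d)
    corrupted = rng.random(n) < alpha    # which samples the adversary controls
    centers = np.where(corrupted[:, None], z, mu[None, :])
    return centers + y, corrupted

# Hypothetical choices for illustration: D = N(0, I_d), Q = point mass at (5, ..., 5).
def gaussian_base(rng, n, d):
    return rng.standard_normal((n, d))

def far_point_shift(rng, n, d):
    return np.full((n, d), 5.0)

rng = np.random.default_rng(0)
x, corrupted = sample_mean_shift(50_000, mu=[1.0, -2.0], alpha=0.1,
                                 base_sampler=gaussian_base,
                                 shift_sampler=far_point_shift, rng=rng)
print("fraction corrupted:", corrupted.mean())
print("naive sample mean:", x.mean(axis=0))  # pulled away from mu toward (5, 5)
```

With these choices the naive sample mean concentrates around $(1-\alpha)\mu + \alpha \cdot (5, 5)$ rather than $\mu$, which is the bias a robust estimator has to remove.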

2. Methodology

The authors employ Fourier Analysis as the central tool to derive both upper and lower bounds. The core insight is that the characteristic function (Fourier transform) of the contaminated distribution factorizes in a way that reveals the shift.

Key Technical Concepts

Characteristic Function Identity:
Let $\phi_D$ be the characteristic function of the base distribution and $\phi_Q$ be that of the adversarial shift distribution $Q$. Because every sample is a shift plus independent noise $y \sim D$, the characteristic function of the contaminated distribution factorizes as:
$\phi_{D^{(\alpha)}_\mu}(\omega) = \phi_D(\omega) \cdot \left[(1-\alpha)\, e^{2\pi i \mu \cdot \omega} + \alpha\, \phi_Q(\omega)\right]$
Since $|\phi_Q(\omega)| \le 1$, the bracketed factor is within $\alpha$ of $(1-\alpha)e^{2\pi i \mu \cdot \omega}$. The algorithm attempts to recover $\mu$ by analyzing the phase of $\phi_{D^{(\alpha)}_\mu}(\omega) / \phi_D(\omega)$.
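As a worked step, the factorization follows from the two-case definition of the model. The derivation below is a sketch under two assumptions: the convention $\phi_P(\omega) = \mathbb{E}_{x \sim P}[e^{2\pi i \omega \cdot x}]$ (chosen because it matches the $e^{2\pi i \mu \cdot \omega}$ factor above), and independence of the adversarial shift $z \sim Q$ from the noise $y \sim D$.

$\phi_{D^{(\alpha)}_\mu}(\omega) = (1-\alpha)\, \mathbb{E}_{y \sim D}\left[e^{2\pi i \omega \cdot (\mu + y)}\right] + \alpha\, \mathbb{E}_{z \sim Q,\, y \sim D}\left[e^{2\pi i \omega \cdot (z + y)}\right]$
$= (1-\alpha)\, e^{2\pi i \mu \cdot \omega}\, \phi_D(\omega) + \alpha\, \phi_Q(\omega)\, \phi_D(\omega)$
$= \phi_D(\omega) \left[(1-\alpha)\, e^{2\pi i \mu \cdot \omega} + \alpha\, \phi_Q(\omega)\right]$

The bracket is the characteristic function of the overall shift, which equals $\mu$ with probability $1-\alpha$ and is drawn from the adversarial $Q$ otherwise. Dividing by $\phi_D(\omega)$ wherever it is nonzero isolates the bracket, and because $|\phi_Q(\omega)| \le 1$ the adversary can move it by at most $\alpha$ away from $(1-\alpha)e^{2\pi i \mu \cdot \omega}$.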
Frequency-Witness Condition (Upper Bound):
The authors define a condition under which the mean can be estimated efficiently. A distribution $D$ satisfies the $(\epsilon, A, \delta)$-frequency-witness condition if, for any error vector $v$ with $\|v\| \ge \epsilon$, there exists a frequency $\omega$ such that:
- Phase Separation: $v \cdot \omega$ is far from an integer (specifically $|\sin(\pi v \cdot \omega)| \ge A$). This ensures the phase shift caused by the error is detectable.
- Non-vanishing Mass: $|\phi_D(\omega)| \ge \delta$. This ensures the denominator in the ratio estimator does not vanish, allowing for stable estimation.
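The two conditions above can be checked numerically for a single error value. The sketch below is a one-dimensional illustration only: the helper `best_witness`, the frequency grid, and the choice $D = N(0, 1)$ (whose characteristic function is $e^{-2\pi^2\omega^2}$ under the $e^{2\pi i \omega x}$ convention) are assumptions, not part of the paper.

```python
import numpy as np

def gaussian_cf(omega):
    """|phi_D(omega)| for D = N(0, 1) under phi(omega) = E[exp(2*pi*i*omega*x)]."""
    return np.exp(-2.0 * np.pi**2 * omega**2)

def best_witness(v, A, cf_abs, omegas):
    """Among grid frequencies with |sin(pi*v*omega)| >= A, pick the one maximizing |phi_D|.

    Returns (omega, |phi_D(omega)|), or None if no grid frequency separates the phase.
    """
    separated = np.abs(np.sin(np.pi * v * omegas)) >= A
    if not separated.any():
        return None
    masses = np.where(separated, cf_abs(omegas), -np.inf)
    k = int(np.argmax(masses))
    return float(omegas[k]), float(masses[k])

omegas = np.linspace(0.0, 5.0, 5001)  # hypothetical frequency grid
omega_star, delta_v = best_witness(v=0.5, A=0.5, cf_abs=gaussian_cf, omegas=omegas)
print(omega_star, delta_v)
```

For this error value, $D$ satisfies the condition with any $\delta$ up to the returned mass. A return value of `None` means the grid contains no phase-separating frequency at all.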
Fourier Matching (Lower Bound):
To prove lower bounds, the authors construct two distinct distributions $P_0$ and $P_1$ (with means separated by $\epsilon$) that are statistically indistinguishable under the contamination model.
- They design the adversarial shift distributions such that their characteristic functions cancel out the signal of the mean shift in the "bad" frequency regions (where $|\phi_D|$ is small).
- They use a smooth periodic window function in the Fourier domain to ensure the constructed distributions are valid probability measures with appropriate tail decay, allowing the application of Plancherel's theorem to bound the Total Variation (TV) distance.

3. Key Contributions

A. Qualitative Characterization of Sample Complexity

The paper characterizes the sample complexity for mean-shift contamination up to polynomial factors. The complexity is governed by the parameter $\delta$, defined as:
$\delta = \inf_{\|v\| \ge \epsilon} \sup_{\omega: \text{dist}(\omega \cdot v, \mathbb{Z}) \ge \alpha} |\phi_D(\omega)|$

Upper Bound: There exists an algorithm that estimates the mean to error $\epsilon$ using $\tilde{O}(d / \delta^2)$ samples.
Lower Bound: Any algorithm requires at least $\Omega(1/\delta^{\Omega(1)})$ samples.
This establishes that the difficulty of the problem is entirely determined by the magnitude of the characteristic function of the base distribution at frequencies where the adversary cannot fully corrupt the signal.
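As a sanity check on the definition of $\delta$, it can be evaluated by hand in a simple case. The calculation below is a one-dimensional illustration and is not taken from the paper. It assumes $D = N(0, 1)$, $0 < \alpha \le 1/2$, and the convention $\phi_D(\omega) = \mathbb{E}[e^{2\pi i \omega x}]$, under which:

$\phi_D(\omega) = e^{-2\pi^2 \omega^2}$

Fix $v$ with $|v| \ge \epsilon$. The smallest $|\omega|$ satisfying $\text{dist}(\omega v, \mathbb{Z}) \ge \alpha$ is $\alpha / |v|$, and $|\phi_D|$ decreases in $|\omega|$, so:

$\sup_{\omega:\, \text{dist}(\omega v, \mathbb{Z}) \ge \alpha} |\phi_D(\omega)| = e^{-2\pi^2 (\alpha / |v|)^2}$

This is smallest when $|v| = \epsilon$, which gives:

$\delta = e^{-2\pi^2 (\alpha/\epsilon)^2}, \qquad \frac{1}{\delta^2} = e^{4\pi^2 (\alpha/\epsilon)^2} = e^{O((\alpha/\epsilon)^2)}$

Substituting into the $\tilde{O}(d/\delta^2)$ upper bound with $d = 1$ gives an exponential dependence on $(\alpha/\epsilon)^2$, the same form as the Gaussian bounds reported in Section 4.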

B. The "Fourier Witness"

The paper introduces the notion of a Fourier Witness as a critical ingredient.

For any potential error direction, a witness frequency $\omega$ must exist where the base distribution has significant Fourier mass, and the error vector creates a detectable phase shift.
If no such witness exists (e.g., if the characteristic function is band-limited), consistent estimation is impossible.

C. Algorithm Design

The authors propose Algorithm 1, a sample-efficient estimator:

Discretization: Construct a fine grid (cover) of candidate means and a grid of frequencies.
Empirical Characteristic Function: Estimate $\phi_{D^{(\alpha)}_\mu}$ from samples.
Scoring: For each candidate mean $\hat{\mu}$, compute a score based on the discrepancy between the empirical characteristic function and the theoretical expectation $(1-\alpha)e^{2\pi i \hat{\mu} \cdot \omega} \phi_D(\omega)$.
Selection: Output the candidate that minimizes the worst-case discrepancy over the set of frequency witnesses.
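The four steps above can be sketched in one dimension. This is a simplified illustration, not the paper's Algorithm 1. The names `estimate_mean_1d` and `min_mass`, the grids, and the test setup ($D = N(0,1)$, a point-mass adversary at $z = 8$) are assumptions. The score here divides the discrepancy by $\phi_D(\omega)$ and keeps only frequencies with $|\phi_D(\omega)| \ge$ `min_mass`, a crude stand-in for the set of frequency witnesses.

```python
import numpy as np

def empirical_cf(x, omegas):
    """Empirical characteristic function (1/n) * sum_j exp(2*pi*i*omega*x_j) on a grid."""
    return np.array([np.mean(np.exp(2j * np.pi * w * x)) for w in omegas])

def estimate_mean_1d(x, alpha, cf_base, candidates, omegas, min_mass=0.15):
    """Grid search: pick the candidate whose predicted clean signal best fits the data."""
    phi_d = cf_base(omegas)
    keep = np.abs(phi_d) >= min_mass        # only frequencies with non-vanishing mass
    w, phi_d = omegas[keep], phi_d[keep]
    # Ratio approximates (1 - alpha) * exp(2*pi*i*mu*w) + alpha * phi_Q(w).
    ratio = empirical_cf(x, w) / phi_d
    predicted = (1 - alpha) * np.exp(2j * np.pi * np.outer(candidates, w))
    # Worst-case discrepancy over the kept frequencies, one score per candidate.
    scores = np.max(np.abs(ratio[None, :] - predicted), axis=1)
    return float(candidates[np.argmin(scores)])

def gaussian_cf(omega):
    """Characteristic function of N(0, 1) under the exp(2*pi*i*omega*x) convention."""
    return np.exp(-2.0 * np.pi**2 * omega**2)

# Hypothetical test setup: true mean 1.5, 10% of samples re-centered at z = 8.
rng = np.random.default_rng(1)
n, mu, alpha, z = 100_000, 1.5, 0.1, 8.0
centers = np.where(rng.random(n) < alpha, z, mu)
x = centers + rng.standard_normal(n)

mu_hat = estimate_mean_1d(x, alpha, gaussian_cf,
                          candidates=np.linspace(-5.0, 5.0, 401),
                          omegas=np.linspace(0.02, 0.4, 20))
print("sample mean:", x.mean(), "Fourier estimate:", mu_hat)
```

In this setup the sample mean is biased by roughly $\alpha (z - \mu) = 0.65$, while the grid search should land close to the true mean. A brute-force search like this scales poorly with dimension, since the number of candidates in a cover grows exponentially in $d$.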

4. Results and Applications

The paper applies these general bounds to specific distributions, recovering known results and providing new ones. Table 1 in the paper summarizes the sample complexities (where $d$ is the dimension, $\alpha$ is the contamination fraction, and $\epsilon$ is the target error):

Distribution $D$	Upper Bound (Samples)	Lower Bound (Samples)
Gaussian $N(0, I_d)$	$\tilde{O}(d \cdot e^{O((\alpha/\epsilon)^2)})$	$\Omega(e^{\Omega((\alpha/\epsilon)^2)})$
Laplace $Lap(0, I_d)$	$\tilde{O}(d \cdot \alpha^2/\epsilon^4)$	$\Omega((\alpha/\epsilon)^{1/2})$
Uniform $Unif([-1, 1])$	$\tilde{O}(1/\epsilon)$	$\Omega((\alpha/\epsilon)^{1/6})$
Sum of $m$ Uniforms	$\tilde{O}(\alpha^{-2} (O(\alpha/\epsilon))^{2m})$	$\Omega((\alpha/\epsilon)^{(2m-1)/6})$

Gaussian Case: The bounds match previous work, showing exponential dependence on $(\alpha/\epsilon)^2$.
Uniform Case: The paper shows that for uniform distributions, the sample complexity is polynomial in $1/\epsilon$, which is significantly better than the Gaussian case.
Consistency: The paper proves that consistency is impossible for distributions with band-limited characteristic functions (where $\phi_D(\omega) = 0$ for large $\omega$), as the adversary can hide the mean shift in the zero-mass regions.
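The gap between the Gaussian and the polynomial cases comes from how fast $|\phi_D(\omega)|$ decays. The sketch below compares the three closed-form characteristic functions in one dimension. It assumes the convention $\phi(\omega) = \mathbb{E}[e^{2\pi i \omega x}]$, unit-scale distributions (for Laplace, the density $\frac{1}{2}e^{-|x|}$), and sample frequencies chosen to avoid the zeros of the uniform's characteristic function. It illustrates decay rates only and does not compute sample complexities.

```python
import numpy as np

def cf_gaussian(omega):
    """|phi| for N(0, 1): exp(-2 * pi^2 * omega^2)."""
    return np.exp(-2.0 * np.pi**2 * omega**2)

def cf_laplace(omega):
    """|phi| for the Laplace density (1/2) * exp(-|x|): 1 / (1 + (2*pi*omega)^2)."""
    return 1.0 / (1.0 + (2.0 * np.pi * omega) ** 2)

def cf_uniform(omega):
    """|phi| for Unif([-1, 1]); np.sinc(t) computes sin(pi*t) / (pi*t)."""
    return np.abs(np.sinc(2.0 * omega))

for omega in (0.25, 0.75, 1.25, 2.25):
    print(f"omega={omega:5.2f}  gaussian={cf_gaussian(omega):.3e}  "
          f"laplace={cf_laplace(omega):.3e}  uniform={cf_uniform(omega):.3e}")
```

The Gaussian magnitude collapses exponentially in $\omega^2$, while the Laplace and uniform magnitudes decay only polynomially. Since the upper bound scales as $1/\delta^2$, this matches the exponential versus polynomial split in the table.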

5. Significance

Resolution of an Open Problem: The paper resolves the open question posed in prior literature regarding the sample complexity of mean-shift contamination for general distributions. It moves beyond the Gaussian/Laplace regimes to a broad class of distributions.
Theoretical Foundation: It establishes a deep connection between the spectral properties (Fourier transform) of a distribution and its robustness to mean-shift contamination. This provides a new lens for analyzing robust statistics.
Algorithmic vs. Information-Theoretic: The paper focuses on sample complexity (information-theoretic limits). The proposed algorithm requires that the characteristic function $\phi_D$ can be evaluated or approximated, and because it searches over a cover of candidate means, whose size generally grows exponentially with the dimension $d$, its running time should not be assumed to be polynomial in $d$.
Comparison with Contemporaries: The authors distinguish their work from recent independent work [KKLZ26]. They note that [KKLZ26] focuses on computational efficiency via random projections, and that the bounds in [KKLZ26] can be qualitatively suboptimal (exponential in dimension) for certain distributions. The current work instead characterizes the sample complexity in terms of $\delta$, with upper and lower bounds that are polynomially related.

In summary, this paper provides a comprehensive framework for understanding when and how well we can estimate means in the presence of structured outliers, identifying the Fourier magnitude of the base distribution as the fundamental bottleneck.