Some facts about the optimality of the LSE in the Gaussian sequence model with convex constraint

This paper characterizes the necessary and sufficient conditions for the least squares estimator to be minimax optimal in a convex constrained Gaussian sequence model by linking optimality to the Lipschitz property of the local Gaussian width, while providing algorithms to compute worst-case risk and demonstrating these results across various geometric sets.

Akshay Prasadan, Matey Neykov

Published 2026-03-06

Imagine you are trying to find the location of a hidden treasure (the unknown true signal, $\mu$) in a vast, foggy field. You have a compass that points roughly toward it, but its reading is corrupted by static noise ($\xi$). Your goal is to guess the treasure's location as accurately as possible.

In the world of statistics, the Least Squares Estimator (LSE) is like a very intuitive, "common sense" strategy: Just walk straight to the spot the compass reading indicates, and if that spot lies outside the allowed area, stop instead at the closest point inside it.

In this paper, the authors ask a simple but profound question: "Is this 'walk straight' strategy always the best we can do, or are there situations where a smarter, more complex map would help us pin down the treasure more accurately?"

Here is a breakdown of their findings using everyday analogies.

1. The Setting: The Foggy Field and the Fence

Imagine the "allowed area" (the convex constraint set $K$) is a fenced-in garden.

  • The Treasure ($\mu$): Somewhere inside the garden.
  • The Compass ($Y$): A noisy reading $Y = \mu + \xi$, with Gaussian noise $\xi \sim N(0, \sigma^2 I_n)$, so it points near the treasure but can be thrown off by the fog.
  • The LSE Strategy: You look at the compass reading. If it lands inside the garden, you keep it as your guess; if it lands outside, you move to the point of the garden closest to it, which lies on the fence.
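In symbols, the LSE is the Euclidean projection of the observation onto the constraint set, $\hat{\mu} = \arg\min_{\nu \in K} \|Y - \nu\|_2^2$. Below is a minimal NumPy sketch for two of the "winner" shapes. The function names are hypothetical, and the $\ell_1$ routine is the standard sort-and-soft-threshold projection of Duchi et al. (2008), not code from the paper.

```python
import numpy as np


def lse_l2_ball(y, radius=1.0):
    """LSE over {mu : ||mu||_2 <= radius}: keep y if inside, else rescale it onto the sphere."""
    norm = np.linalg.norm(y)
    return y.copy() if norm <= radius else y * (radius / norm)


def lse_l1_ball(y, radius=1.0):
    """LSE over {mu : ||mu||_1 <= radius} via sort-and-soft-threshold projection."""
    if np.abs(y).sum() <= radius:
        return y.copy()                       # inside the diamond: the LSE returns y itself
    u = np.sort(np.abs(y))[::-1]              # magnitudes in decreasing order
    css = np.cumsum(u)
    k = np.arange(1, u.size + 1)
    rho = np.nonzero(u - (css - radius) / k > 0)[0][-1]   # last index passing the test
    theta = (css[rho] - radius) / (rho + 1)   # soft-threshold level
    return np.sign(y) * np.maximum(np.abs(y) - theta, 0.0)


rng = np.random.default_rng(0)
mu = np.array([0.6, -0.3, 0.0, 0.0])          # true mean, inside both unit balls
y = mu + 0.5 * rng.standard_normal(mu.size)   # one noisy observation Y = mu + xi, sigma = 0.5
print("l1 LSE:", lse_l1_ball(y))
print("l2 LSE:", lse_l2_ball(y))
```

Both functions return the observation unchanged when it already satisfies the constraint, which is exactly the "keep the reading if it is inside the garden" rule.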

This strategy is popular because it's easy to calculate and works great for simple shapes like circles or squares. But the authors wanted to know: Does it work for every shape?

2. The "Shape" Matters: The Gaussian Width

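To figure out if the LSE is optimal, the authors looked at the shape of the garden. They used a concept called the Gaussian width, specifically its local version, which measures the complexity of the garden in a small neighborhood around each point.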

Think of Gaussian Width as the "wiggle room" or the "complexity" of the garden's shape when viewed through a foggy lens.

  • Simple Shapes (Optimal): If your garden is a perfect circle or a flat plane, the "wiggle room" is predictable, and the LSE is optimal. It's like walking straight to a wall; you know exactly where you'll stop.
  • Complex Shapes (Suboptimal): If your garden is a weird, spiky pyramid or a twisted solid, the "wiggle room" changes drastically depending on where you are standing. In these cases, the LSE might walk you to a "dead end" on the fence, while a smarter strategy could have cut through the fog and located the treasure much more accurately.
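For a set $T$, the Gaussian width is $w(T) = \mathbb{E}\sup_{t \in T} \langle g, t \rangle$ with $g \sim N(0, I_n)$. For unit balls the supremum is the dual norm, so the $\ell_2$ ball gives $\mathbb{E}\|g\|_2 \approx \sqrt{n}$, while the $\ell_1$ ball gives $\mathbb{E}\|g\|_\infty$, which grows only like $\sqrt{2\log n}$. The Monte Carlo sketch below (function names and sample sizes are illustrative assumptions) shows how different the "wiggle room" of these two shapes is. It computes the global width only; the paper works with a local version around each point.

```python
import numpy as np


def gaussian_width(support_fn, n, n_samples=2000, seed=0):
    """Monte Carlo estimate of w(T) = E sup_{t in T} <g, t> with g ~ N(0, I_n).

    support_fn maps a batch of Gaussian vectors (one per row) to sup_{t in T} <g, t>.
    """
    rng = np.random.default_rng(seed)
    g = rng.standard_normal((n_samples, n))
    return float(np.mean(support_fn(g)))


def l2_ball_support(g):
    return np.linalg.norm(g, axis=1)          # sup over the unit l2 ball is ||g||_2


def l1_ball_support(g):
    return np.max(np.abs(g), axis=1)          # sup over the unit l1 ball is ||g||_inf


for n in (10, 100, 1000):
    w2 = gaussian_width(l2_ball_support, n)
    w1 = gaussian_width(l1_ball_support, n)
    print(f"n={n:5d}  l2: {w2:7.3f} (sqrt(n)={np.sqrt(n):.3f})  "
          f"l1: {w1:.3f} (sqrt(2 log n)={np.sqrt(2 * np.log(n)):.3f})")
```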

3. The Golden Rule: Lipschitz Continuity

The paper's central result is an exact condition for when the LSE is the best possible guess in the minimax sense: the LSE is optimal if and only if the local Gaussian width has a Lipschitz property.

The Analogy: Imagine the "Gaussian Width" is a terrain map showing how hard it is to find the treasure from any point.

  • The LSE is Optimal if this terrain map is smooth. If you take a small step, the difficulty of finding the treasure changes only a little bit. It's like walking on a gentle hill; you can trust your intuition.
  • The LSE is Suboptimal if the terrain is jagged. If a tiny step changes the difficulty from "easy" to "impossible," your simple "walk straight" strategy is no longer the best available. You need a more complex estimator to navigate the cliffs and valleys.
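To make the terrain analogy concrete, one common definition of the local Gaussian width in the least-squares literature is $w_\mu(\varepsilon) = \mathbb{E}\sup_{\nu \in K,\ \|\nu - \mu\|_2 \le \varepsilon} \langle g, \nu - \mu \rangle$. The paper's exact normalization and its precise Lipschitz condition may differ, so treat the sketch below as an informal illustration only. It assumes $K = [0,1]^n$, which is not one of the paper's examples and is chosen only because the inner supremum has a cheap exact solution: the maximizer is $\mathrm{clip}(t g, -\mu, 1-\mu)$ for a scalar $t \ge 0$ found by bisection. The code slides $\mu$ from a vertex toward the center and prints a finite-difference slope of $\mu \mapsto w_\mu(\varepsilon)$; all function names are hypothetical.

```python
import numpy as np


def sup_box_ball(g, lo, hi, eps, iters=60):
    """sup <g, v> over {v : lo <= v <= hi, ||v||_2 <= eps}, assuming lo <= 0 <= hi.

    By the KKT conditions the maximizer is v(t) = clip(t * g, lo, hi) for some t >= 0,
    and ||v(t)||_2 is nondecreasing in t, so t can be found by bisection.
    """
    corner = np.where(g > 0, hi, np.where(g < 0, lo, 0.0))   # limit of v(t) as t -> inf
    if np.linalg.norm(corner) <= eps:
        return float(g @ corner)              # ball constraint inactive: take the box corner
    t_lo, t_hi = 0.0, 1.0
    while np.linalg.norm(np.clip(t_hi * g, lo, hi)) < eps:
        t_hi *= 2.0                           # grow until the ball constraint binds
    for _ in range(iters):
        t = 0.5 * (t_lo + t_hi)
        if np.linalg.norm(np.clip(t * g, lo, hi)) < eps:
            t_lo = t
        else:
            t_hi = t
    return float(g @ np.clip(t_lo * g, lo, hi))   # t_lo keeps v inside the ball


def local_width_cube(mu, eps, n_samples=500, seed=0):
    """Monte Carlo estimate of E sup_{nu in [0,1]^n, ||nu - mu|| <= eps} <g, nu - mu>."""
    rng = np.random.default_rng(seed)          # same seed => common random numbers across mu
    lo, hi = -mu, 1.0 - mu                    # v = nu - mu must satisfy 0 <= mu + v <= 1
    gs = rng.standard_normal((n_samples, mu.size))
    return float(np.mean([sup_box_ball(g, lo, hi, eps) for g in gs]))


n, eps = 10, 0.3
prev = None
for s in np.linspace(0.0, 0.5, 6):            # slide mu along the diagonal, vertex -> center
    mu = np.full(n, s)
    w = local_width_cube(mu, eps)
    if prev is None:
        print(f"s={s:.1f}  width={w:.4f}")
    else:
        slope = abs(w - prev[1]) / np.linalg.norm(mu - prev[0])
        print(f"s={s:.1f}  width={w:.4f}  finite-difference slope={slope:.4f}")
    prev = (mu, w)
```

Reusing the same random seed for every $\mu$ (common random numbers) keeps Monte Carlo noise from masquerading as jaggedness in the finite differences.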

4. Examples: When to Trust Your Gut (and When Not To)

The authors tested their theory on various shapes to see if the LSE wins or loses.

🏆 The Winners (LSE is Optimal)

  • The $\ell_1$ Ball (The Diamond): Imagine a diamond shape. The LSE is optimal here.
  • The $\ell_2$ Ball (The Circle): A perfect circle. The LSE is optimal here too.
  • Isotonic Regression (The Staircase): Imagine data that must always go up (like a staircase). If the total height is known, the LSE is the best guess.
  • Linear Subspaces (The Flat Plane): If the treasure is guaranteed to be on a flat sheet of paper, walking straight to the paper is the best move.
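The flat-plane case can be checked directly. For a $d$-dimensional subspace $S$, the LSE is the orthogonal projection $P_S Y$; because $\mu \in S$, the error is $P_S Y - \mu = P_S \xi$, so the risk $\mathbb{E}\|P_S Y - \mu\|_2^2 = \sigma^2 d$ for every $\mu \in S$, matching the classical minimax risk for a $d$-dimensional Gaussian mean. The Monte Carlo check below uses an arbitrary random subspace and illustrative values of $n$, $d$, and $\sigma$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, sigma = 50, 5, 0.7                           # illustrative sizes and noise level
U, _ = np.linalg.qr(rng.standard_normal((n, d)))   # orthonormal basis of a random d-dim subspace S
P = U @ U.T                                        # orthogonal projector onto S
mu = U @ rng.standard_normal(d)                    # an arbitrary true mean in S

reps = 20000
Y = mu + sigma * rng.standard_normal((reps, n))    # each row is one draw of Y = mu + xi
mu_hat = Y @ P                                     # row-wise LSE P @ y (valid since P is symmetric)
risk = np.mean(np.sum((mu_hat - mu) ** 2, axis=1))
print(f"Monte Carlo risk: {risk:.3f}   theory sigma^2 * d: {sigma**2 * d:.3f}")
```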

🏳️ The Losers (LSE is Suboptimal)

  • The Pyramid: Imagine a pyramid with a very sharp tip. The LSE might get stuck near the base, thinking the treasure is there, while a smarter estimator realizes the treasure is likely near the sharp peak. The "jaggedness" of the pyramid tricks the LSE.
  • The Solid of Revolution: Think of a vase shape. If the noise is high, the LSE might project the signal onto the wide base of the vase, missing the narrow neck where the treasure actually is.
  • $\ell_p$ Balls (The Squashed Sphere): For shapes between the diamond and the circle (the $\ell_p$ balls with $1 < p < 2$), the LSE is too slow: its worst-case error shrinks at a slower rate than the best achievable one. It's like trying to navigate a city with a map that only shows straight lines, while the actual streets are diagonal.

5. The Takeaway

The paper provides a mathematical "checklist" (algorithms and formulas) to determine if your specific problem has a "smooth" shape (where the LSE is king) or a "jagged" shape (where you need a more sophisticated, computationally expensive method).

In simple terms:

  • If your data constraints form a smooth, predictable shape, the simple "Least Squares" method is the best tool you have.
  • If your constraints form a spiky, complex, or jagged shape, the LSE's worst-case error can be noticeably larger than the best achievable, and you need a more advanced estimator to avoid getting lost in the fog.

The authors didn't just say "it depends"; they gave us the tools to measure the "jaggedness" of the problem and decide exactly when to switch strategies.