Proper losses regret at least 1/2-order

This paper establishes that strict properness is necessary and sufficient for non-vacuous surrogate regret bounds. It also resolves an open question by proving that the convergence rate of estimated probability vectors in p-norm cannot exceed the square root of the surrogate regret, thereby confirming the optimality of strongly proper losses.

Han Bao, Asuka Takatsu

Published 2026-03-04

Imagine you are a weather forecaster. Your job is to predict the probability of rain tomorrow. You don't just say "It will rain" or "It won't rain"; you give a percentage, like "There is a 70% chance of rain."

In the world of Machine Learning, we do the same thing, but instead of rain, we predict things like "Is this email spam?" or "What object is in this photo?" The model outputs a list of probabilities for every possible outcome.

This paper is about how to measure if your weather forecaster (or AI model) is actually getting better, and how fast they can improve.

Here is the breakdown using simple analogies:

1. The Problem: The "Scorecard" Dilemma

In machine learning, we need a way to grade our models. We use something called a Loss Function. Think of this as a scorecard.

  • If the model predicts 70% rain and it rains, the scorecard gives a low "penalty" (a good score).
  • If the model predicts 10% rain and it rains, the scorecard gives a high "penalty" (a bad score).

A Proper Loss is a special, fair scorecard. It has a golden rule: The only way to get the best possible score is to tell the truth. If the real chance of rain is 70%, the model must say 70% to minimize its penalty. If it lies and says 90%, it gets a worse score.
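The golden rule can be checked numerically. Here is a small sketch (my illustration, not code from the paper) using the log loss, a standard strictly proper scorecard: when the true chance of rain is 70%, the expected penalty is minimized exactly by forecasting 70%.

```python
import math

# Illustrative sketch: the log loss is a strictly proper "scorecard".
# If the true chance of rain is p = 0.7, the expected penalty
#   -p*log(q) - (1-p)*log(1-q)
# is smallest exactly when the forecast q equals the truth p.

p = 0.7  # true probability of rain
candidates = [k / 100 for k in range(1, 100)]  # forecasts 0.01 .. 0.99
best_q = min(candidates,
             key=lambda q: -p * math.log(q) - (1 - p) * math.log(1 - q))
print(f"penalty is minimized by forecasting q = {best_q:.2f}")  # honesty wins
```

Any forecast other than 0.70, higher or lower, incurs a strictly larger expected penalty.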

2. The Mystery: The "Gap" Between Truth and Prediction

Even with a fair scorecard, our AI model might not be perfect yet. It might predict 60% when the truth is 70%.

  • The Surrogate Regret: This is the "penalty difference." It measures how much worse the model did compared to the perfect truth. It's like a coach saying, "You lost 5 points because you weren't perfectly accurate."
  • The Real Question: The authors ask: "If we know the model lost 5 points on the scorecard, how far off is the actual prediction? Is it off by 1%? 10%? 50%?"

We want to know the relationship between the Scorecard Penalty (Surrogate Regret) and the Actual Distance (how far the prediction is from the truth).
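To make these two quantities concrete, here is a hypothetical example (my numbers, not the paper's) under the log loss: the surrogate regret is the penalty gap between our forecast and the best achievable penalty, which the true probability attains.

```python
import math

# Hypothetical illustration of "surrogate regret" under the log loss.
# The regret is the scorecard-penalty gap between our forecast q and
# the best achievable penalty, attained by the true probability p.

def expected_log_loss(p, q):
    return -p * math.log(q) - (1 - p) * math.log(1 - q)

p, q = 0.7, 0.6  # truth vs. imperfect forecast
regret = expected_log_loss(p, q) - expected_log_loss(p, p)  # penalty gap
distance = abs(q - p)                                       # actual error
print(f"surrogate regret: {regret:.4f}, actual distance: {distance:.4f}")
```

The paper's question is exactly how these two numbers are tied together: if the regret is small, how small must the distance be?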

3. The First Discovery: You Can't Cheat the System

The paper proves a fundamental rule: For the scorecard to be useful, it must be "Strictly Proper."

  • The Analogy: Imagine a game where you can win by lying. If the scorecard allows you to get a perfect score by guessing 50/50 even when the truth is 100%, the scorecard is broken.
  • The Result: The authors show that if the scorecard isn't "strictly proper" (meaning the truth is the only way to win), then the relationship between the penalty and the actual error breaks down. You could have a tiny penalty but a huge error, or vice versa. To have a reliable connection, the scorecard must force the model to tell the truth.
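A deliberately degenerate sketch (my example, not the paper's) shows what "broken" means: a loss that ignores the forecast entirely is proper, since the truth is among its minimizers, but not *strictly* proper, and its penalty gap tells us nothing about the actual error.

```python
def constant_loss(q, outcome):
    # A degenerate proper-but-not-STRICTLY-proper "scorecard":
    # every forecast gets the same penalty, so lying costs nothing.
    return 1.0

p = 0.7                      # truth
for q in (0.7, 0.5, 0.01):   # honest, hedged, wildly wrong
    regret = constant_loss(q, "rain") - constant_loss(p, "rain")
    print(f"forecast {q:.2f}: penalty gap = {regret}, error = {abs(q - p):.2f}")
```

All three forecasts have a penalty gap of zero, yet their errors range from 0 to 0.69: a tiny penalty with a huge error, exactly the breakdown the authors describe.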

4. The Second Discovery: The "Square Root" Speed Limit

This is the big headline of the paper. The authors tackle a long-standing question: How fast can a model improve?

They look at the math of how the "Actual Distance" shrinks as the "Scorecard Penalty" gets smaller.

  • Imagine the penalty is a bucket of water, and the error is the water level. As you drain the bucket (reduce the penalty), how fast does the water level drop?
  • Some people hoped that if you improved the score by a little bit, the error would drop much faster, for example in direct proportion to the penalty rather than to its square root.
  • The Verdict: The authors prove that you cannot go faster than the square root.

The Metaphor:
Imagine you are walking toward a treasure (the perfect truth).

  • The "Surrogate Regret" is the noise you hear from the treasure.
  • The "Error" is how far you are from the treasure.
  • The paper proves that if you want to get half as far from the treasure, you can't just cut the noise in half. You have to cut the noise to a quarter of what it was (since the square root of 1/4 is 1/2).

This means that for a huge class of fair scorecards (including the famous "Cross-Entropy" used in almost all Deep Learning), the best you can hope for is that your error shrinks at a square root rate. You can't magically make the model converge to the truth twice as fast just by changing the math slightly.
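The square-root relationship can be checked numerically for the log loss. This is an illustrative sketch, not the paper's general proof: halving the forecast error shrinks the regret by roughly a factor of four, so the ratio of error to the square root of regret stays roughly constant.

```python
import math

# Numerical sketch of the square-root speed limit for the log loss.
# Halving the forecast error cuts the regret by roughly a factor of
# four, i.e. the error behaves like the square root of the regret.

def regret(p, q):
    # surrogate regret of the log loss = KL divergence between p and q
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

p = 0.7
for err in (0.08, 0.04, 0.02, 0.01):
    r = regret(p, p + err)
    print(f"error = {err:.2f}  regret = {r:.6f}  "
          f"error/sqrt(regret) = {err / math.sqrt(r):.3f}")  # ratio ~ constant
```

The near-constant ratio in the last column is the square-root law in action; no strictly proper scorecard can make that ratio blow up in your favor.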

5. Why This Matters

  • For AI Engineers: It tells them they shouldn't waste time looking for a "magic" loss function that makes models learn infinitely faster. The square root limit is a fundamental law of physics for these types of problems.
  • For the "Strongly Proper" Losses: There is a special class of scorecards called "strongly proper" (like the Brier score). The paper confirms that these already do the best job possible: they hit the theoretical speed limit.
  • For the "Strictly Proper" Losses: Even if a scorecard isn't "strong" (it's just "strictly" proper), it still can't beat the square root limit.

Summary in One Sentence

This paper proves that for any fair way of grading probability predictions, the relationship between the "grade" and the "actual accuracy" has a hard speed limit: you can't get more accurate faster than the square root of your grade improvement.
