Learning with the Nash-Sutcliffe loss

This paper establishes a decision-theoretic foundation for the Nash-Sutcliffe efficiency by proving that its counterpart, the Nash-Sutcliffe loss, is strictly consistent for a specific multi-dimensional functional. This result enables the development of Nash-Sutcliffe linear regression and extends the metric's applicability to forecasting multiple stationary dependent time series with differing stochastic properties.

Hristos Tyralis, Georgia Papacharalampous

Published 2026-03-03

Imagine you are a coach training a team of 100 different runners. Some run on flat tracks, some on hills, some in the rain, and some in the sun. Your goal is to pick the best training method to help them all run faster.

For decades, coaches have used a specific stopwatch metric called NSE (Nash-Sutcliffe Efficiency) to judge how good a runner is compared to just guessing their average speed. It's a popular metric because it's fair: it doesn't matter if a runner is naturally fast or slow; it only cares if they improved relative to their own baseline.
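In code, that "stopwatch" is simple to write down: NSE compares a model's squared error against the squared error of a baseline that always guesses the average. A score of 1 is a perfect model; 0 means you did no better than the baseline. A minimal sketch (toy numbers, not data from the paper):

```python
def nse(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 minus model SSE over mean-baseline SSE."""
    mean_obs = sum(observed) / len(observed)
    sse_model = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sse_mean = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse_model / sse_mean

obs = [2.0, 4.0, 6.0, 8.0]
good = [2.1, 3.9, 6.2, 7.8]
lazy = [5.0, 5.0, 5.0, 5.0]   # always predicts the mean

print(nse(obs, good))  # close to 1: a genuinely skilled model
print(nse(obs, lazy))  # exactly 0: no better than guessing the average
```

Because the denominator is the baseline's own error, a naturally "slow runner" (a hard-to-predict series) is graded on the same 0-to-1 scale as an easy one.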

However, this paper reveals a hidden trap. The authors, Hristos Tyralis and Georgia Papacharalampous, discovered that while everyone has been using this stopwatch to judge the runners, they have been using a completely different set of rules to train them.

Here is the breakdown of their discovery in simple terms:

1. The "Wrong Map" Problem

Imagine you want to drive to a specific destination (the best prediction).

  • The Old Way: You use a GPS that tells you to minimize "total distance traveled" (this is the standard Mean Squared Error or MSE). This gets you to the geometric center of all possible paths.
  • The Judge's Rule: But the judge (the NSE metric) doesn't care about total distance. The judge cares about a "weighted score" that penalizes you differently depending on how bumpy the road was.

The paper argues that for a long time, scientists have been training their models using the "distance" GPS (MSE) but then judging them with the "bumpy road" score (NSE).

  • The Result: The models are driving toward the wrong destination. They are optimized for the wrong goal.

2. The "Weighted Average" Secret

The authors prove that the NSE metric isn't just looking for the "average" runner. It is actually looking for a Data-Weighted Average.

The Analogy:
Imagine you are trying to guess the average temperature of a city.

  • Standard Average (MSE): You take every single day's temperature, add them up, and divide by the number of days. A day that is 100°F counts the same as a day that is 70°F.
  • Nash-Sutcliffe Average: This method says, "Wait! If a day has very little variation (it's always 70°F), it's easy to predict, so let's trust it more. But if a day is chaotic (swinging between 50°F and 90°F), it's hard to predict, so let's trust it less."

The NSE metric essentially says: "I care more about the days that are stable and less about the days that are chaotic."

The paper proves that if you want to win the NSE game, you must train your model to target this specific "Weighted Average," not the simple average.
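To make the "Data-Weighted Average" concrete, here is a sketch under a simplifying assumption (not the paper's exact multi-dimensional functional): if one constant must summarize several series, and each series' error is normalized by that series' own variance, the best constant is an inverse-variance-weighted mean.

```python
# Minimizing sum_i (mu_i - c)^2 / var_i over a single constant c yields the
# inverse-variance-weighted mean: stable series pull the answer much harder
# than chaotic ones. (Illustrative assumption; the paper's functional is
# multi-dimensional and more general.)

def weighted_target(means, variances):
    """argmin over c of sum((m - c)**2 / v for each series (m, v))."""
    weights = [1.0 / v for v in variances]
    return sum(w * m for w, m in zip(weights, means)) / sum(weights)

# Two stable 70°F series and one chaotic series centered at 90°F:
target = weighted_target([70.0, 70.0, 90.0], [1.0, 1.0, 100.0])
print(target)   # about 70.1 — the chaotic series barely moves the target,
                # while a plain average would say 76.7
```

The stable series dominate the target, exactly as in the temperature analogy above.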

3. The New Solution: "Nash-Sutcliffe Regression"

The authors introduce a new training method called Nash-Sutcliffe Linear Regression.

  • Old Training (OLS): "Hey model, try to be as close to the middle as possible for everyone."
  • New Training (Nash-Sutcliffe): "Hey model, I'm going to give you a special pair of glasses. Through these glasses, some days look bigger and more important than others. Train yourself to hit that specific target."

They show mathematically that if you use this new training method, your model will perform significantly better when judged by the NSE metric. In fact, in their tests with real river flow and temperature data, the new method improved scores by huge margins (sometimes cutting the error in half) compared to the old way.
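The "special pair of glasses" can be sketched as weighted least squares, assuming the weight on each series is the inverse of that series' observed variance (the same quantity NSE normalizes by). This is an illustrative sketch, not necessarily the paper's exact estimator:

```python
import numpy as np

def ns_linear_fit(series):
    """Fit one line y = a*x + b across several (x, y) series,
    weighting every point of a series by 1 / Var(y_series)."""
    xs, ys, ws = [], [], []
    for x, y in series:
        w = 1.0 / np.var(y)            # stable series get large weight
        xs.extend(x); ys.extend(y); ws.extend([w] * len(y))
    X = np.column_stack([xs, np.ones(len(xs))])
    sw = np.sqrt(np.asarray(ws))       # fold weights into a plain lstsq
    coef, *_ = np.linalg.lstsq(X * sw[:, None], np.asarray(ys) * sw, rcond=None)
    return coef                        # (slope, intercept)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
calm  = (x, 2.0 * x + 1.0 + rng.normal(0.0, 0.05, 50))  # stable series
noisy = (x, 2.0 * x + 1.0 + rng.normal(0.0, 2.00, 50))  # chaotic series
slope, intercept = ns_linear_fit([calm, noisy])
print(slope, intercept)  # roughly 2 and 1, driven mostly by the calm series
```

Ordinary least squares would let the chaotic series drag the fit around; the inverse-variance weighting lets the stable series set the target instead.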

4. The "Apples and Oranges" Warning

The paper also gives a very important warning about how we compare different things.

The Analogy:
Imagine you are comparing the performance of a Formula 1 car and a bicycle.

  • If you use a standard ruler (MSE), you might say the car is "better" because it covers more distance.
  • If you use a "relative speed" metric (NSE), you might say the bicycle is "better" because it's doing amazing things relative to a human walking.

The authors say: You cannot mix these comparisons.
If you have 100 rivers, and 50 are small mountain streams (fast, chaotic) and 50 are huge slow rivers (stable), you cannot simply average their NSE scores to say "Our model is 80% good."

  • The NSE score only makes sense if all the rivers are behaving like the same type of river.
  • If you mix different types of rivers, the "Weighted Average" breaks, and the score becomes meaningless.
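A tiny numeric illustration of why the average breaks (toy numbers, not data from the paper): give a model the exact same one-unit error on a calm river and on a flashy stream, and NSE returns wildly different verdicts.

```python
def nse(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 minus model SSE over mean-baseline SSE."""
    mean_obs = sum(observed) / len(observed)
    sse_model = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sse_mean = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse_model / sse_mean

calm   = [10.0, 11.0, 10.0, 11.0]   # stable big river
flashy = [1.0, 40.0, 5.0, 60.0]     # chaotic mountain stream

# The same +1.0 error everywhere:
print(nse(calm,   [v + 1.0 for v in calm]))    # -3.0  (looks terrible)
print(nse(flashy, [v + 1.0 for v in flashy]))  # ~0.998 (looks superb)
# Their mean, about -1.0, describes neither river.
```

Identical forecasting skill in absolute terms, yet one score looks disastrous and the other near-perfect, so an average over such a mixed bag says nothing about the model.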

The Big Takeaway

This paper is a wake-up call for scientists, data analysts, and machine learning engineers.

  1. Stop Mismatching: If you plan to judge your model with the NSE metric, you must train it with the Nash-Sutcliffe loss function. Using the standard "average" training method is like trying to win a chess tournament by playing checkers.
  2. Respect the Data: Don't just throw all your time series into one big bucket. Make sure the things you are comparing actually belong to the same "family" of data.
  3. The New Tool: They have provided a new mathematical tool (Nash-Sutcliffe Regression) that is easy to use and guarantees that your model is aiming for the right target.

In short: To win the game, you have to play by the rules of the game, not the rules of a different sport.
