Same Error, Different Function: The Optimizer as an Implicit Prior in Financial Time Series

This paper demonstrates that in financial time series forecasting, where models often achieve identical out-of-sample error, the choice of optimizer acts as a critical implicit prior that significantly alters learned functions and decision outcomes, necessitating evaluation beyond scalar loss metrics.

Federico Vittorio Cortesi, Giuseppe Iannone, Giulia Crippa, Tomaso Poggio, Pierfrancesco Beneventano

Published 2026-03-04

The Big Idea: "Same Score, Different Story"

Imagine you are a coach trying to pick the best player for your team. You have two players, Alex and Jordan. You run them through a series of drills, and they both get the exact same score: 95 out of 100.

In the world of finance and machine learning, this is what usually happens. Researchers build different AI models (like deep neural networks) to predict stock market volatility (how much prices will jump around). When they test these models, they often find that a complex AI model scores exactly the same as a simple, old-fashioned math model.

The paper's big discovery: Just because two models get the same score doesn't mean they are doing the same thing. They might be solving the puzzle in completely different ways, and that difference matters a lot when you actually try to use them to make money.


The Analogy: The Hiking Trip

Imagine you and a friend are trying to hike to the top of a mountain (the "best prediction"). You both start at the bottom and want to reach the summit with the least amount of effort (lowest "error").

  • The Landscape: The mountain is foggy and flat at the top. There isn't just one peak; there is a huge, flat plateau where many different paths lead to the same height.
  • The Hikers (The Models): You have different hiking styles (Architectures). One is a fast runner (Transformer), one is a steady walker (LSTM), and one is a simple hiker (Linear Model).
  • The Guide (The Optimizer): This is the most important part. The "Optimizer" is like the GPS or the guide telling you which direction to step next.
    • Guide A (SGD): Tells you to take small, steady, cautious steps. You might wander a bit, but you tend to stay on wide, safe paths.
    • Guide B (Adam): Tells you to sprint, slide, and take shortcuts. You move faster and might find a steeper, more direct route.
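The two "guides" correspond to the textbook update rules for SGD and Adam. Here is a minimal NumPy sketch of both (standard formulas, not the paper's training code; the hyperparameter values are illustrative defaults). The key difference: Adam rescales each coordinate by a running estimate of gradient size, so even a nearly flat direction gets a near-full-size step.

```python
import numpy as np

def sgd_step(w, grad, lr=0.01):
    """Plain SGD: one small step straight down the gradient."""
    return w - lr * grad

def adam_step(w, grad, state, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: rescales each coordinate using running estimates of the
    gradient's mean (m) and uncentered variance (v)."""
    m, v, t = state
    t += 1
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad**2
    m_hat = m / (1 - beta1**t)          # bias correction
    v_hat = v / (1 - beta2**t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, (m, v, t)

# Same gradient, different step: one steep direction, one nearly flat one.
w0 = np.array([1.0, 1.0])
g = np.array([10.0, 0.001])
w_sgd = sgd_step(w0, g)
w_adam, _ = adam_step(w0, g, (np.zeros(2), np.zeros(2), 0))
print(w_sgd)   # SGD barely moves along the flat direction
print(w_adam)  # Adam takes a similar-sized step in both directions
```

On the first step, SGD moves 100x further along the steep coordinate than the flat one, while Adam moves about 0.01 in both; that per-coordinate rescaling is what makes Adam the "sprinting" guide.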

The Paper's Finding:
Even though both you and your friend end up at the exact same altitude (the same prediction error), your paths were totally different.

  • Guide A (SGD) led you to a path that is wide, flat, and stable. If the wind blows (market stress), you don't fall off.
  • Guide B (Adam) led you to a path that is narrow, steep, and full of sharp turns. It gets you there just as fast, but if the wind blows, you might slip and have to scramble back up.
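The wide-path vs. narrow-path picture is the familiar flat vs. sharp minima contrast: two solutions can sit at the same loss, but react very differently to a small shock. A toy illustration (made-up one-dimensional losses, not the paper's experiment):

```python
import numpy as np

rng = np.random.default_rng(0)

def flat_loss(w):
    """Wide valley: loss barely rises when w drifts from the minimum."""
    return 0.01 * w**2

def sharp_loss(w):
    """Narrow valley: same minimum at w = 0, but steep walls."""
    return 100.0 * w**2

# Both minima have loss 0 at w = 0 -- the same "altitude".
# Now let the "wind blow": small random shocks to the parameters.
noise = rng.normal(scale=0.1, size=1000)
print(flat_loss(noise).mean())   # tiny average loss increase
print(sharp_loss(noise).mean())  # orders of magnitude larger
```

Under the same perturbation, the sharp solution's loss blows up while the flat one barely moves, which is why the flat path survives "windy" (stressed) market conditions.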

In finance, this difference isn't just about the hike; it's about how often you have to stop and change your gear (trading).


Why Does This Matter? (The "Turnover" Problem)

The paper looks at what happens when you use these models to build a stock portfolio (a basket of investments).

  • The Stable Hiker (SGD): Because this model is "cautious," it doesn't change its mind often. It says, "This stock is risky," and sticks with that view for a while.
    • Result: You trade less. You pay fewer fees. Your portfolio is calm.
  • The Sprinting Hiker (Adam): Because this model is "aggressive," it reacts to tiny changes in the data. It says, "This stock is risky!" then five minutes later, "Wait, it's safe!" then "Risky again!"
    • Result: You are constantly buying and selling (high turnover). Even though your predictions are just as accurate as the stable hiker's, you are bleeding money on transaction fees and taxes because you are moving too much.
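The turnover difference can be made concrete. A common measure is the sum of absolute weight changes between rebalances; the sketch below uses made-up weights for one asset (illustrative numbers only, not the paper's data):

```python
import numpy as np

def turnover(weights):
    """Total turnover: sum of absolute weight changes across rebalances.
    `weights` has shape (n_periods, n_assets)."""
    w = np.asarray(weights)
    return np.abs(np.diff(w, axis=0)).sum()

# One asset over five rebalances. Both models hover around the same
# average view (~0.5), so their "scores" could look identical.
stable = [[0.50], [0.52], [0.51], [0.53], [0.52]]   # "cautious" model
jumpy  = [[0.50], [0.80], [0.30], [0.75], [0.25]]   # "aggressive" model

print(turnover(stable))  # ~0.06 -- little trading, few fees
print(turnover(jumpy))   # ~1.75 -- same average view, ~30x the trading
```

Every unit of turnover costs transaction fees and taxes, so two models with identical prediction error can deliver very different net returns.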

The Paper's Conclusion:
In the financial world, the "Guide" (Optimizer) is actually part of the model. You can't just pick the model with the best score. You have to ask: "Does this model's 'personality' (cautious vs. aggressive) fit my goals?"

If you want a stable portfolio, you might choose the "cautious" optimizer even if it has the exact same score as the "aggressive" one. If you pick the aggressive one, you might end up with a portfolio that turns over 3 times faster, eating up your profits.


Key Takeaways in Plain English

  1. Don't Trust the Scoreboard Alone: In finance, many different AI models get the same "grade." But getting an 'A' doesn't mean they are all doing the same job.
  2. The "Who" Matters as Much as the "What": It's not just about what the model predicts, but how it learns to predict it. The tool used to train the model (the Optimizer) leaves a hidden fingerprint on the final result.
  3. Simplicity is Often Better: The paper found that the simplest training method (SGD) often creates models that are more stable and less "jumpy." In the noisy world of the stock market, a steady model is often worth more than a fancier one that scores the same.
  4. The "Rashomon Effect": This is a reference to Akira Kurosawa's famous film, in which several witnesses give conflicting but individually plausible accounts of the same event. In finance, many different models tell different "stories" (make different predictions) about the market, yet all end up with the same error score. The paper argues we need to ask which story makes the most sense for our money, not just which story has the best score.

The Bottom Line

When building AI for money, don't just look at the final grade. Look at the personality of the model. If two models get the same score, pick the one that behaves more like a steady, reliable friend rather than a nervous, jittery one, because that friend will save you money in the long run.
