Beyond Accuracy: Evaluating Forecasting Models by Multi-Echelon Inventory Cost

This study introduces a digitalized forecasting-inventory optimization pipeline that evaluates seven models on the M5 Walmart dataset, demonstrating that Temporal CNN and LSTM approaches significantly reduce inventory costs and improve fill rates compared to statistical baselines within single- and two-echelon newsvendor systems.

Swata Marik, Swayamjit Saha, Garga Chatterjee

Published 2026-03-18

Imagine you are running a chain of lemonade stands. Your biggest headache? Guessing how many cups of lemonade you need to make tomorrow.

If you make too little, customers leave thirsty (you lose money and reputation). If you make too much, the lemonade goes sour and you have to throw it away (you lose money on waste). This is the classic "Newsvendor Problem."
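For readers who want to see the idea in numbers, here is a minimal sketch of the textbook newsvendor rule: order the demand level at the "critical ratio" of the two costs. The cost values and demand history below are made up for illustration and do not come from the paper.

```python
import math

def critical_ratio(underage_cost: float, overage_cost: float) -> float:
    """Share of days on which we want to have enough stock."""
    return underage_cost / (underage_cost + overage_cost)

def newsvendor_quantity(demand_samples, underage_cost, overage_cost):
    """Smallest past demand whose empirical CDF reaches the critical ratio."""
    ratio = critical_ratio(underage_cost, overage_cost)
    ordered = sorted(demand_samples)
    # Index of the first sample covering at least `ratio` of the history.
    k = max(math.ceil(ratio * len(ordered)) - 1, 0)
    return ordered[k]

# Hypothetical history of daily cups sold (illustrative only).
past_demand = [10, 12, 8, 15, 11, 9, 14, 13, 10, 12]

# Assumed costs: a lost sale hurts 3x as much as a wasted cup.
print(newsvendor_quantity(past_demand, underage_cost=3.0, overage_cost=1.0))  # -> 13
```

When running out costs more than wasting a cup, the rule deliberately orders above the typical day's demand.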

For decades, business owners have used simple rules of thumb or basic math to make these guesses. But today, we have powerful computers and Artificial Intelligence (AI) that can look at patterns we can't see.

This paper asks a simple but crucial question: "Does using fancy AI to predict demand actually save us money, or is it just a cool trick that looks good on a report card?"

Here is the story of what they found, explained simply.

1. The Setup: The "Lemonade" Experiment

The researchers didn't just guess; they used a massive, real-world dataset from Walmart (the M5 dataset). They focused on one specific section: Food items in California.

They set up a digital simulation (a video game version of a supply chain) with two levels:

  • Level 1 (The Store): The lemonade stand itself.
  • Level 2 (The Warehouse): A big central kitchen that supplies all the stands.

They tested seven different "guessing machines" to see which one made the best predictions:

  • The Old Schoolers: Simple rules like "Tomorrow's sales = today's sales" (Naive) and exponential smoothing, which blends recent sales while tracking trend and seasonality (Holt-Winters).
  • The Smart Statisticians: Complex math models like ARIMA.
  • The Machine Learning Pros: Advanced algorithms like XGBoost (which combines many small decision trees, each one correcting the errors of the ones before it).
  • The Deep Learning Giants: Neural networks, loosely inspired by the brain, specifically LSTM (which carries a memory of earlier time steps) and Temporal CNN (which slides filters along the timeline to spot recurring patterns, like a camera scanning a video).

2. The Game: How They Measured Success

Usually, scientists measure success by "Accuracy" (how close the guess was to the actual number). But the researchers said, "That's not enough!"

They cared about Money.

  • The Cost of Being Wrong:
    • Overage Cost: Making too much lemonade (waste).
    • Underage Cost: Not having enough lemonade (angry customers).
  • The Goal: Find the model that keeps the total bill (waste + lost sales) the lowest.
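The scoring idea above can be sketched in a few lines. This is an illustration under simple assumptions, not the paper's actual simulation: each day is a fresh single-period newsvendor decision (leftovers are not carried over), and the per-unit costs are hypothetical.

```python
def inventory_cost(orders, demands, underage_cost, overage_cost):
    """Total bill: waste on surplus units plus penalty on unmet demand."""
    total = 0.0
    for stocked, wanted in zip(orders, demands):
        surplus = max(stocked - wanted, 0)    # made too much
        shortage = max(wanted - stocked, 0)   # made too little
        total += overage_cost * surplus + underage_cost * shortage
    return total

def fill_rate(orders, demands):
    """Fraction of demanded units that were actually served."""
    served = sum(min(stocked, wanted) for stocked, wanted in zip(orders, demands))
    wanted_total = sum(demands)
    return served / wanted_total if wanted_total else 1.0

# Hypothetical three-day example: always stock 10 cups.
orders = [10, 10, 10]
demands = [8, 12, 10]
print(inventory_cost(orders, demands, underage_cost=3.0, overage_cost=1.0))  # -> 8.0
print(round(fill_rate(orders, demands), 3))
```

Two forecasting models can then be compared by feeding each one's order quantities through the same two functions, which is the spirit of judging by cost rather than by accuracy alone.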

3. The Results: The AI Wins the Race

The results were clear. The "Deep Learning" models (the AI giants) were the champions.

  • The Winner: The Temporal CNN model. It was like a super-athlete that could see patterns in the data that the others missed. It reduced inventory costs by nearly 19% compared to the simple "guess yesterday's sales" method.
  • The Runner-Up: The LSTM model, which was also excellent.
  • The Losers: The old-school statistical models (ARIMA, Holt-Winters) struggled. They were like trying to navigate a stormy ocean with a paper map; they couldn't handle the sudden changes in customer behavior.

The Analogy:
Imagine the old models are like a weatherman who only looks out the window. If it's sunny today, he says it will be sunny tomorrow.
The AI models are like a weatherman with a satellite, radar, and a supercomputer. They can see a storm coming three days away, even if the sky is currently blue. Because they see the storm coming, they can prepare (stock up on umbrellas) and avoid getting soaked.

4. The Twist: The "Bullwhip Effect"

The researchers also tested what happens when you add a Warehouse (Level 2) into the mix.

  • The Problem: In a chain like this, small swings in what customers buy get magnified as orders travel up from the stores to the Warehouse. A modest guessing error at a stand can turn into a big over- or under-order at the central kitchen.
  • The Finding: Forecast errors did ripple through the two-level system. However, because the AI models were more accurate, they kept this "whip" from cracking as hard, and the whole supply chain stayed calmer and more efficient.
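One common way to put a number on the bullwhip effect is the ratio of order variance to demand variance: anything above 1 means the orders swing more than the sales that triggered them. The sketch below uses a deliberately jumpy, hypothetical ordering rule (`trend_chasing_orders`) to show the amplification. It is not the ordering policy used in the paper.

```python
from statistics import pvariance

def bullwhip_ratio(orders, demand):
    """Variance of orders divided by variance of demand (>1 means amplification)."""
    demand_var = pvariance(demand)
    if demand_var == 0:
        raise ValueError("demand has zero variance; ratio is undefined")
    return pvariance(orders) / demand_var

def trend_chasing_orders(demand):
    """Hypothetical store rule: order today's sales plus the latest change."""
    orders = [demand[0]]
    for previous, current in zip(demand, demand[1:]):
        # Over-reacts to every uptick or dip; never orders a negative amount.
        orders.append(max(current + (current - previous), 0))
    return orders

# Illustrative daily sales at one stand (made-up numbers).
demand = [10, 12, 9, 11, 10, 13]
orders = trend_chasing_orders(demand)
print(orders)
print(bullwhip_ratio(orders, demand) > 1.0)  # orders swing more than sales
```

A forecaster that reacts less wildly to each day's noise produces steadier orders, which pulls this ratio back toward 1.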

5. Why This Matters to You

This paper shows that, at least on this Walmart dataset, better predictions = more money in the pocket.

  • For Businesses: Don't just buy expensive software because it sounds "smart." Buy it because it actually lowers your costs and keeps customers happy. The AI models didn't just predict better; they saved real dollars by reducing waste and stockouts.
  • For the Future: As supply chains get more complex (think global shipping, sudden pandemics, or viral trends), relying on old math alone may not be enough. We need models that can pick up the rhythm of the market to keep shelves stocked and prices low.

The Bottom Line

Think of this study as a taste test for business strategies.
They took a bunch of different "recipes" for predicting the future. The result? The fancy, high-tech AI recipes tasted the best and saved the most money. It turns out, in the chaotic world of retail, a little bit of artificial intelligence goes a long way toward keeping your business from going sour.
