Online Inventory Problems: Beyond the i.i.d. Setting with Online Convex Optimization

This paper introduces MaxCOSD, an online convex optimization algorithm that provides provable guarantees for multi-product inventory control under general non-i.i.d. demand and stateful dynamics by establishing necessary non-degeneracy assumptions to enable learning.

Massil Hihat, Stéphane Gaïffas, Guillaume Garrigos, Simon Bussy

Published 2026-02-27

Imagine you are the manager of a busy coffee shop. Every morning, you have to decide: How many pastries should I order today?

If you order too few, you miss out on sales because customers leave hungry (this is a "lost sale"). If you order too many, the unsold pastries go stale by tomorrow and you have to throw them away (this is a "waste" or "holding cost").

Your goal is to find the perfect balance to make the most money over time. This is the classic Inventory Problem.
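To make the trade-off concrete, here is a minimal sketch of the daily cost for a single product. The function name and the two per-unit costs (3 for a lost sale, 1 for a wasted pastry) are illustrative assumptions, not values from the paper.

```python
def daily_cost(order_level, demand, lost_sale_cost=3.0, waste_cost=1.0):
    """Cost of stocking `order_level` pastries when `demand` customers show up.

    The per-unit costs are made-up numbers for illustration only.
    """
    shortage = max(demand - order_level, 0.0)   # customers who left hungry
    leftover = max(order_level - demand, 0.0)   # pastries thrown away
    return lost_sale_cost * shortage + waste_cost * leftover


# Demand is 20: stocking 10 loses sales, 20 is perfect, 30 wastes pastries.
costs = [daily_cost(level, demand=20) for level in (10, 20, 30)]
print(costs)  # [30.0, 0.0, 10.0]
```

Because a lost sale is assumed to cost more than a wasted pastry here, ordering too few is punished more heavily than ordering too many.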

The Old Way: Guessing with Rules

For decades, inventory theory has tackled this problem mathematically. But the classic models relied on some very unrealistic assumptions:

  1. The "Coin Flip" Assumption: They assumed each day's demand is drawn independently from the same fixed distribution (the i.i.d. assumption, like flipping the same coin every day). In the real world, demand is messy. It spikes on rainy days, drops on holidays, and changes based on trends. It's not a fair coin; it's a chaotic storm.
  2. The "Static" Assumption: They assumed products don't expire or that unsold items just sit there forever. In reality, milk spoils, and fashion goes out of style.

Because of these unrealistic rules, the old algorithms often failed when applied to real, messy businesses.

The New Approach: Learning on the Fly

This paper introduces a new way to think about the problem, called Online Inventory Optimization (OIO). Instead of trying to predict the future perfectly, the manager learns as they go, making decisions based on what happened yesterday.

The authors propose a new algorithm called MaxCOSD (Maximum Cyclic Online Subgradient Descent). Let's break down what that means using a simple analogy.

The "Tightrope Walker" Analogy

Imagine you are walking a tightrope (your inventory level).

  • The Goal: Stay in the middle of the rope (the perfect stock level).
  • The Danger: If you step too far left, you fall into "Lost Sales" (empty shelves). If you step too far right, you fall into "Waste" (expired goods).
  • The Wind (Demand): The wind is blowing you around unpredictably. Sometimes it's a gentle breeze; sometimes it's a hurricane.

The Problem with Standard Algorithms:
Most old algorithms are like a tightrope walker who takes a step, checks the wind, and immediately takes another step. If the wind suddenly stops or reverses, they might overshoot and fall off the rope because they didn't check if their new position was actually safe before committing to it.

The MaxCOSD Solution:
MaxCOSD is like a cautious tightrope walker who uses a safety check.

  1. The "Cycle": Instead of changing your position every single second, you take a few steps in one direction based on the wind you've felt so far.
  2. The "Feasibility Check": Before you actually commit to that new position, you ask: "If I stand here, is it safe?"
    • If the answer is Yes, you move there and start a new cycle.
    • If the answer is No (for example, demand was so low that the unsold stock already exceeds the new target), you don't move. You stay put until you gather enough information to make a safe move again.

This "safety check" is crucial. It ensures you never commit to a move you cannot actually carry out, like a stock target that would require ordering a negative number of pastries.
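The cycle and the feasibility check can be sketched in a few lines for a single, non-perishable product. This is a simplified illustration of the idea as described above, not the authors' exact algorithm: the function names, the fixed step size, the cost values, and the rule "a target is feasible if it is at least the leftover stock" are all assumptions made for the example.

```python
def subgradient(level, demand, lost_sale_cost=3.0, waste_cost=1.0):
    """A subgradient of the daily cost with respect to the stock level.

    Only needs to know whether stock was left over, not the full demand.
    Cost values are illustrative assumptions.
    """
    return waste_cost if level > demand else -lost_sale_cost


def cyclic_osd(demands, start_level=10.0, step=1.0, max_level=100.0):
    """Simplified cyclic subgradient descent with a feasibility check."""
    level = start_level   # stock level held during the current cycle
    grad_sum = 0.0        # subgradients accumulated since the cycle began
    levels = []
    for demand in demands:
        levels.append(level)
        leftover = max(level - demand, 0.0)   # unsold stock carried to tomorrow
        grad_sum += subgradient(level, demand)
        # Candidate target: one descent step from the cycle's level, clipped to [0, max_level].
        candidate = min(max(level - step * grad_sum, 0.0), max_level)
        if candidate >= leftover:
            # Feasible: reaching the target needs a non-negative order. Start a new cycle.
            level, grad_sum = candidate, 0.0
        # Infeasible: keep the current level and keep accumulating subgradients.
    return levels


print(cyclic_osd([9, 9, 2, 10], step=4.0))  # [10.0, 6.0, 18.0, 18.0]
```

On the third day demand is only 2, so 16 units are left over and the candidate target of 14 is infeasible. The sketch holds its level at 18 for another day and only moves once the accumulated information yields a target it can actually reach.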

The Secret Ingredient: "Non-Degenerate" Demand

The paper makes a very important discovery: You cannot learn if the wind never blows.

If demand is zero (no one ever buys coffee), the manager has no information to learn from. The algorithm gets stuck.

  • The Assumption: The authors assume that demand is "non-degenerate." In plain English, this means: "At least sometimes, people actually buy something."
  • Why it matters: They argue that if demand can be zero too often, no algorithm can learn to be good. You need some signal (some sales) to adjust your strategy. This is a basic limitation of this kind of problem, not a weakness of one particular algorithm.
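One way to see why sales are needed is a toy simulation, assuming stock never perishes and the manager can only add stock, never remove it. The function name and the numbers are hypothetical; the sketch tracks the lowest stock level the manager could possibly hold each day.

```python
def lowest_feasible_levels(start_stock, demands):
    """Lowest stock level available each day when you can only order more.

    Assumes nothing perishes and stock cannot be returned (illustrative setup).
    """
    stock = start_stock
    levels = []
    for demand in demands:
        levels.append(stock)             # cannot go below what is already on the shelf
        stock = max(stock - demand, 0)   # unsold items carry over to the next day
    return levels


print(lowest_feasible_levels(50, [0, 0, 0, 0]))      # [50, 50, 50, 50]
print(lowest_feasible_levels(50, [20, 20, 20, 20]))  # [50, 30, 10, 0]
```

With zero demand, an early over-order of 50 can never be corrected, whatever the algorithm does. With steady sales, the shelf empties and the manager regains the freedom to choose a lower level.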

Why This Matters

The beauty of MaxCOSD is that it works even when:

  • Demand is not random (it can be seasonal, trending, or chaotic).
  • Products expire (perishable goods like food or medicine).
  • You have multiple products (a whole grocery store, not just one item).

The Result

The authors prove mathematically that MaxCOSD comes with a performance guarantee. It keeps your "regret" (the money you lost by not making the perfect decision) growing as slowly as is provably possible.
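Regret can be computed directly once the demands are known. The sketch below is a simplified version that compares the cost actually paid with the cost of the best single stock level chosen in hindsight. It ignores stock carry-over, and the cost values, function names, and candidate grid are assumptions for illustration.

```python
def daily_cost(level, demand, lost_sale_cost=3.0, waste_cost=1.0):
    """Illustrative daily cost: lost sales plus wasted stock."""
    shortage = max(demand - level, 0.0)
    leftover = max(level - demand, 0.0)
    return lost_sale_cost * shortage + waste_cost * leftover


def regret(levels, demands, candidates):
    """Cost actually paid minus the cost of the best fixed level in hindsight."""
    incurred = sum(daily_cost(l, d) for l, d in zip(levels, demands))
    best_fixed = min(sum(daily_cost(c, d) for d in demands) for c in candidates)
    return incurred - best_fixed


demands = [8, 12, 10, 14]
levels = [10, 10, 10, 10]   # what the manager actually stocked each day
print(regret(levels, demands, candidates=range(0, 21)))  # 8.0
```

Here the manager paid 20 in total, while always stocking 12 would have cost 12, so the regret is 8. A good learning algorithm keeps this gap small relative to the number of days.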

In summary:
This paper gives business managers a new, smarter tool to manage their shelves. Instead of guessing based on old, rigid rules, they can use an algorithm that adapts to real-world chaos, checks its own safety before making changes, and learns effectively as long as customers keep showing up. It's like giving your inventory manager a pair of smart glasses that helps them walk the tightrope without falling, no matter how wild the wind gets.
