Online Inventory Problems: Beyond the i.i.d. Setting with Online Convex Optimization

Imagine you are the manager of a busy coffee shop. Every morning, you have to decide: How many pastries should I order today?

If you order too few, you miss out on sales because customers leave hungry (this is a "lost sale"). If you order too many, the unsold pastries go stale by tomorrow and you have to throw them away (this is a "waste" or "holding cost").

Your goal is to find the perfect balance to make the most money over time. This is the classic Inventory Problem.

The Old Way: Guessing with Rules

For decades, experts tried to solve this using math. But their math relied on some very unrealistic assumptions:

The "Coin Flip" Assumption: They assumed each day's demand is an independent draw from one fixed distribution, like repeated flips of the same coin, so that statistics from past sales stay valid forever. In the real world, demand is messy. It spikes on rainy days, drops on holidays, and shifts with trends. It's not a fair coin; it's a chaotic storm.
The "Static" Assumption: They assumed products don't expire or that unsold items just sit there forever. In reality, milk spoils, and fashion goes out of style.

Because of these unrealistic rules, the old algorithms often failed when applied to real, messy businesses.

The New Approach: Learning on the Fly

This paper introduces a new way to think about the problem, called Online Inventory Optimization (OIO). Instead of trying to predict the future perfectly, the manager learns as they go, making each decision based on what has been observed so far.

The authors propose a new algorithm called MaxCOSD (Maximum Cyclic Online Subgradient Descent). Let's break down what that means using a simple analogy.

The "Tightrope Walker" Analogy

Imagine you are walking a tightrope (your inventory level).

The Goal: Stay in the middle of the rope (the perfect stock level).
The Danger: If you step too far left, you fall into "Lost Sales" (empty shelves). If you step too far right, you fall into "Waste" (expired goods).
The Wind (Demand): The wind is blowing you around unpredictably. Sometimes it's a gentle breeze; sometimes it's a hurricane.

The Problem with Standard Algorithms:
Most old algorithms are like a tightrope walker who takes a step, checks the wind, and immediately takes another step. If the wind suddenly stops or reverses, they might overshoot and fall off the rope because they didn't check if their new position was actually safe before committing to it.

The MaxCOSD Solution:
MaxCOSD is like a cautious tightrope walker who uses a safety check.

The "Cycle": Instead of changing your position every single second, you take a few steps in one direction based on the wind you've felt so far.
The "Feasibility Check": Before you actually commit to that new position, you ask: "Can I actually stand there, given where I am right now?"
- If the answer is Yes (it's safe), you move there and keep walking.
- If the answer is No (the new target is below the stock still sitting on your shelves, because demand was too low to sell it down), you don't move. You stay put until you gather enough information to make a safe move again.

This "safety check" is crucial. It ensures you never make a move that breaks the rules of the game (like ordering a negative number of pastries to bring your stock below what is already on the shelf).

The Secret Ingredient: "Non-Degenerate" Demand

The paper makes a very important discovery: You cannot learn if the wind never blows.

If demand is zero (no one ever buys coffee), the manager has no information to learn from. The algorithm gets stuck.

The Assumption: The authors assume that demand is "non-degenerate." In plain English, this means: "On any given day, whatever happened before, there is a real chance that customers buy a meaningful amount of every product."
Why it matters: They prove that if demand can be zero too often, no algorithm can learn to be good. You need some signal (some sales) to adjust your strategy. This is a limitation of the problem itself, not of any particular algorithm.

Why This Matters

The beauty of MaxCOSD is that it works even when:

Demand is not random (it can be seasonal, trending, or chaotic).
Products expire (perishable goods like food or medicine).
You have multiple products (a whole grocery store, not just one item).

The Result

The authors prove mathematically that MaxCOSD's "regret" (the money you lost by not making the best fixed decision in hindsight) grows only like the square root of the number of days, so the average regret per day shrinks toward zero. This is the best rate achievable in general for this family of learning problems.

In summary:
This paper gives business managers a new, smarter tool to manage their shelves. Instead of guessing based on old, rigid rules, they can use an algorithm that adapts to real-world chaos, checks its own safety before making changes, and learns effectively as long as customers keep showing up. It's like giving your inventory manager a pair of smart glasses that helps them walk the tightrope without falling, no matter how wild the wind gets.

1. Problem Definition

The paper addresses Online Inventory Optimization (OIO), a sequential decision-making problem where an inventory manager must choose order-up-to levels over time, against an unknown demand process, so as to minimize regret: the cumulative loss incurred minus that of the best fixed order-up-to level in hindsight.

Key Challenges & Limitations of Existing Literature:

Standard Assumptions: Most existing algorithms rely on Independent and Identically Distributed (i.i.d.) demand assumptions, which fail to capture real-world correlations, non-stationarities, and trends.
Restricted Dynamics: Previous works often focus on specific dynamics (e.g., lost sales with non-perishable goods) and specific cost structures (Newsvendor loss).
Stateful Complexity: Unlike standard Online Convex Optimization (OCO), OIO involves stateful dynamics where the inventory state $x_{t+1}$ at time $t+1$ depends on the previous decision $y_t$, the demand $d_t$, and the specific dynamic (e.g., perishability, backlogging), with $x_{t+1} \preceq [y_t - d_t]^+$. This introduces a feasibility constraint ($y_{t+1} \succeq x_{t+1}$) that must be satisfied at every step, making the problem harder than standard OCO.

The Proposed Framework:
The authors formalize OIO as a general framework extending OCO.

State: $x_t \in \mathbb{R}^n_+$ (inventory levels).
Decision: $y_t \in Y$ (order-up-to level), subject to $y_t \succeq x_t$.
Dynamics: $x_{t+1} \preceq [y_t - d_t]^+$ (generalizing lost sales, backlogging, and perishability).
Loss: $\ell_t(y_t)$, a convex loss function (e.g., Newsvendor cost).
Feedback: The manager observes $x_t$ and a subgradient $g_t \in \partial \ell_t(y_t)$ .
Goal: Minimize Regret $R_T = \sum_{t=1}^T \ell_t(y_t) - \inf_{y \in Y} \sum_{t=1}^T \ell_t(y)$.
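
To make the framework above concrete, here is a minimal single-product sketch in Python. It is an illustration under stated assumptions, not code from the paper: the Newsvendor loss is taken as $h [y - d]^+ + p [d - y]^+$ with hypothetical unit costs `h` (holding) and `p` (lost-sale penalty), the dynamics are plain lost sales, and the infimum over $Y$ in the regret is approximated by a minimum over a finite list of candidate levels.

```python
def newsvendor_loss(y, d, h=1.0, p=2.0):
    """Newsvendor cost: pay h per unsold unit and p per unit of unmet demand.

    h and p are illustrative cost parameters, not values from the paper.
    """
    return h * max(y - d, 0.0) + p * max(d - y, 0.0)


def lost_sales_next_state(y, d):
    """Lost-sales dynamics: leftover stock carries over, unmet demand is lost."""
    return max(y - d, 0.0)


def is_feasible(y, x):
    """Feasibility constraint y_t >= x_t: ordering cannot remove existing stock."""
    return y >= x


def regret(decisions, demands, candidates, h=1.0, p=2.0):
    """Cumulative loss of the decisions minus that of the best fixed candidate.

    The list `candidates` stands in for the set Y (an approximation).
    """
    incurred = sum(newsvendor_loss(y, d, h, p) for y, d in zip(decisions, demands))
    best_fixed = min(
        sum(newsvendor_loss(c, d, h, p) for d in demands) for c in candidates
    )
    return incurred - best_fixed
```

For example, with demands `[2, 4, 6]`, `h = p = 1`, and candidates `[0, 2, 4, 6]`, always ordering up to 4 gives zero regret, while always ordering up to 0 gives a regret of 8.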

2. Methodology: The MaxCOSD Algorithm

The authors propose MaxCOSD (Maximum Cyclic Online Subgradient Descent), a novel algorithm designed to handle general, non-i.i.d. demands and stateful dynamics.

Core Mechanism:
Instead of updating the order-up-to level $y_t$ at every time step (which risks violating feasibility constraints), MaxCOSD operates in cycles.

Cyclic Updates: The algorithm maintains a candidate order-up-to level $\hat{y}_t$ but only updates the actual decision $y_t$ at specific "update periods" $t_k$.
Feasibility Trigger: An update occurs only if the candidate level $\hat{y}_{t+1}$ satisfies the feasibility constraint relative to the next state $x_{t+1}$. Specifically, the update happens if $x_{t+1} \preceq \hat{y}_{t+1}$.
Adaptive Learning Rates: The algorithm uses an adaptive learning rate inspired by AdaGrad-Norm:
$\eta_t = \frac{\gamma D}{\sqrt{\|\sum_{s=t_k}^t g_s\|_2^2 + \sum_{m=1}^{k-1} \|\sum_{s \in T_m} g_s\|_2^2}}$
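Here $t_k$ is the first period of the current cycle $k$, $T_m$ is the set of periods in the earlier cycle $m$, $D$ is the diameter of $Y$, and $\gamma > 0$ is a scaling parameter. This allows the algorithm to adapt to the magnitude of the subgradients without requiring prior knowledge of the Lipschitz constant $G$.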
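
The mechanism described above can be sketched for a single product as follows. This is a simplified reading of the description, not the authors' pseudocode, and it rests on several assumptions: $Y = [0, D]$, lost-sales dynamics, Newsvendor subgradients with hypothetical costs `h` and `p`, an empty initial inventory, a starting level of $D/2$, and a candidate computed from the level held at the start of the current cycle using the subgradients accumulated since then. The function name `maxcosd_1d` is made up for this example.

```python
import math


def newsvendor_subgradient(y, d, h, p):
    """A subgradient of h*max(y-d,0) + p*max(d-y,0) with respect to y."""
    if y > d:
        return h
    if y < d:
        return -p
    return 0.0  # at y == d any value in [-p, h] is valid; 0 is one choice


def maxcosd_1d(demands, D, gamma=0.1, h=1.0, p=2.0):
    """Cyclic subgradient descent with a feasibility check (illustrative sketch).

    Returns the list of order-up-to levels played and the number of updates.
    """
    x = 0.0          # inventory state x_t (assumed empty at the start)
    anchor = D / 2   # level held since the start of the current cycle, y_{t_k}
    y = anchor       # decision actually played, y_t
    cycle_sum = 0.0  # sum of subgradients observed in the current cycle
    past_sq = 0.0    # sum over finished cycles of (cycle subgradient sum)^2
    decisions, n_updates = [], 0

    for d in demands:
        assert y >= x, "feasibility y_t >= x_t must hold"
        decisions.append(y)

        g = newsvendor_subgradient(y, d, h, p)
        x = max(y - d, 0.0)  # lost-sales dynamics
        cycle_sum += g

        # AdaGrad-Norm style step size; guard against a zero denominator.
        denom = math.sqrt(cycle_sum ** 2 + past_sq)
        eta = gamma * D / denom if denom > 0 else 0.0

        # Candidate level: projected step from the cycle's starting level.
        candidate = min(max(anchor - eta * cycle_sum, 0.0), D)

        if x <= candidate:
            # Candidate is feasible: close the cycle and move.
            past_sq += cycle_sum ** 2
            cycle_sum = 0.0
            anchor = y = candidate
            n_updates += 1
        # Otherwise keep y unchanged; it stays feasible because x <= y.

    return decisions, n_updates
```

Keeping the decision unchanged when the candidate is infeasible is always safe under these dynamics, since the leftover stock can never exceed the level that was just held. Note that in the very first cycle the step has length exactly $\gamma D$, so in this sketch a large `gamma` can make early candidates infeasible for many periods.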

Theoretical Foundation:
The analysis relies on a new concept called Geometric Cycles. The authors prove that under specific demand assumptions, the length of the cycles between updates follows a geometric distribution, ensuring that updates happen frequently enough to control regret while maintaining feasibility.

3. Key Contributions

A. The MaxCOSD Algorithm

MaxCOSD is the first algorithm to provide provable $O(\sqrt{T})$ regret bounds for online inventory problems with:

Non-i.i.d. demands: It handles correlations and non-stationarity.
General Dynamics: It supports lost sales, backlogging, and perishability (including FIFO).
Stateful Constraints: It guarantees feasibility without requiring the demand to be i.i.d.

B. The Non-Degeneracy Assumption

The paper introduces Assumption 10 (Uniformly Probably Positive Demand):
$P[\forall i \in [n], d_{t,i} \ge \rho \mid \text{history}] \ge \mu$
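This assumption states that at every period, whatever the history, there is probability at least $\mu > 0$ that the demand for every product is at least $\rho > 0$. In other words, demand is bounded away from zero with non-vanishing probability.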

Necessity: The authors prove via Propositions 13 and 14 that this assumption is sharp. Without it (i.e., if demand can be zero, or arbitrarily close to zero, with high probability), no algorithm can achieve sublinear regret in stateful inventory problems. This is a fundamental difference from standard OCO, where sublinear regret is possible even with zero gradients.
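
As a rough diagnostic in the spirit of this assumption, one can measure on historical data how often every product sees demand of at least $\rho$. The sketch below is only an empirical proxy: the assumption itself is a statement about conditional probabilities given the history, which a simple frequency cannot verify. The function name and the data layout (one list of per-product demands per period) are assumptions made for this example.

```python
def fraction_of_active_periods(demands, rho):
    """Fraction of periods in which every product has demand >= rho.

    demands: list of periods, each a non-empty list of per-product demands.
    """
    if not demands:
        raise ValueError("need at least one period of demand data")
    active = sum(1 for period in demands if min(period) >= rho)
    return active / len(demands)


# Four periods, two products; the second period has a zero demand.
history = [[1, 2], [0, 3], [2, 2], [5, 1]]
print(fraction_of_active_periods(history, rho=1))
```

A value close to zero on real data would suggest that the non-degeneracy condition is unlikely to hold with a useful $\mu$ for that choice of $\rho$.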

C. Theoretical Guarantees

Under convexity, boundedness, and the non-degeneracy assumption, MaxCOSD achieves:

Expected Regret: $E[R_T] = O(\sqrt{T})$.
High Probability Regret: $R_T = O(\sqrt{T \log(T/\delta)})$ with probability at least $1 - \delta$.
These bounds hold without assuming i.i.d. demands, a significant improvement over prior works like AIM [9] or DDM [25].

4. Results

Theoretical Results

Theorem 12: Establishes the $O(\sqrt{T})$ regret bound for MaxCOSD. The bound depends on the problem diameter $D$, gradient bound $G$, and the non-degeneracy parameters $\mu$ and $\rho$.
Impossibility Results: Propositions 13 and 14 demonstrate that if demands can be zero (or converge to zero too fast), the regret can grow linearly ($\Omega(T)$) for any algorithm, proving the necessity of the non-degeneracy assumption.
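
A toy calculation shows why vanishing demand is so damaging in a stateful problem. It is not the construction used in Propositions 13 and 14, only an illustration under simple assumptions: one product, lost-sales dynamics, a holding cost `h` per unsold unit, demand identically zero, and $0 \in Y$. Once the manager has stocked `y1` units, the feasibility constraint prevents any reduction, so the holding cost is paid in every period while the fixed level $y = 0$ would have cost nothing.

```python
def regret_under_zero_demand(y1, T, h=1.0):
    """Regret after T periods of zero demand when the first decision is y1."""
    x = 0.0           # initial inventory (assumed empty)
    total_loss = 0.0
    for t in range(T):
        desired = y1 if t == 0 else 0.0  # after period 1 the manager wants 0
        y = max(desired, x)              # feasibility forces y_t >= x_t
        total_loss += h * y              # zero demand: every unit is held
        x = y                            # lost sales with d_t = 0: nothing sells
    best_fixed_loss = 0.0                # y = 0 incurs no cost under zero demand
    return total_loss - best_fixed_loss


print(regret_under_zero_demand(2.0, 100))
print(regret_under_zero_demand(2.0, 200))
```

Doubling the horizon doubles the regret, which is the linear growth that the non-degeneracy assumption is designed to rule out.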

Empirical Results

The authors tested MaxCOSD on five settings, of which the following are summarized here:

Single-product lost sales (i.i.d. Poisson): Outperformed or matched the AIM baseline.
Single-product perishable (i.i.d. Poisson): Outperformed the CUP baseline.
Multi-product (n=100) lost sales: Outperformed the DDM baseline.
Real-world data (M5 competition, n=3049): MaxCOSD was tested on non-i.i.d. real data, where the i.i.d.-based guarantees of the baselines (AIM, DDM) no longer apply. MaxCOSD showed robust performance, though efficiency decreased slightly as $n$ increased because longer cycles are needed to satisfy feasibility.
Comparison: MaxCOSD consistently demonstrated versatility across different dynamics and demand structures, validating its theoretical robustness.

5. Significance and Impact

Bridging OCO and Operations Research: The paper successfully bridges the gap between Online Convex Optimization theory and practical inventory management, moving beyond the restrictive i.i.d. assumptions that have dominated the field.
Real-World Applicability: By handling non-stationary and correlated demands (common in real supply chains) and perishable goods, MaxCOSD offers a more practical tool for industry applications than previous theoretical models.
Theoretical Rigor: The paper provides a rigorous proof that "learning" in stateful inventory systems is impossible without a minimum level of demand activity (non-degeneracy), a crucial insight for future algorithm design.
Generalization: The framework is flexible enough to accommodate various cost structures and inventory dynamics, making it a foundational step toward a unified theory of online inventory control.

In summary, this paper presents MaxCOSD, a robust, theoretically grounded algorithm that solves online inventory problems under realistic, non-i.i.d. conditions, proving that optimal regret rates are achievable provided demands are not degenerate.