A stochastic optimization algorithm for revenue maximization in a service system with balking customers

This paper proposes a stochastic gradient descent algorithm that dynamically maximizes revenue in a single-server queue with balking customers. A novel infinitesimal perturbation analysis (IPA) procedure estimates the effective arrival rate from observable joining behavior alone, and the resulting sequence of prices converges to the optimal price under mild regularity conditions.

Shreehari Anand Bodas, Harsha Honnappa, Michel Mandjes, Liron Ravner

Published 2026-03-05

Imagine you run a very popular, single-lane coffee shop. You have one barista (the server), and customers arrive randomly. Your goal is simple: make the most money possible every hour.

To do this, you need to find the perfect price for your coffee.

  • If the price is too low, everyone rushes in. The line gets huge, people wait forever, and they get annoyed.
  • If the price is too high, people walk away immediately because it's too expensive.

The tricky part is that you can't predict how long the line will get or how long people will wait; you only find out by watching. Worse, you can't ask the people who left (the ones who balked) why they left; you only see the people who actually bought a coffee.

This paper presents a smart, self-learning computer algorithm that acts as your "Price Manager." It figures out the perfect price on the fly, even though it can't see the people who walked away.

Here is how the paper breaks it down, using simple analogies:

1. The Problem: The "Invisible Crowd"

In many business models, you assume you know exactly how many people will show up at a certain price. But in a real queue (like a coffee shop or a server farm), it's a feedback loop:

  • High Price → Fewer people join.
  • Low Price → More people join → The line gets longer → People get annoyed and leave (this is called balking).

The authors' challenge is that you only see the "effective" arrivals (the people who actually joined). You don't see the "ghosts"—the people who saw the line, got scared, and left. Yet, you need to know about those ghosts to set the right price.

2. The Solution: A "Smart Guessing" Algorithm

The authors created a Stochastic Gradient Descent (SGD) algorithm. Think of this as a hiker trying to find the highest peak in a foggy mountain range (the peak is the maximum revenue).

  • The hiker can't see the whole mountain.
  • They can only feel the slope under their feet at their current spot.
  • They take a step in the direction that feels "uphill."
  • They repeat this, adjusting their steps, until they reach the top.

In this case, the "hiker" is the price. The "slope" is how much money you are making right now compared to the last price you tried. The algorithm nudges the price up or down to find the sweet spot.
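The hiker loop can be sketched in a few lines. This is a toy version with a made-up demand curve and an exact slope oracle; in the paper the slope must be estimated from noisy queue observations, but the update rule has the same shape.

```python
# Toy gradient-ascent "hiker" on an assumed revenue curve
# R(p) = p * demand(p). The demand function and the exact-slope oracle
# are illustrative stand-ins for what the paper estimates from data.

def demand(price):
    """Toy demand: fewer customers join as the price rises."""
    return max(0.0, 10.0 - 2.0 * price)

def revenue(price):
    return price * demand(price)

def slope(price, eps=1e-4):
    """Finite-difference stand-in for 'feeling the ground underfoot'."""
    return (revenue(price + eps) - revenue(price - eps)) / (2 * eps)

price = 0.5                               # arbitrary starting price
for n in range(1, 201):
    price += (0.2 / n) * slope(price)     # shrinking steps, as in stochastic approximation

print(round(price, 2))                    # approaches 2.5, the argmax of p*(10 - 2p)
```

The shrinking step sizes are the classic stochastic-approximation trick: big steps early to move fast, small steps later so noise can't knock the hiker off the peak.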

3. The Secret Sauce: "Infinitesimal Perturbation Analysis" (IPA)

This is the most technical part, but here is the simple version:
Usually, to know how to change your price, you need to know how the entire system reacts. But since you can't see the people who left, you can't calculate the slope directly.

The authors invented a new way to estimate the slope using only the people who did join.

  • The Analogy: Imagine you are watching a river. You can't see the rain falling upstream (the people who left), but you can see the water level rising and falling.
  • The authors developed a mathematical trick (IPA) that looks at the tiny ripples in the water (the behavior of the people who joined) and uses them to mathematically reconstruct what the rain (the balking customers) must have been doing.
  • This allows the algorithm to "guess" the slope of the revenue curve accurately, even with missing data.
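To make the "river" idea concrete, here is a deliberately simple toy model, an assumption for illustration rather than the paper's actual estimator: arrivals form a Poisson stream with a hidden rate, each customer has an Exp(1) willingness-to-pay, and joins only if it exceeds the price p. Then the joining rate is λ·e^(−p), and the revenue-rate slope (1 − p)·λ·e^(−p) can be written entirely in terms of the observable joining rate, so the invisible balkers are never needed.

```python
# Toy illustration (assumed model, not the paper's IPA estimator):
# arrivals form a Poisson stream with hidden rate lam; each customer has
# an Exp(1) willingness-to-pay and joins only if it exceeds the price p.
# Then lam_eff(p) = lam * exp(-p), and the revenue-rate slope
#     d/dp [p * lam_eff(p)] = (1 - p) * lam_eff(p)
# depends only on the observable joining rate -- no ghosts required.
import math
import random

random.seed(0)
lam, p, T = 5.0, 0.8, 10_000.0       # hidden arrival rate, price, window length

t, joins = 0.0, 0
while True:
    t += random.expovariate(lam)     # next (possibly invisible) arrival
    if t > T:
        break
    if random.expovariate(1.0) > p:  # willingness-to-pay beats the price: joins
        joins += 1

lam_eff_hat = joins / T                        # estimated from joiners only
grad_hat = (1.0 - p) * lam_eff_hat             # slope estimate, balkers unseen
grad_true = (1.0 - p) * lam * math.exp(-p)     # ground truth, for comparison
print(round(grad_hat, 3), round(grad_true, 3))
```

The paper's IPA estimator tackles the much harder queueing version, where balking depends on waiting times, but the spirit is the same: rewrite the slope so that only observable quantities appear in it.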

4. The "Window" Strategy

The algorithm doesn't change the price every second. It works in cycles or windows.

  • Set a price.
  • Wait for a while (a "window") to collect data on how many people joined and how long they waited.
  • Calculate the gradient (did we make more or less money?).
  • Adjust the price for the next window.

The paper spends a lot of time figuring out the perfect size for these windows.

  • Too small: You don't have enough data to know if the price change helped or hurt. The algorithm gets jittery and confused.
  • Too big: You waste time testing a bad price for too long before correcting it.
  • Just right: The algorithm learns fast and settles on the perfect price.
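The cycle above can be sketched end to end on a toy model (an assumption for illustration: hidden Poisson arrivals at rate λ, Exp(1) willingness-to-pay, so customers join at rate λ·e^(−price) and the revenue slope is (1 − price) times the joining rate). The window lengths and step sizes here are arbitrary choices, not the paper's tuned schedule.

```python
# Windowed price updates on a toy model (illustrative assumptions: hidden
# Poisson arrivals at rate lam, Exp(1) willingness-to-pay, so the joining
# rate is lam * exp(-price)). Window and step schedules are made up.
import random

random.seed(1)
lam = 5.0                                   # hidden total arrival rate

def joiners_in_window(price, horizon):
    """Simulate one window; return how many customers joined."""
    t, joins = 0.0, 0
    while True:
        t += random.expovariate(lam)
        if t > horizon:
            return joins
        if random.expovariate(1.0) > price:
            joins += 1

price = 0.3                                 # arbitrary starting price
for k in range(1, 101):
    horizon = 50.0 * k                      # growing windows: more data per cycle
    lam_eff_hat = joiners_in_window(price, horizon) / horizon
    grad_hat = (1.0 - price) * lam_eff_hat  # slope from joiners only (toy model)
    price = max(0.05, price + (0.5 / k) * grad_hat)

print(round(price, 2))                      # the toy model's optimal price is 1.0
```

Growing the windows while shrinking the steps captures the trade-off in the list above: early cycles explore quickly with little data, later cycles collect enough observations that the slope estimate is trustworthy.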

5. What Did They Prove?

The authors didn't just build a cool tool; they proved mathematically that it works.

  • Convergence: They showed that if you let the algorithm run long enough, it will always find the best possible price, no matter where it starts.
  • Regret: In the beginning, the algorithm might pick a bad price and lose some money. They bounded how much money is "lost" while learning and proved that this loss grows very slowly compared to the total money you make.
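A back-of-the-envelope way to see the regret claim, again on a toy model (assumed for illustration; the paper proves bounds for the actual queueing system): run windowed price updates, and in each window add the revenue gap between the current price and the best price to a running total. That total stays a tiny fraction of what the best price would have earned.

```python
# Toy regret tally (illustrative model: joining rate lam * exp(-price),
# so expected revenue per unit time is price * lam * exp(-price), which is
# maximized at price 1.0; the noisy slope estimate uses a normal
# approximation to the joiner count). Not the paper's bound -- just the flavor.
import math
import random

random.seed(2)
lam = 5.0

def rev_rate(price):                    # expected revenue per unit time
    return price * lam * math.exp(-price)

p_star, price = 1.0, 0.3
regret = best = 0.0
for k in range(1, 101):
    horizon = 50.0 * k
    regret += horizon * (rev_rate(p_star) - rev_rate(price))  # money left on the table
    best += horizon * rev_rate(p_star)                        # benchmark earnings
    mean = lam * math.exp(-price)                             # joining rate at this price
    lam_eff_hat = random.gauss(mean, math.sqrt(mean / horizon))
    grad_hat = (1.0 - price) * lam_eff_hat
    price = max(0.05, price + (0.5 / k) * grad_hat)

print(round(100 * regret / best, 2))    # lost revenue as a % of the benchmark
```

Almost all of the loss is incurred in the first few windows, while the price is still far from optimal; once the algorithm settles near the best price, each additional window adds almost nothing to the regret.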

Summary

This paper solves a complex puzzle: How do you set the perfect price for a service when customers might leave if the line is too long, and you can't see the people who left?

The answer is a self-correcting algorithm that uses a clever mathematical trick to infer the invisible customers' behavior from the visible ones. It learns by trial and error, tuning both its step size (how far it nudges the price) and its observation windows (how long it waits between price changes) so that it learns quickly without making too many mistakes.

In a nutshell: It's a robot barista that learns the perfect coffee price by watching who buys and who walks away, eventually figuring out the exact price that fills the shop without scaring anyone off.