Online Bidding for Contextual First-Price Auctions with Budgets under One-Sided Information Feedback

Imagine you are running a small lemonade stand, but instead of selling to neighbors, you are competing in a massive, high-speed digital auction for digital "billboards" (ad impressions) on the internet.

Here is the story of the paper, broken down into simple concepts, analogies, and a step-by-step explanation of how the authors solved a very tricky problem.

The Setting: The "Blind" Auction House

In the old days, online ads worked like a Second-Price Auction. If you wanted to buy a billboard, you would shout out your true value (e.g., "$5"). If you won, you only paid the second-highest bid (maybe $4.50). This was easy: just bid your true value, and you were safe.

But the world changed. Now, most ads are sold via First-Price Auctions. If you bid $5 and win, you pay the full $5.

The Problem: If you bid your true value, you make zero profit. You have to "shade" your bid (bid less than you think the ad is worth) to make money. But how much less? If you bid too low, you lose the ad. If you bid too high, you win but overpay, leaving little or no profit.
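To make the trade-off concrete, here is a minimal sketch of bid shading. It assumes, purely for illustration, that the highest rival bid is uniformly distributed between $0 and $10; that distribution, the function name `expected_surplus`, and the grid of candidate bids are not from the paper.

```python
def expected_surplus(value, bid, rival_high=10.0):
    """Expected profit of a first-price bid.

    Assumes the highest rival bid is Uniform(0, rival_high); this is an
    illustrative assumption, not the paper's model.
    """
    win_prob = min(max(bid / rival_high, 0.0), 1.0)
    return (value - bid) * win_prob


value = 5.0
candidate_bids = [i * 0.5 for i in range(11)]  # 0.0, 0.5, ..., 5.0
best_bid = max(candidate_bids, key=lambda b: expected_surplus(value, b))

print(best_bid, expected_surplus(value, best_bid))  # shaded bid and its profit
print(expected_surplus(value, value))               # bidding the true value
```

Under this assumed distribution, the best bid on the grid is half the value, while bidding the full value earns nothing.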

The Twist: You also have a Budget. You can't just keep bidding forever; you have a fixed amount of money (say, $100) for the whole day. You need to stretch that $100 to get the most "lemonade sales" (rewards) possible.

The Hardest Part: One-Sided Feedback
This is where the paper gets really clever. In this digital world, the auction house is secretive.

If you win: You get the ad and pay your own bid, but the highest competing bid stays hidden. You never learn how much lower you could have gone.
If you lose: You see the winning bid, the price that beat you.

It's like playing a game of Guess the Price where the host reveals the actual price only when your guess falls short. If your guess is high enough to win, they just say "You got it!" and move on, so you never know how far above the price you were.

The New Challenge: Context Matters

Previous research assumed that the competition was random, like rolling a die every time. But in reality, the competition changes based on Context.

Context: Imagine the "context" is the type of person looking at the billboard (e.g., a rich investor vs. a college student).
The Reality: When a rich investor is looking, the other bidders will bid higher. When a student is looking, they bid lower.
The Goal: You need to learn the rule: "When I see a rich investor, I should bid higher. When I see a student, I should bid lower." But you have to learn this rule while only getting "one-sided" feedback (you never see the others' bids when you win).

The Solution: The "Quantile Detective"

The authors (Zeng Fu, Jiashuo Jiang, and Yuan Zhou) created a new algorithm to solve this. Here is how they did it, using a simple analogy:

1. The "Censored" Data Problem

Usually, to learn a pattern, you need to see all the data points. But here, your data is "censored." You only see the winning bid when you lose (because you bid too low). When you win, you don't know how close the other bidders were. It's like trying to learn the average height of people in a room when you can only see the people who are taller than you.

2. The "Quantile Invariance" Trick

The authors invented a new way to estimate the competition's behavior called Robust Regression based on Conditional Quantile Invariance.

The Analogy: Imagine you are trying to guess the average height of a crowd, but you can only see the people who are taller than a bar you set yourself.
The Trick: Instead of trying to guess the average (which is impossible with missing data), you look at specific "milestones" (quantiles).
- You split the crowd into two groups: "Short Context" and "Tall Context."
- You ask: "In the 'Short Context' group, where is the 90th percentile of the hidden bids?"
- Then you ask: "In the 'Tall Context' group, where is the 90th percentile?"
- Even though many of the numbers are hidden, the top of each group is still visible. Once the effect of context is subtracted correctly, the two milestones line up, and the adjustment that makes them line up reveals the hidden rule (the slope of the line) connecting the context to the bid.

By focusing on these "milestones" rather than the exact numbers, they can learn the competition's strategy without ever seeing the competing bids in the auctions they win.

3. The "Dual Update" (The Budget Manager)

Once they have a good guess of the competition, they need to manage the budget.

They use a mathematical tool called a Dual Variable (think of it as a "Budget Anxiety Meter").
If you are spending money too fast, the "Anxiety Meter" goes up. The algorithm automatically tells you to bid lower to save money.
If you have plenty of money left, the meter goes down, and you can bid more aggressively to win more ads.

The Results: Why It Matters

The authors proved mathematically that their algorithm's regret is order-optimal, up to logarithmic factors.

Regret: In learning theory, "regret" is the difference between how much money you made and how much money you could have made if you knew everything from the start.
The Achievement: Their algorithm achieves a regret of roughly $\sqrt{T}$ (where $T$ is the number of auction rounds). This is the best possible rate for this type of problem. Because $\sqrt{T}$ grows more slowly than $T$, the average shortfall per auction shrinks toward zero: as time goes on, your strategy gets smarter and you lose less and less per auction compared to a perfect expert.
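A quick back-of-the-envelope check shows why $\sqrt{T}$ regret is good news. The sketch below ignores constants and logarithmic factors (the `constant` parameter is a placeholder, not a value from the paper) and divides total regret by the number of rounds.

```python
import math


def per_round_regret(T, constant=1.0):
    """Average regret per round if total regret is constant * sqrt(T)."""
    return constant * math.sqrt(T) / T


rates = {T: per_round_regret(T) for T in (100, 10_000, 1_000_000)}
for T, rate in rates.items():
    print(T, rate)
```

Each hundredfold increase in the number of rounds cuts the average per-round shortfall by a factor of ten.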

Summary in a Nutshell

The Problem: You are bidding in a first-price auction with a limited budget. You only get feedback when you lose, and the competition changes based on the situation (context).
The Difficulty: You can't directly learn the competition's habits because you don't see their bids when you win, and which bids you do see depends on how you bid.
The Innovation: The authors created a "Quantile Detective" method. Instead of trying to see the whole picture, they look at specific statistical milestones to figure out the hidden pattern of the competition.
The Outcome: They built an algorithm that learns quickly, keeps spending within the budget, and makes almost as much money as a bidder who knew the competition's behavior from the start.

Real-World Impact: This can help advertisers, and ad platforms such as Google and Facebook, spend ad budgets more efficiently, getting better value for the money even when the market is unpredictable and information is scarce.

Here is a detailed technical summary of the paper "Online Bidding for Contextual First-Price Auctions with Budgets under One-Sided Information Feedback."

1. Problem Formulation

The paper addresses the challenge of learning to bid in repeated First-Price Auctions (FPA) under three simultaneous constraints:

Budget Constraints: The bidder has a total budget $B$ over a time horizon $T$.
Contextual Information: The bidder observes a context vector $x_t$ (e.g., user demographics) before bidding. The private value $v_t$ is a function of $x_t$ ($v_t = f(x_t)$). Crucially, the competitors' highest bid $d_t$ is also context-dependent, modeled as $d_t = \alpha x_t + z_t$, where $\alpha$ is an unknown parameter and $z_t$ is noise drawn from an unknown distribution $G$.
One-Sided Information Feedback: The bidder observes the competitors' highest bid $d_t$ only after losing the auction ($b_t < d_t$). After winning ($b_t > d_t$), the bidder pays $b_t$ and learns only that they won; $d_t$ remains hidden. This creates a "censored" data problem.
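The feedback rule can be sketched in a few lines. The names `Feedback` and `run_auction` are hypothetical, and treating a tie as a loss is an assumption made here for simplicity, since the summary does not say how ties are handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Feedback:
    won: bool
    payment: float
    observed_rival_bid: Optional[float]  # None when the rival bid is censored


def run_auction(bid: float, rival_bid: float) -> Feedback:
    """One round of a first-price auction with one-sided feedback."""
    if bid > rival_bid:
        # Win: pay your own bid; the highest rival bid stays hidden.
        return Feedback(won=True, payment=bid, observed_rival_bid=None)
    # Loss (ties counted as losses here, by assumption): pay nothing and
    # observe the bid that beat you.
    return Feedback(won=False, payment=0.0, observed_rival_bid=rival_bid)


win = run_auction(bid=3.0, rival_bid=2.5)
loss = run_auction(bid=2.0, rival_bid=3.0)
print(win)
print(loss)
```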

Objective: Maximize total expected reward (surplus) $\sum_{t=1}^{T} (v_t - b_t) \cdot \mathbb{I}(b_t > d_t)$ subject to the budget constraint $\sum_{t=1}^{T} b_t \cdot \mathbb{I}(b_t > d_t) \leq B$. Performance is measured by regret against the optimal feasible strategy.
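As a small worked instance of the objective and the constraint, the sketch below evaluates realized surplus and spend for a fixed sequence of bids. The function name and the numbers are made up for illustration.

```python
def total_surplus_and_spend(values, bids, rival_bids):
    """Realized surplus sum((v - b) * 1[b > d]) and spend sum(b * 1[b > d])."""
    wins = [b > d for b, d in zip(bids, rival_bids)]
    surplus = sum(v - b for v, b, w in zip(values, bids, wins) if w)
    spend = sum(b for b, w in zip(bids, wins) if w)
    return surplus, spend


values = [5.0, 4.0, 6.0]
bids = [3.0, 2.0, 4.0]
rival_bids = [2.5, 3.0, 3.5]   # the bidder wins rounds 1 and 3
budget = 8.0

surplus, spend = total_surplus_and_spend(values, bids, rival_bids)
print(surplus, spend, spend <= budget)
```

Only winning rounds contribute to either sum, so a lost auction costs nothing but also earns nothing.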

2. Methodology

The authors propose a novel algorithm (Algorithm 2) that integrates Robust Regression, Dual Optimization, and Phase-Based Learning.

A. Robust Parameter Estimation (The Core Innovation)

The primary difficulty is estimating the unknown parameter $\alpha$ (the sensitivity of competitors' bids to context) given that $d_t$ is only observed when $b_t < d_t$ . Standard regression fails because the censoring is non-random (dependent on the bidder's policy).

Solution: The authors introduce a Quantile-Based Estimator (Algorithm 1) based on Conditional Quantile Invariance.
Mechanism:
1. Split samples into two groups based on the median of the context $x_t$ .
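2. Define residuals $R_i(\alpha) = d_i - \alpha x_i$ for rounds in which $d_i$ was observed (losses), and set the residual to $-\infty$ for censored rounds (wins).
3. Compute the $p$-quantile of the residuals within each group.
4. Choose the estimate $\hat{\alpha}$ that minimizes the difference between these two conditional quantiles. At the true $\alpha$ the residuals reduce to the noise $z_i$, whose quantiles do not depend on the context, so the two quantiles coincide.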
Theoretical Guarantee: Under Lipschitz continuity of the noise distribution and identifiability assumptions, this estimator achieves an error bound of $\tilde{O}(1/\sqrt{n})$ , matching the optimal rate for parametric estimation despite the censoring.
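The four steps can be sketched as follows. This is a simplified illustration, not the paper's Algorithm 1: it searches a fixed grid of candidate slopes (the range $[0, 4]$ is hypothetical), uses a constant bid to generate the censoring, and assumes the chosen $p$-quantile lies among the observed residuals in both groups. All function names are made up for this sketch.

```python
import math
import random
import statistics


def upper_quantile(values, p):
    """Empirical p-quantile: the ceil(p * n)-th smallest value."""
    ordered = sorted(values)
    return ordered[max(0, math.ceil(p * len(ordered)) - 1)]


def quantile_gap(alpha, samples, p, x_median):
    """Absolute gap between the p-quantiles of residuals in the two groups.

    samples: (x, d) pairs, with d = None when the rival bid was censored.
    """
    low, high = [], []
    for x, d in samples:
        residual = -math.inf if d is None else d - alpha * x
        (low if x <= x_median else high).append(residual)
    return abs(upper_quantile(low, p) - upper_quantile(high, p))


def estimate_alpha(samples, p=0.9, alpha_grid=None):
    """Grid search for the slope that best aligns the two group quantiles."""
    if alpha_grid is None:
        alpha_grid = [i * 0.01 for i in range(401)]  # hypothetical range [0, 4]
    x_median = statistics.median(x for x, _ in samples)
    return min(alpha_grid, key=lambda a: quantile_gap(a, samples, p, x_median))


# Synthetic data: d = alpha * x + z with z ~ Uniform(0.5, 1.5) (assumed here),
# and a constant bid of 1.0, so d is hidden whenever d <= 1.0.
rng = random.Random(0)
true_alpha, bid = 2.0, 1.0
samples = []
for _ in range(4000):
    x = rng.random()
    d = true_alpha * x + rng.uniform(0.5, 1.5)
    samples.append((x, d if d > bid else None))

alpha_hat = estimate_alpha(samples)
print(alpha_hat)
```

Using a high quantile is what makes the censoring harmless in this sketch: the hidden rival bids are the small ones, so replacing them with $-\infty$ does not move the upper quantile.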

B. Dual Optimization for Budgets

To handle the budget constraint, the algorithm uses a Lagrangian Dual approach.

It introduces a dual variable $\lambda_t$ (shadow price of the budget).
The effective value for bidding becomes $v_t / (1 + \lambda_t)$ . As the budget depletes, $\lambda_t$ increases, causing the bidder to "shade" their bid more aggressively (bid lower) to conserve funds.
$\lambda_t$ is updated via Online Gradient Descent based on the difference between the average spending rate and the target budget rate $\rho = B/T$ .
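To see where the effective value comes from, note that the per-round Lagrangian payoff is $(v_t - (1 + \lambda_t) b_t) \cdot \mathbb{I}(b_t > d_t) = (1 + \lambda_t)\,(v_t / (1 + \lambda_t) - b_t) \cdot \mathbb{I}(b_t > d_t)$, so the bidder acts as if the item were worth $v_t / (1 + \lambda_t)$. The sketch below pairs that bid rule with a projected gradient step on $\lambda_t$. The names `choose_bid` and `update_dual`, the step size, the bid grid, and the uniform win-probability model are assumptions for illustration; in the algorithm the win probability would come from the estimated competitor model.

```python
def choose_bid(value, lam, win_prob, bid_grid):
    """Pick the grid bid maximizing (value - (1 + lam) * bid) * win_prob(bid)."""
    return max(bid_grid, key=lambda b: (value - (1.0 + lam) * b) * win_prob(b))


def update_dual(lam, spend, rho, step):
    """Projected online gradient step: lam rises when spend exceeds rho."""
    return max(0.0, lam + step * (spend - rho))


def uniform_win_prob(bid, rival_high=10.0):
    """Assumed win probability: highest rival bid ~ Uniform(0, rival_high)."""
    return min(max(bid / rival_high, 0.0), 1.0)


bid_grid = [i * 0.25 for i in range(21)]  # 0.00, 0.25, ..., 5.00
relaxed_bid = choose_bid(5.0, 0.0, uniform_win_prob, bid_grid)
anxious_bid = choose_bid(5.0, 1.0, uniform_win_prob, bid_grid)
print(relaxed_bid, anxious_bid)

# Overspending relative to the target rate rho pushes the dual variable up.
lam_after_overspend = update_dual(lam=0.5, spend=3.0, rho=1.0, step=0.1)
lam_after_underspend = update_dual(lam=0.05, spend=0.0, rho=1.0, step=0.1)
print(lam_after_overspend, lam_after_underspend)
```

In this example, raising $\lambda$ from 0 to 1 halves the chosen bid, and the projection keeps $\lambda$ from going negative after a round with no spending.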

C. Phase-Based Learning Structure

The algorithm operates in phases to ensure statistical independence between parameter estimation and decision-making:

Exploration Phase ( $T_0$ ): The bidder bids 0 to observe raw competitor bids and obtain an initial estimate of $\alpha$ .
Commit Phases: The remaining time is divided into pairs of intervals $(A_i, B_i)$:
- Interval $A_i$ : Used to update the estimate of $\alpha$ using the robust quantile method.
- Interval $B_i$ : Used to update reward/cost estimators and select bids based on the current $\hat{\alpha}$ and $\lambda_t$ .
- This separation prevents the "self-censoring" bias from corrupting the parameter estimates used in the same round.
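The phase layout can be illustrated with a small scheduling helper. The summary does not give the interval lengths, so the equal-length intervals below (`pair_length`) are purely a placeholder, as is the function name `phase_schedule`.

```python
def phase_schedule(T, T0, pair_length):
    """Return (label, start, end) triples covering rounds 0..T-1, end exclusive.

    Interval lengths are hypothetical; only the explore / A_i / B_i ordering
    follows the description above.
    """
    t = min(T0, T)
    schedule = [("explore", 0, t)]
    i = 1
    while t < T:
        a_end = min(t + pair_length, T)
        schedule.append((f"A{i}", t, a_end))          # refresh the alpha estimate
        b_end = min(a_end + pair_length, T)
        if a_end < b_end:
            schedule.append((f"B{i}", a_end, b_end))  # bid using the frozen estimate
        t, i = b_end, i + 1
    return schedule


for label, start, end in phase_schedule(T=10, T0=2, pair_length=3):
    print(label, start, end)
```

Data gathered in an $A_i$ interval feeds the estimate used in the following $B_i$ interval, never the same rounds in which it was collected.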

D. Multi-Dimensional Extension

The framework is extended to high-dimensional contexts ( $\alpha \in \mathbb{R}^d$ ) using a Component-Wise Estimator (Algorithm 3), estimating each dimension of $\alpha$ independently using the same quantile-invariance principle.

3. Key Contributions

Novel Setting: The authors present this as the first work to simultaneously address contextual competitors, budget constraints, and one-sided feedback in repeated first-price auctions. Previous works typically assumed i.i.d. competitor bids or full information feedback.
Robust Estimation Technique: The development of a quantile-based regression method that recovers linear parameters from censored, bid-dependent data without knowing the noise distribution $G$ . This technique is of independent interest for other censored learning problems.
Optimal Regret Bound: The proposed algorithm achieves a regret bound of $\tilde{O}(\sqrt{T})$ (order-optimal) for the 1D case and $\tilde{O}(\sqrt{dT})$ for the multi-dimensional case.
Removal of Distributional Assumptions: Unlike prior contextual auction literature (e.g., Badanidiyuru et al., 2023) which assumed the noise distribution was known, this work operates under the realistic assumption that both the structural parameter $\alpha$ and the noise distribution $G$ are unknown.

4. Results

Theoretical:
- Theorem 1: Proves the estimation error of the quantile-based estimator is $\tilde{O}(1/\sqrt{n})$ .
- Theorem 2: Proves the main algorithm achieves $\tilde{O}(\sqrt{T})$ regret with high probability.
- Theorem 3: Extends the result to multi-dimensional contexts with $\tilde{O}(\sqrt{dT})$ regret.
Empirical:
- Numerical experiments were conducted with $T=5000$ and various noise distributions (Normal, Log-normal, Uniform).
- The proposed contextual algorithm (labeled Alg1 in the experiments) significantly outperformed a non-contextual baseline (labeled Alg2), demonstrating that leveraging context to model competitor behavior is crucial for maximizing surplus under budget constraints.

5. Significance

This paper bridges a critical gap between theoretical auction design and the realities of modern digital advertising (e.g., Google Ad Manager, AppNexus).

Practical Relevance: It addresses the industry shift from Second-Price to First-Price auctions, where strategic bid shading is necessary but difficult without full feedback.
Realistic Constraints: By incorporating budget constraints and one-sided feedback, the model reflects the actual operational environment of Demand-Side Platforms (DSPs).
Methodological Impact: The "Conditional Quantile Invariance" method provides a new tool for learning in environments with censored, policy-dependent data, applicable beyond auctions to areas like dynamic pricing and resource allocation.

In summary, the paper provides a mathematically rigorous and practically viable solution for adaptive bidding in complex, data-scarce auction environments, achieving optimal learning rates despite severe information limitations.