On The Complexity of Best-Arm Identification in Non-Stationary Linear Bandits

Imagine you are a treasure hunter in a vast, shifting desert. You have a limited amount of time (a "budget") to find the single best spot to dig for gold. However, there's a catch: the ground itself is changing. The value of the gold at any specific spot fluctuates wildly from hour to hour, and you don't know the pattern. This is the world of Non-Stationary Linear Bandits.

In this paper, the authors (Leo, Zhihan, Kevin, and Maryam) tackle a classic problem: How do you find the best spot as quickly and accurately as possible when the map keeps changing?

Here is the breakdown of their discovery, using simple analogies.

1. The Old Way: The "Brute Force" Map

Previously, researchers had a standard strategy for this problem. They treated every possible digging spot as completely unrelated to the others.

The Analogy: Imagine you have 100 different keys, and you don't know which one opens the treasure chest. The old strategy says, "Treat every key as unique. You have to test them all individually to be sure."
The Problem: This approach is incredibly pessimistic. It assumes the worst-case scenario where the keys have no relationship. In reality, many keys look similar or fit similar locks. By ignoring these similarities, the old method wastes time and makes more mistakes. It's like trying to find the best route through a city by testing every single street individually, ignoring the fact that some streets are just shortcuts between others.

2. The New Insight: "Neighbors" Matter

The authors realized that in a "linear" setting (where spots are connected by geometry), the best spot isn't just fighting against everyone; it's only really fighting against its neighbors.

The Analogy: Imagine a hill with many peaks. You want to find the highest peak.
- The Old View: You think you have to compare your current peak against every other peak in the entire mountain range to be sure it's the highest.
- The New View (Adjacency): The authors realized that if you are standing on a peak and you are higher than the two or three peaks immediately touching your base (your "adjacent" neighbors), you are automatically the highest peak in the whole range. You don't need to check the distant mountains; the geometry guarantees it.

They call this concept Adjacency. If you can prove you are better than your immediate neighbors, you win the game.

3. The Solution: The "Adjacent-Optimal" Strategy

Based on this insight, they created a new algorithm called Adjacent-BAI.

The Old Strategy: "Let's sample every single spot evenly, just in case." (Like tasting every dish on a menu to find the best one).
The New Strategy: "Let's focus our energy only on comparing the dishes that are next to each other on the menu."
- They designed a new way to sample the environment that specifically targets the "edges" of the map where the best spots compete.
- They ignore the comparisons between distant, unrelated spots because those comparisons are mathematically unnecessary.

4. The Result: A Sharper, Faster Compass

The paper proves two major things:

The Lower Bound (The Limit): They proved mathematically that you cannot do better than comparing neighbors. No matter how smart your algorithm is, the difficulty of the problem is determined by how hard it is to distinguish between these immediate neighbors.
The Upper Bound (The Achievement): They showed that their new "Adjacent-BAI" algorithm actually achieves this perfect limit. It is as efficient as theoretically possible.

Why does this matter?
In the old "Brute Force" method, the difficulty of the problem grew linearly with the number of dimensions (the complexity of the map). If you added more variables, the problem got much harder.
With the new "Adjacent" method, if your map has a nice, dense structure (like a circle of points), the difficulty drops significantly. It's like realizing that in a crowded room, you only need to talk to the people standing right next to you to find the loudest voice, rather than shouting across the whole room.

Summary

The Problem: Finding the best option in a changing world is hard.
The Mistake: Previous methods treated every option as a stranger, ignoring that some options are "neighbors."
The Breakthrough: You only need to beat your neighbors to be the winner.
The Outcome: A new algorithm that ignores the noise of distant options and focuses purely on the local battles, making it much faster and more accurate.

In short, the authors taught us that in a complex, changing world, you don't need to know everything about the whole map; you just need to know your neighbors.

Here is a detailed technical summary of the paper "On The Complexity of Best-Arm Identification in Non-Stationary Linear Bandits" by Maynard-Zhang et al.

1. Problem Statement

The paper addresses the Fixed-Budget Best-Arm Identification (BAI) problem within the framework of Non-Stationary Linear Bandits.

Setting: A learner interacts with a finite set of arms $\mathcal{X} \subset \mathbb{R}^d$ over a fixed time horizon $T$.
Non-Stationarity: The environment is adversarial. At each round $t$, an unknown parameter vector $\theta_t$ is chosen by an adversary. The reward for pulling arm $x_t$ is $r_t = x_t^\top \theta_t + \epsilon_t$, where $\epsilon_t$ is sub-Gaussian noise.
Objective: The learner must identify the arm $x^*$ that maximizes the cumulative reward (hindsight best arm) with high probability:
$x^* = \arg\max_{x \in \mathcal{X}} x^\top \left( \sum_{t=1}^T \theta_t \right)$
Let $\bar{\theta}_T = \frac{1}{T}\sum_{t=1}^T \theta_t$ denote the average parameter. Since scaling by $1/T$ does not change the maximizer, the goal is equivalently to find $x^* = \arg\max_{x \in \mathcal{X}} x^\top \bar{\theta}_T$.
Challenge: Existing algorithms for stationary linear bandits (e.g., UCB, Thompson Sampling) or non-stationary multi-armed bandits fail or are suboptimal here. Specifically, previous minimax-optimal bounds for non-stationary linear BAI rely on the G-optimal design, yielding an error probability of $\exp(-\Theta(T/H_G))$, where $H_G \propto d$. This bound is derived from worst-case instances (standard basis vectors) and ignores the geometric structure of richer arm sets, leading to overly pessimistic complexity estimates.
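To make the objective concrete, here is a minimal sketch of the hindsight best arm under a changing parameter. The three arms and the two-phase parameter sequence are hypothetical values chosen for illustration, not taken from the paper. The example shows that the arm that is best in hindsight need not be the best arm in any single round.

```python
def hindsight_best_arm(arms, thetas):
    """Index of the arm maximizing x^T (sum_t theta_t)."""
    d = len(arms[0])
    total = [sum(th[i] for th in thetas) for i in range(d)]
    scores = [sum(x[i] * total[i] for i in range(d)) for x in arms]
    return max(range(len(arms)), key=scores.__getitem__)

# Hypothetical arm set and parameter sequence (illustration only).
arms = [(1.0, 0.0), (0.0, 1.0), (0.7, 0.7)]
thetas = [(1.0, 0.0)] * 50 + [(0.0, 1.0)] * 50  # the environment switches halfway

best = hindsight_best_arm(arms, thetas)

# Arms that would win if each round were judged in isolation.
per_round_winners = {hindsight_best_arm(arms, [th]) for th in thetas}

print("hindsight best arm:", best)
print("per-round winners:", sorted(per_round_winners))
```

A learner that trusts only early rounds would be misled here, which is one way to see why a fixed, randomized sampling plan is attractive in the adversarial setting.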

2. Core Methodology and Key Concepts

The authors introduce a new geometric concept called Adjacency to refine the complexity measure of the problem.

A. Adjacency Lemma (Lemma 1)

The central theoretical insight is that for a polytope $P = \text{conv}(\mathcal{X})$, the optimal arm must be an extreme point (vertex). The Adjacency Lemma states:

For any vertex $x$, if there exists an arm $y$ such that $(y-x)^\top \theta > 0$ (i.e., $y$ is better than $x$), then there must exist an adjacent vertex $z$ (connected to $x$ by an edge of the polytope) such that $(z-x)^\top \theta > 0$.

Implication: To identify the best arm, it is necessary and sufficient to accurately distinguish between adjacent arms. One does not need to compare every pair of arms, only those connected by edges in the convex hull of the arm set.
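The lemma can be checked numerically on a simple arm set. The sketch below assumes arms placed evenly on the unit circle, listed in angular order, so that every arm is a vertex of the convex hull and the neighbors of arm i are arms i-1 and i+1 (modulo K). The helper names are hypothetical, and the check is an empirical illustration on random parameters, not a proof.

```python
import math
import random

def circle_arms(K):
    """K arms evenly spaced on the unit circle, listed in angular order."""
    return [(math.cos(2 * math.pi * k / K), math.sin(2 * math.pi * k / K)) for k in range(K)]

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

def has_improving_neighbor(arms, i, theta):
    """True if one of the two polygon neighbors of arm i has strictly higher reward."""
    K = len(arms)
    here = dot(arms[i], theta)
    return any(dot(arms[j], theta) > here for j in ((i - 1) % K, (i + 1) % K))

def check_adjacency(arms, theta):
    """Every arm strictly below the best value must have a strictly better neighbor."""
    values = [dot(x, theta) for x in arms]
    best = max(values)
    return all(has_improving_neighbor(arms, i, theta)
               for i, v in enumerate(values) if v < best)

arms = circle_arms(12)
rng = random.Random(0)
ok = all(check_adjacency(arms, (rng.gauss(0, 1), rng.gauss(0, 1))) for _ in range(200))
print("lemma held on all random trials:", ok)
```

For a general arm set in higher dimensions, the neighbor structure would have to come from a convex hull computation rather than from index arithmetic.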

B. Arm-Set-Dependent Complexity Measure

Based on the adjacency concept, the authors define a new complexity measure, $H_{\text{Adjacent}}(\mathcal{X}, \Delta^{(1)})$, which replaces the dimension-dependent $H_G$:
$H_{\text{Adjacent}}(\mathcal{X}, \Delta^{(1)}) := \min_{\lambda \in \Delta_{\mathcal{X}}} \max_{(x, x') \in \mathcal{I}} \frac{\|x - x'\|_{A(\lambda)^{-1}}^2}{(\Delta^{(1)})^2}$
Where:

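$\Delta_{\mathcal{X}}$ is the probability simplex over the arms, so $\lambda$ is a sampling distribution (design) on $\mathcal{X}$.
$\Delta^{(1)}$ is the minimum gap between the best and second-best arm.
$\mathcal{I}$ is the set of all adjacent pairs of extreme points.
$A(\lambda) = \sum_{x \in \mathcal{X}} \lambda_x x x^\top$ is the design matrix.
$\|v\|_{A^{-1}} = \sqrt{v^\top A^{-1} v}$ is the Mahalanobis norm.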

This measure strictly refines the minimax bound $H_G$. For dense arm sets (e.g., points on a circle), $H_{\text{Adjacent}}$ can be arbitrarily smaller than $H_G$ because the distance between adjacent arms shrinks, reducing the variance required to distinguish them.
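The circle example can be explored with a short computation. The sketch below evaluates the inner maximum of the complexity measure for arms on the unit circle under the uniform design, which is an assumption made for simplicity: it is not the optimized design, so dividing the result by the squared gap gives only an upper bound on $H_{\text{Adjacent}}$. All function names are hypothetical. For contrast, the same quantity is also computed over all pairs of arms.

```python
import math

def circle_arms(K):
    """K arms evenly spaced on the unit circle, listed in angular order."""
    return [(math.cos(2 * math.pi * k / K), math.sin(2 * math.pi * k / K)) for k in range(K)]

def design_matrix(arms, lam):
    """Entries (a, b, c) of the symmetric 2x2 matrix A(lam) = [[a, b], [b, c]]."""
    a = sum(l * x[0] * x[0] for l, x in zip(lam, arms))
    b = sum(l * x[0] * x[1] for l, x in zip(lam, arms))
    c = sum(l * x[1] * x[1] for l, x in zip(lam, arms))
    return a, b, c

def sq_norm_inv(v, A):
    """v^T A^{-1} v for a symmetric 2x2 matrix A given as (a, b, c)."""
    a, b, c = A
    det = a * c - b * b
    return (c * v[0] ** 2 - 2 * b * v[0] * v[1] + a * v[1] ** 2) / det

def max_pair_variance(K, adjacent_only=True):
    """Max over pairs of ||x - x'||^2 in the A(lam)^{-1} norm, with uniform lam."""
    arms = circle_arms(K)
    A = design_matrix(arms, [1.0 / K] * K)  # uniform design, NOT the optimized one
    if adjacent_only:
        pairs = [(i, (i + 1) % K) for i in range(K)]
    else:
        pairs = [(i, j) for i in range(K) for j in range(i + 1, K)]
    return max(sq_norm_inv((arms[i][0] - arms[j][0], arms[i][1] - arms[j][1]), A)
               for i, j in pairs)

for K in (4, 8, 16, 64):
    print(K, round(max_pair_variance(K), 4), round(max_pair_variance(K, adjacent_only=False), 4))
```

Under the uniform design the matrix $A(\lambda)$ equals half the identity for $K \ge 3$, so the adjacent-pair value is $8\sin^2(\pi/K)$ and shrinks as arms are added, while the all-pairs value stays bounded away from zero.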

3. Key Contributions

1. Lower Bound (Theorem 1)

The authors establish the first arm-set-dependent lower bound for non-stationary linear BAI.

Result: For any algorithm, there exist parameter sequences such that the error probability is at least:
$\exp\left( -\frac{4T}{H_{\text{Adjacent}}(\mathcal{X}, \Delta^{(1)})} \right)$
Proof Technique: They construct hard instances by splitting the time horizon $T$ into two halves. The first half minimizes the Kullback-Leibler (KL) divergence between two hypotheses (where the best arm is $x$ vs. $x'$), and the second half ensures the gap constraints are met. Crucially, they prove that the hardest instances to distinguish involve adjacent arms, allowing the optimization to focus only on $\mathcal{I}$ rather than all pairs $\mathcal{Y}$.

2. Upper Bound and Algorithm (Adjacent-BAI)

They propose the Adjacent-BAI algorithm, which achieves an error probability matching the lower bound up to constants.

Adjacent-Optimal Design: Instead of the standard XY-optimal design (which minimizes variance over all pairs), they compute a design $\lambda^*$ that minimizes the maximum variance over adjacent pairs only:
$\lambda^* = \arg\min_{\lambda \in \Delta_{\mathcal{X}}} \max_{(x, x') \in \mathcal{I}} \|x - x'\|_{A(\lambda)^{-1}}^2$
Algorithm Steps:
1. Compute the set of adjacent pairs $\mathcal{I}$ (via convex hull computation).
2. Compute the Adjacent-optimal design $\lambda^*$.
3. Use a rounding procedure (Pukelsheim) to generate a static allocation of arms $\{x_t\}$ that approximates $\lambda^*$.
4. Play arms in a random permutation to ensure unbiased estimation.
5. Output the arm with the highest estimated cumulative reward using a least-squares estimator.
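Steps 3 to 5 above can be sketched for two-dimensional arms as follows. This is an illustration under several assumptions, not the authors' implementation: the design is passed in as given (the usage example substitutes a uniform design for $\lambda^*$), a simple largest-remainder rounding stands in for the Pukelsheim procedure, and the function names, noise level, and parameter sequence are all hypothetical.

```python
import math
import random

def round_design(lam, T):
    """Largest-remainder rounding of a design into integer pull counts summing to T.
    (A simple stand-in for the Pukelsheim rounding procedure.)"""
    counts = [int(l * T) for l in lam]
    by_remainder = sorted(range(len(lam)), key=lambda i: lam[i] * T - counts[i], reverse=True)
    for i in by_remainder[: T - sum(counts)]:
        counts[i] += 1
    return counts

def adjacent_bai_sketch(arms, lam, thetas, rng, noise_sd=0.1):
    """Play a shuffled static allocation, then return the index of the estimated best arm."""
    T = len(thetas)
    counts = round_design(lam, T)
    schedule = [i for i, n in enumerate(counts) for _ in range(n)]
    rng.shuffle(schedule)  # random order, independent of the fixed theta sequence
    s00 = s01 = s11 = 0.0  # entries of sum_t x_t x_t^T
    b0 = b1 = 0.0          # entries of sum_t x_t r_t
    for i, theta in zip(schedule, thetas):
        x = arms[i]
        r = x[0] * theta[0] + x[1] * theta[1] + rng.gauss(0.0, noise_sd)
        s00 += x[0] * x[0]; s01 += x[0] * x[1]; s11 += x[1] * x[1]
        b0 += x[0] * r; b1 += x[1] * r
    det = s00 * s11 - s01 * s01
    # Least-squares estimate of the average parameter (2x2 inverse written out).
    theta_hat = ((s11 * b0 - s01 * b1) / det, (s00 * b1 - s01 * b0) / det)
    return max(range(len(arms)),
               key=lambda i: arms[i][0] * theta_hat[0] + arms[i][1] * theta_hat[1])

# Usage: 8 arms on the unit circle, a parameter that switches direction halfway.
arms = [(math.cos(2 * math.pi * k / 8), math.sin(2 * math.pi * k / 8)) for k in range(8)]
thetas = [(2.0, 0.0)] * 200 + [(0.0, 2.0)] * 200  # average is (1, 1)
chosen = adjacent_bai_sketch(arms, [1.0 / 8] * 8, thetas, random.Random(0))
print("chosen arm index:", chosen)
```

The shuffle is what makes the estimator target the average parameter: each round is equally likely to receive any pull in the allocation, so no fixed parameter sequence can line up with the sampling plan.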
Result: The error probability is bounded by:
$P(\hat{x} \neq x^*) \leq |I_{x^*}| \cdot \exp\left( -\frac{T}{36 \cdot H_{\text{Adjacent}}(\mathcal{X}, \Delta^{(1)})} \right)$
The exponent matches that of the lower bound up to a constant factor, which shows that the lower bound is tight up to constants.
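As a worked reading of the two bounds, the upper bound can be inverted to give the budget that suffices for a target error level. Here $\delta \in (0, 1)$ is a target error probability introduced only for this illustration, and $H_{\text{Adjacent}}$ abbreviates $H_{\text{Adjacent}}(\mathcal{X}, \Delta^{(1)})$.

```latex
\begin{align*}
|I_{x^*}| \exp\!\left(-\frac{T}{36\, H_{\text{Adjacent}}}\right) \le \delta
&\iff \frac{T}{36\, H_{\text{Adjacent}}} \ge \log\frac{|I_{x^*}|}{\delta} \\
&\iff T \ge 36\, H_{\text{Adjacent}} \log\frac{|I_{x^*}|}{\delta}.
\end{align*}
% Ratio of the two exponents (lower bound over upper bound):
\[
\frac{4T / H_{\text{Adjacent}}}{T / (36\, H_{\text{Adjacent}})} = 144 .
\]
```

The two exponents therefore differ only by the constant factor 144, independent of $T$, the arm set, and the gap.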

4. Results and Significance

Tight Complexity Characterization: The paper proves that the complexity of non-stationary linear BAI is governed by the geometry of the adjacent pairs of the arm set, not the dimension $d$ alone.
Refinement of Minimax Bounds: The work demonstrates that the previously accepted minimax bound ($H_G \propto d$) is overly pessimistic for structured arm sets. For example, in a 2D setting with $K$ arms uniformly distributed on a circle, the numerator of $H_{\text{Adjacent}}$ scales with the squared distance between neighboring arms (which goes to 0 as $K \to \infty$), whereas the minimax bound remains constant relative to $d$.
Algorithmic Efficiency: The proposed Adjacent-BAI algorithm is computationally efficient (polynomial in $K$ and $d$ ) and achieves optimal sample complexity for the given arm set geometry.
Broader Implications: The authors suggest that the concept of adjacency may also resolve open problems in the stationary fixed-budget setting, where currently no arm-set-dependent lower bounds exist. They show that even in stationary settings, the instance-optimal complexity is determined by adjacent arms (Proposition 1).

Conclusion

This paper fundamentally shifts the understanding of Best-Arm Identification in non-stationary linear bandits. By leveraging the geometric property of adjacency, the authors derive a tighter, arm-set-dependent complexity measure and provide an algorithm that is optimal with respect to this new measure. This resolves the gap between theoretical lower bounds and practical performance for structured arm sets, moving beyond the limitations of dimension-dependent minimax bounds.
