Optimal Best-Arm Identification under Fixed Confidence with Multiple Optima

Imagine you are a detective trying to find the best suspect in a lineup of $K$ people. However, there's a twist: you don't know if there is just one best suspect, or if there are several suspects who are equally guilty (and equally "best").

Your goal is to identify any one of these "best" suspects with high confidence, but you want to do it by asking as few questions as possible. Every question you ask costs time and money (this is called "sample complexity").

This paper is about solving that detective story when you have a secret piece of information: You know exactly how many "best" suspects there are.

Here is the breakdown of the paper's story, using simple analogies:

1. The Problem: The "Tie" Dilemma

In the past, most detective stories assumed there was only one true winner. Algorithms were built to keep asking questions until they were confident enough about who the single winner was.

But in real life, ties happen.

Example: Imagine you are testing 10 different flavors of ice cream. Maybe three of them are all tied for "Best."
The Old Way: If you didn't know there were three winners, your algorithm would keep tasting the top three flavors over and over again, trying to figure out which one is slightly better than the others. This is a waste of time! You just need to find any of the three winners.
The Gap: Previous research figured out how to handle ties when you didn't know how many winners there were. But nobody had figured out the mathematically perfect way to do it when you do know the number of winners in advance.

2. The New Discovery: The "Tighter" Lower Bound

The author, Lan V. Truong, asks: "If I tell you there are exactly 3 winners, can we do better than if I just said 'there are some winners'?"

The Answer is Yes.

The paper derives a new Information-Theoretic Lower Bound.

The Metaphor: Think of this as the "Speed Limit" for your investigation.
The Old Speed Limit: "You must ask at least 1,000 questions to be sure."
The New Speed Limit: "Because you know there are exactly 3 winners, you only need to ask 800 questions."

The paper proves that knowing the number of winners allows you to stop the investigation earlier. It mathematically calculates the absolute minimum number of questions needed, which is strictly lower (better) than the minimum that applies when the number of winners is unknown.

3. The Solution: A Smarter Detective (Track-and-Stop)

The paper proposes a modified version of a famous algorithm called Track-and-Stop.

How it works:
1. Tracking: The detective keeps a running tally of who looks like a winner.
2. The "Tie-Aware" Twist: Because the detective knows there are $M$ winners, it stops wasting energy trying to rank the winners against each other. Instead, it focuses its energy on proving that the current top group is definitely better than the "losers."
3. Stopping: It uses a special "Stop Sign" (a statistical rule). As soon as the evidence is strong enough to say, "These $M$ people are the best, and everyone else is worse," it stops. It doesn't care which of the $M$ is the absolute best; it just picks one and says, "Done!"

4. Why This Matters (The "So What?")

This isn't just about ice cream or detectives. This logic applies to:

Clinical Trials: If three different drugs work equally well, you don't need to run expensive tests to see which one is slightly better. You just need to confirm they are all better than the placebo and pick one. This saves millions of dollars and time.
A/B Testing: If you are testing website designs and find three that perform equally well, you can stop testing immediately and launch any of them.
Hyperparameter Tuning: In AI, if you find three settings that give the same best result, you can stop searching and use one.

Summary in One Sentence

This paper proves that if you know how many "winners" exist in a competition, you can design a smarter strategy to find one of them much faster and cheaper than if you were guessing how many there were.

The Takeaway: Knowledge of the "tie count" is a superpower that lets you stop the game earlier and win with fewer moves.

Here is a detailed technical summary of the paper "Optimal Best-Arm Identification under Fixed Confidence with Multiple Optima" by Lan V. Truong.

1. Problem Statement

The paper addresses the Best-Arm Identification (BAI) problem within the stochastic Multi-Armed Bandit (MAB) framework under a fixed-confidence setting.

Goal: Identify any arm with the maximal expected reward (an optimal arm) with a probability of at least $1-\delta$, while minimizing the expected number of samples (sample complexity).
Key Distinction: Unlike most existing literature that assumes a unique best arm, this work focuses on scenarios where multiple optimal arms exist (i.e., a set $A^\star$ of size $M > 1$ where all arms in the set share the same maximal mean $\mu^\star$).
Specific Setting: The paper investigates the case where the number of optimal arms ($M$) is known in advance. This contrasts with prior work (e.g., Degenne and Koolen [1]), which handled the case where $M$ is unknown.
Challenge: Standard algorithms often waste samples trying to distinguish between arms that are statistically equivalent (ties). The challenge is to design an algorithm that stops as soon as any arm from the optimal set is confidently identified, without unnecessary comparisons among the tied optimal arms.
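The goal above is usually formalized as a $\delta$-PAC requirement. The line below is the standard way of writing it, given here as a sketch rather than the paper's exact statement; the symbols $\tau$ (stopping time) and $\hat{a}_\tau$ (recommended arm) are notation assumed for illustration:
$\mathbb{P}_\mu\left(\tau < \infty,\ \hat{a}_\tau \notin A^\star\right) \leq \delta$
Among all strategies that satisfy this constraint, the objective is to make $\mathbb{E}_\mu[\tau]$ as small as possible.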

2. Methodology

The paper employs an information-theoretic approach combined with a modified Track-and-Stop algorithm.

A. One-Parameter Exponential Family

The analysis assumes arm rewards follow distributions from a one-parameter exponential family (e.g., Bernoulli, Gaussian with known variance, Poisson). The KL divergence between distributions is characterized using the log-partition function $A(\theta)$.
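To make the role of the log-partition function concrete, here is a minimal Python sketch of the standard identity $\mathrm{KL}(\theta_1, \theta_2) = A(\theta_2) - A(\theta_1) - A'(\theta_1)(\theta_2 - \theta_1)$ for a family in natural parameterization. The function and variable names are illustrative assumptions, not taken from the paper.

```python
import math

def kl_exp_family(theta_p, theta_q, log_partition, mean_fn):
    """KL(p_theta_p || p_theta_q) for a one-parameter exponential family.

    log_partition is A(theta); mean_fn is its derivative A'(theta),
    which equals the mean of the distribution with parameter theta.
    """
    return (log_partition(theta_q) - log_partition(theta_p)
            - mean_fn(theta_p) * (theta_q - theta_p))

# Bernoulli: theta = log(p / (1 - p)), A(theta) = log(1 + e^theta)
def bernoulli_A(theta):
    return math.log1p(math.exp(theta))

def bernoulli_mean(theta):
    return 1.0 / (1.0 + math.exp(-theta))

def logit(p):
    return math.log(p / (1.0 - p))

# Gaussian with unit variance: theta = mu, A(theta) = theta^2 / 2
def gaussian_A(theta):
    return 0.5 * theta * theta

def gaussian_mean(theta):
    return theta

# Poisson: theta = log(rate), A(theta) = e^theta, and A'(theta) = e^theta too
def poisson_A(theta):
    return math.exp(theta)

kl_bernoulli = kl_exp_family(logit(0.5), logit(0.6), bernoulli_A, bernoulli_mean)
kl_gaussian = kl_exp_family(1.0, 0.5, gaussian_A, gaussian_mean)
kl_poisson = kl_exp_family(math.log(2.0), math.log(3.0), poisson_A, poisson_A)
```

Each value agrees with the familiar closed form for its family, for example $(\mu_1 - \mu_2)^2 / 2$ in the unit-variance Gaussian case.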

B. New Information-Theoretic Lower Bound

The author derives a fundamental lower bound on the sample complexity for the case where $M$ is known.

Alternative Set: The author defines an alternative set of bandit models $Alt(\mu)$ where a specific non-optimal arm $a$ has a mean strictly greater than all $M$ optimal arms.
Optimization Problem: The lower bound $T^*(\mu)$ is derived by solving a minimax optimization problem over sampling proportions $w \in \Sigma_K$ (the probability simplex).
$T^*(\mu)^{-1} = \sup_{w \in \Sigma_K} \min_{a \notin [M]} \left( \sum_{i=1}^M w_i + w_a \right) \times I_{\dots}(\mu_1, \dots, \mu_M, \mu_a)$
Here, the arms are indexed so that the optimal arms are $1, \dots, M$ (so $a \notin [M]$ ranges over the sub-optimal arms), and $I$ represents a specific information term involving the KL divergence between the optimal arms and the competing arm, weighted by the sampling proportions.
Result: This new bound is strictly tighter than the bound derived for the unknown-$M$ setting, demonstrating that knowing the cardinality of the optimal set reduces the theoretical minimum sample complexity.

C. Modified Track-and-Stop Algorithm

The author proposes a modification to the classic Track-and-Stop algorithm to achieve this bound:

Sampling Rule (Tracking): Uses either C-Tracking (projection onto a truncated simplex) or D-Tracking (forced exploration when counts are low) to ensure the empirical sampling proportions converge to the optimal allocation $w^*(\mu)$.
Stopping Rule (Tie-Aware): The core innovation is a generalized log-likelihood ratio (GLLR) statistic, denoted $Z_{a; b_1, \dots, b_M}(t)$.
- Instead of comparing a single arm against a single "best" candidate, the statistic compares a candidate arm $a$ against a set of $M$ candidate optimal arms $\{b_1, \dots, b_M\}$.
- The stopping time $\tau$ is the first time $t$ where the maximum of these statistics over all possible sets of $M$ arms exceeds a threshold $\beta(t, \delta)$.
- The threshold $\beta(t, \delta)$ is carefully tuned (involving terms like $t^{(M+1)/2}$) to ensure the $\delta$-PAC guarantee.
Recommendation: Upon stopping, the algorithm selects one arm uniformly at random from the identified set of $M$ optimal arms.
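The D-Tracking option in the sampling rule above can be sketched in a few lines of Python. This follows the classical D-Tracking rule as it is usually stated for Track-and-Stop (force exploration of any arm with fewer than $\sqrt{t} - K/2$ pulls, otherwise pull the arm that lags furthest behind its target share); it is not code from the paper, and the paper's exact variant may differ. The name `d_tracking_choice`, its arguments, and the fixed target allocation in the usage loop are assumptions for illustration. In the full algorithm the target would be a plug-in estimate of $w^*(\mu)$ recomputed from the empirical means each round.

```python
import math

def d_tracking_choice(counts, target_w):
    """Pick the next arm to pull under a D-Tracking-style rule.

    counts[a]   -- number of times arm a has been pulled so far
    target_w[a] -- current target sampling proportion for arm a
    """
    t = sum(counts)
    k = len(counts)
    # Forced exploration: arms whose pull count is below sqrt(t) - K/2.
    threshold = math.sqrt(t) - k / 2
    under_sampled = [a for a in range(k) if counts[a] < threshold]
    if under_sampled:
        return min(under_sampled, key=lambda a: counts[a])
    # Tracking: pull the arm furthest below its target number of pulls.
    return max(range(k), key=lambda a: t * target_w[a] - counts[a])

# Usage sketch with a fixed (hypothetical) target allocation.
target = [0.5, 0.3, 0.2]
pulls = [0, 0, 0]
for _ in range(1000):
    pulls[d_tracking_choice(pulls, target)] += 1
proportions = [n / sum(pulls) for n in pulls]
```

With a fixed target, the empirical proportions in `proportions` stay close to `target`, which is the behavior the tracking step relies on.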

3. Key Contributions

Tighter Fundamental Limit: Derivation of a new, strictly tighter information-theoretic lower bound for BAI when the number of optimal arms ($M$) is known. This quantifies the sample complexity savings gained by knowing $M$.
Tie-Aware Algorithm: Proposal of a modified Track-and-Stop algorithm with a specific stopping rule designed to handle multiple optima. The rule leverages the known cardinality $M$ to avoid distinguishing between tied optimal arms.
Instance-Optimality Guarantee: Rigorous proof that the proposed algorithm achieves asymptotic instance-optimality. As the confidence parameter $\delta \to 0$, the expected sample complexity of the algorithm matches the new lower bound $T^*(\mu)$.
Theoretical Completion: This work completes the theoretical picture for fixed-confidence BAI, bridging the gap between the unique-best-arm case ($M=1$) and the unknown-cardinality multi-optimal case.

4. Results and Analysis

Asymptotic Optimality: The paper proves (Theorem 8 and Theorem 9) that:
$\limsup_{\delta \to 0} \frac{\mathbb{E}[\tau]}{\log(1/\delta)} \leq T^*(\mu)$
This confirms the algorithm is asymptotically optimal, matching the derived lower bound.
Sample Complexity Reduction: The analysis shows that knowing $M$ allows the algorithm to allocate samples more efficiently. Specifically, the algorithm does not need to resolve the "tie" between optimal arms, which reduces the required KL divergence accumulation compared to the unknown-$M$ case.
Specific Cases: The paper provides explicit calculations for Gaussian bandits, showing the sample complexity scales as $\Theta(1/\Delta^2)$, where $\Delta$ is the gap between optimal and sub-optimal arms.
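The $1/\Delta^2$ scaling in the Gaussian case can be traced to the form of the Gaussian KL divergence. As a sketch, assuming a known common variance $\sigma^2$ (a symbol introduced here for illustration), the divergence between an optimal arm and an arm whose mean is lower by $\Delta$ is:
$\mathrm{KL}(\mu^\star, \mu^\star - \Delta) = \frac{\Delta^2}{2\sigma^2}$
Each sample therefore contributes information of order $\Delta^2$, so accumulating evidence of order $\log(1/\delta)$ takes on the order of $\sigma^2 \log(1/\delta) / \Delta^2$ samples. The exact constants depend on the optimal allocation and are not reproduced here.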

5. Significance

Practical Relevance: Many real-world applications (clinical trials, A/B testing, recommendation systems) naturally involve multiple equally good options. This work provides the first formal guarantee for efficiently identifying any of these options when their count is known.
Algorithmic Design: It demonstrates that structural knowledge (knowing $M$) can be explicitly exploited in the stopping rule to improve efficiency, moving beyond "one-size-fits-all" approaches.
Theoretical Foundation: By establishing the first instance-optimal algorithm for the known-$M$ multi-optimal setting, the paper sets a new benchmark for future research in structured bandit problems and combinatorial settings.

In summary, this paper closes the theoretical gap for the known-$M$ setting by proving that knowing the number of optimal arms allows for a strictly more efficient identification strategy, and it provides the specific algorithmic machinery (a tie-aware Track-and-Stop) to achieve this efficiency.