Fixed-Budget Constrained Best Arm Identification in Grouped Bandits

This paper addresses the fixed-budget best-arm identification problem in grouped bandits with feasibility constraints by establishing a theoretical lower bound on error probability and proposing the Feasibility Constrained Successive Rejects (FCSR) algorithm, which achieves optimal performance guarantees while empirically outperforming natural baselines.

Raunak Mukherjee, Sharayu Moharir

Published 2026-03-05

Imagine you are a talent scout trying to find the single best band to headline a massive music festival. You have a limited amount of time and money (a "fixed budget") to audition hundreds of bands.

However, there's a twist: You can't just pick the band that sounds the loudest.

The Problem: The "All-or-Nothing" Rule

In this paper, the authors describe a scenario where every "band" (an arm) isn't just one musician; it's a group of musicians (the arm's attributes) playing different instruments.

  • The Band (an arm): a car wash service whose "instruments" are five attributes: washing, waxing, tire shine, interior cleaning, and engine detailing.
  • The Rule (the feasibility constraint): the band must be good at everything. If the engine detailing falls below a minimum threshold, the whole band is disqualified, even if the waxing is world-class.
  • The Goal: find the band that clears the threshold on every attribute AND has the best overall average performance.

This is tricky because you have to balance two competing fears:

  1. The "Risky" Trap: Picking a band that sounds amazing overall but fails one specific instrument (e.g., great waxing, terrible tires).
  2. The "Missed Opportunity" Trap: Ignoring a band that is actually perfect, just because one of its instruments sounded slightly off during a quick test.

The Solution: FCSR (The Smart Scout)

The authors propose a new algorithm called FCSR (Feasibility Constrained Successive Rejects). Think of it as a three-phase audition process designed to be both efficient and safe.

Phase 1: The "Quick Scan" (Uniform Sampling)

The scout gives every remaining band a quick, equal chance to play a little bit of every instrument. This is like a "sound check." It helps eliminate the bands that are clearly terrible at everything.
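The sound check can be sketched in a few lines of Python. This is a minimal illustration with hypothetical names (`pull(arm, attr)` stands in for one noisy observation of one attribute), not the paper's exact routine:

```python
def uniform_phase(arms, attrs, rounds, pull):
    """Phase 1 sketch: give every (arm, attribute) pair an equal number
    of pulls and keep a running mean of each attribute's performance."""
    means = {(a, j): 0.0 for a in arms for j in attrs}
    counts = {(a, j): 0 for a in arms for j in attrs}
    for _ in range(rounds):
        for a in arms:
            for j in attrs:
                r = pull(a, j)  # one noisy observation of attribute j of arm a
                counts[(a, j)] += 1
                # incremental running-mean update
                means[(a, j)] += (r - means[(a, j)]) / counts[(a, j)]
    return means, counts
```

After this phase, arms whose estimated averages are clearly worst can be rejected, in the spirit of Successive Rejects.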

Phase 2: The "Stress Test" (APT Sampling)

Now, the scout focuses on the bands that are still in the running. But here's the trick: If a band's "tire shine" sounds a little shaky (close to the failure line), the scout spends extra time listening only to the tires to see if they are actually bad or just having a bad day.

  • Analogy: Imagine a judge tasting a soup. If the saltiness is borderline, they don't keep tasting the pepper or the carrots; they keep tasting the salt until they are sure whether or not it's too salty.
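The "focus on the shaky instrument" idea can be sketched as an APT-style index (in the spirit of anytime parameter-free thresholding; the paper's exact index may differ). An attribute whose estimated mean sits close to the threshold and has few samples gets a small index, and the smallest index gets sampled next:

```python
import math

def next_attribute(means, counts, tau, eps=0.0):
    """APT-style sketch: pick the attribute whose feasibility (mean vs.
    threshold tau) is least certain. A small gap to tau plus few samples
    gives a small index; the attribute with the smallest index is sampled."""
    def index(j):
        gap = abs(means[j] - tau) + eps   # distance from the failure line
        return math.sqrt(counts[j]) * gap # few samples also shrink the index
    return min(means, key=index)
```

With a borderline "salt" estimate of 0.52 against a threshold of 0.5, salt gets sampled before a clearly fine "pepper" estimate of 0.9, matching the soup analogy above.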

Phase 3: The "Safety Net" (SAMPLEUNTILFEASIBLE)

This is the paper's secret sauce. Sometimes, a band might look great overall, but one specific instrument (say, the engine detailing) keeps sounding "meh" (below the threshold).

  • Old methods might give up on this band too quickly, thinking, "Eh, it's risky, let's move on."
  • FCSR says: "Wait! This band is the best overall. Let's give it a dedicated safety budget of extra time to fix that one shaky instrument." It keeps testing that specific weak spot until it's proven safe. If the band passes, it stays in the running. If the budget runs out and it's still shaky, then it gets kicked out.
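A minimal sketch of this safety net, with an illustrative Hoeffding-style stopping rule of my own choosing (the paper's SAMPLEUNTILFEASIBLE routine may stop differently): keep sampling the one borderline attribute until a confidence interval cleanly separates its mean from the threshold, or the dedicated budget runs out.

```python
import math

def sample_until_feasible(pull, tau, budget, delta=0.05):
    """Safety-net sketch: repeatedly sample one borderline attribute until
    we are confident it is above or below the threshold tau, or the
    dedicated budget is exhausted. Assumes rewards in [0, 1]."""
    n, mean = 0, 0.0
    while n < budget:
        r = pull()
        n += 1
        mean += (r - mean) / n
        # Hoeffding-style confidence radius for [0, 1]-bounded rewards
        radius = math.sqrt(math.log(2.0 / delta) / (2.0 * n))
        if mean - radius > tau:
            return True    # confidently feasible: the band stays in
        if mean + radius < tau:
            return False   # confidently infeasible: the band is out
    return mean > tau      # budget exhausted: go with the best guess
```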

Why is this a Big Deal?

Before this paper, algorithms were like a scout who only looked at the average score. They would pick the band with the highest average, even if one member was terrible. Or, they were too scared of risk and would eliminate the best band just because of one small doubt.

The authors proved mathematically that FCSR is essentially the most efficient strategy possible.

  • The Lower Bound: In the fixed-budget setting, the budget is fixed in advance, so the question is how often you make a mistake. The authors prove a lower bound on the error probability: no algorithm, given the same budget, can be guaranteed to fail less often than this limit.
  • The Match: They show that FCSR's error probability matches this lower bound (up to a constant factor), so no algorithm can do fundamentally better.

Real-World Example: The Movie Portfolio

The paper tested this on a real-world scenario: Movie Portfolios.

  • Imagine you are a streaming service (like Netflix).
  • You want to create a "Bundle" of 5 movies to recommend to a user.
  • The Constraint: Every single movie in the bundle must have a rating above 3.6 stars (the threshold). You can't recommend a bundle with one terrible movie, even if the other four are masterpieces.
  • The Goal: Find the bundle where the average rating of all 5 movies is the highest possible.
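The selection rule itself is simple to state in code. Here is a toy illustration with made-up ratings (not actual MovieLens data): drop every bundle containing a movie at or below the 3.6-star threshold, then pick the highest average among what survives.

```python
def best_feasible_bundle(bundles, tau=3.6):
    """Keep only bundles where every movie clears the threshold tau
    (the all-or-nothing rule), then return the name of the feasible
    bundle with the highest average rating."""
    feasible = {
        name: ratings for name, ratings in bundles.items()
        if min(ratings) > tau  # one bad movie disqualifies the bundle
    }
    if not feasible:
        return None
    return max(feasible, key=lambda n: sum(feasible[n]) / len(feasible[n]))

bundles = {
    "A": [4.8, 4.7, 4.9, 4.6, 3.2],  # great average, but one movie fails
    "B": [4.0, 4.1, 3.9, 4.2, 3.8],  # every movie clears 3.6
}
print(best_feasible_bundle(bundles))  # -> B
```

Of course, the hard part in the bandit setting is that the true ratings are unknown and must be estimated from noisy samples under a limited budget, which is exactly what FCSR's three phases are for.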

Using the MovieLens dataset, FCSR found the best bundles much more accurately than older methods, especially when the budget (number of user ratings you can check) was small.

The Takeaway

In a world where we often have to make decisions based on multiple criteria (e.g., "Find the safest car that also gets the best gas mileage"), you can't just look at the average. You have to ensure every single part meets a minimum standard.

FCSR is the smart strategy that says: "Don't just guess. Test the weak spots until you're sure, but don't waste time on the parts that are already perfect." It's the difference between a hasty guess and a perfectly vetted decision.