Fixed-Budget Constrained Best Arm Identification in Grouped Bandits

This paper addresses the fixed-budget best-arm identification problem in grouped bandits with feasibility constraints by establishing a theoretical lower bound on error probability and proposing the Feasibility Constrained Successive Rejects (FCSR) algorithm, which achieves optimal performance guarantees while empirically outperforming natural baselines.

Raunak Mukherjee, Sharayu Moharir

Published 2026-03-05

Imagine you are a talent scout trying to find the single best band to headline a massive music festival. You have a limited amount of time and money (a "fixed budget") to audition hundreds of bands.

However, there's a twist: You can't just pick the band that sounds the loudest.

The Problem: The "All-or-Nothing" Rule

In this paper, the authors describe a scenario where every "band" (an arm) isn't just one musician; it's a group of musicians (the arm's attributes) playing different instruments.

  • The Band (an arm): a car wash service whose "instruments" are five attributes: washing, waxing, tire shine, interior cleaning, and engine detailing.
  • The Rule (the feasibility constraint): the band must be good at everything. If the engine detailing falls below a minimum threshold, the whole band is disqualified, even if the waxing is world-class.
  • The Goal: find the band that clears the threshold on every attribute AND has the best overall average performance.

This is tricky because you have to balance two competing fears:

  1. The "Risky" Trap: Picking a band that sounds amazing overall but fails one specific instrument (e.g., great waxing, terrible tires).
  2. The "Missed Opportunity" Trap: Ignoring a band that is actually perfect, just because one of its instruments sounded slightly off during a quick test.

The Solution: FCSR (The Smart Scout)

The authors propose a new algorithm called FCSR (Feasibility Constrained Successive Rejects). Think of it as a three-phase audition process designed to be both efficient and safe.

Phase 1: The "Quick Scan" (Uniform Sampling)

The scout gives every remaining band a quick, equal chance to play a little bit of every instrument. This is like a "sound check." It helps eliminate the bands that are clearly terrible at everything.
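The sound check can be sketched in a few lines of Python. This is a minimal illustration with hypothetical names (`pull(arm, attr)` stands in for one noisy observation of one attribute), not the paper's exact routine:

```python
def uniform_phase(arms, attrs, rounds, pull):
    """Phase 1 sketch: give every (arm, attribute) pair an equal number
    of pulls and keep a running mean of each attribute's performance."""
    means = {(a, j): 0.0 for a in arms for j in attrs}
    counts = {(a, j): 0 for a in arms for j in attrs}
    for _ in range(rounds):
        for a in arms:
            for j in attrs:
                r = pull(a, j)  # one noisy observation of attribute j of arm a
                counts[(a, j)] += 1
                # incremental running-mean update
                means[(a, j)] += (r - means[(a, j)]) / counts[(a, j)]
    return means, counts
```

After this phase, arms whose estimated averages are clearly worst can be rejected, in the spirit of Successive Rejects.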

Phase 2: The "Stress Test" (APT Sampling)

Now, the scout focuses on the bands that are still in the running. But here's the trick: If a band's "tire shine" sounds a little shaky (close to the failure line), the scout spends extra time listening only to the tires to see if they are actually bad or just having a bad day.

  • Analogy: Imagine a judge tasting a soup. If the saltiness is borderline, they don't keep tasting the pepper or the carrots; they keep tasting the salt until they are sure whether or not it's too salty.
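The "focus on the shaky instrument" idea can be sketched as an APT-style index (in the spirit of anytime parameter-free thresholding; the paper's exact index may differ). An attribute whose estimated mean sits close to the threshold and has few samples gets a small index, and the smallest index gets sampled next:

```python
import math

def next_attribute(means, counts, tau, eps=0.0):
    """APT-style sketch: pick the attribute whose feasibility (mean vs.
    threshold tau) is least certain. A small gap to tau plus few samples
    gives a small index; the attribute with the smallest index is sampled."""
    def index(j):
        gap = abs(means[j] - tau) + eps   # distance from the failure line
        return math.sqrt(counts[j]) * gap # few samples also shrink the index
    return min(means, key=index)
```

With a borderline "salt" estimate of 0.52 against a threshold of 0.5, salt gets sampled before a clearly fine "pepper" estimate of 0.9, matching the soup analogy above.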

Phase 3: The "Safety Net" (SAMPLEUNTILFEASIBLE)

This is the paper's secret sauce. Sometimes, a band might look great overall, but one specific instrument (say, the engine detailing) keeps sounding "meh" (below the threshold).

  • Old methods might give up on this band too quickly, thinking, "Eh, it's risky, let's move on."
  • FCSR says: "Wait! This band is the best overall. Let's give it a dedicated safety budget of extra time to fix that one shaky instrument." It keeps testing that specific weak spot until it's proven safe. If the band passes, it stays in the running. If the budget runs out and it's still shaky, then it gets kicked out.
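A minimal sketch of this safety net, with an illustrative Hoeffding-style stopping rule of my own choosing (the paper's SAMPLEUNTILFEASIBLE routine may stop differently): keep sampling the one borderline attribute until a confidence interval cleanly separates its mean from the threshold, or the dedicated budget runs out.

```python
import math

def sample_until_feasible(pull, tau, budget, delta=0.05):
    """Safety-net sketch: repeatedly sample one borderline attribute until
    we are confident it is above or below the threshold tau, or the
    dedicated budget is exhausted. Assumes rewards in [0, 1]."""
    n, mean = 0, 0.0
    while n < budget:
        r = pull()
        n += 1
        mean += (r - mean) / n
        # Hoeffding-style confidence radius for [0, 1]-bounded rewards
        radius = math.sqrt(math.log(2.0 / delta) / (2.0 * n))
        if mean - radius > tau:
            return True    # confidently feasible: the band stays in
        if mean + radius < tau:
            return False   # confidently infeasible: the band is out
    return mean > tau      # budget exhausted: go with the best guess
```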

Why is this a Big Deal?

Before this paper, algorithms were like a scout who only looked at the average score. They would pick the band with the highest average, even if one member was terrible. Or, they were too scared of risk and would eliminate the best band just because of one small doubt.

The authors proved mathematically that FCSR is essentially the most efficient strategy possible.

  • The Lower Bound: In the fixed-budget setting, the budget is fixed in advance, so the question is how often you make a mistake. The authors prove a lower bound on the error probability: no algorithm, given the same budget, can be guaranteed to fail less often than this limit.
  • The Match: They show that FCSR's error probability matches this lower bound (up to a constant factor), so no algorithm can do fundamentally better.

Real-World Example: The Movie Portfolio

The paper tested this on a real-world scenario: Movie Portfolios.

  • Imagine you are a streaming service (like Netflix).
  • You want to create a "Bundle" of 5 movies to recommend to a user.
  • The Constraint: Every single movie in the bundle must have a rating above 3.6 stars (the threshold). You can't recommend a bundle with one terrible movie, even if the other four are masterpieces.
  • The Goal: Find the bundle where the average rating of all 5 movies is the highest possible.
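The selection rule itself is simple to state in code. Here is a toy illustration with made-up ratings (not actual MovieLens data): drop every bundle containing a movie at or below the 3.6-star threshold, then pick the highest average among what survives.

```python
def best_feasible_bundle(bundles, tau=3.6):
    """Keep only bundles where every movie clears the threshold tau
    (the all-or-nothing rule), then return the name of the feasible
    bundle with the highest average rating."""
    feasible = {
        name: ratings for name, ratings in bundles.items()
        if min(ratings) > tau  # one bad movie disqualifies the bundle
    }
    if not feasible:
        return None
    return max(feasible, key=lambda n: sum(feasible[n]) / len(feasible[n]))

bundles = {
    "A": [4.8, 4.7, 4.9, 4.6, 3.2],  # great average, but one movie fails
    "B": [4.0, 4.1, 3.9, 4.2, 3.8],  # every movie clears 3.6
}
print(best_feasible_bundle(bundles))  # -> B
```

Of course, the hard part in the bandit setting is that the true ratings are unknown and must be estimated from noisy samples under a limited budget, which is exactly what FCSR's three phases are for.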

Using the MovieLens dataset, FCSR found the best bundles much more accurately than older methods, especially when the budget (number of user ratings you can check) was small.

The Takeaway

In a world where we often have to make decisions based on multiple criteria (e.g., "Find the safest car that also gets the best gas mileage"), you can't just look at the average. You have to ensure every single part meets a minimum standard.

FCSR is the smart strategy that says: "Don't just guess. Test the weak spots until you're sure, but don't waste time on the parts that are already perfect." It's the difference between a hasty guess and a perfectly vetted decision.