On Regret Bounds of Thompson Sampling for Bayesian Optimization

Imagine you are trying to find the highest peak in a vast, foggy mountain range. You can't see the whole map, and every time you climb a spot to measure its height, it costs you a lot of time and energy (this is the "expensive-to-evaluate" part). This is the problem of Bayesian Optimization.

To solve this, you use a smart guide (an algorithm) that builds a mental map of the mountains based on the few spots you've already checked. Two of the most popular guides are GP-UCB (the cautious planner) and GP-TS (the adventurous explorer).

This paper is a deep dive into the adventurous explorer (GP-TS). The authors ask: "How good is this explorer really, and can we prove it mathematically?"

Here is the breakdown of their findings using simple analogies:

1. The Explorer's Weakness: The "Bad Luck" Scenario

The Problem:
The cautious planner (GP-UCB) has a safety net. If you ask it to keep the chance of a huge mistake very low, its guaranteed worst-case loss grows only logarithmically (that is, very slowly) as you make that allowed chance smaller.

The adventurous explorer (GP-TS), however, is more volatile. Previous math could only guarantee a low chance of a huge mistake at a steep price: the guaranteed worst-case loss grew polynomially (much faster) as the allowed chance shrank.

The Discovery:
The authors built a specific "trap" scenario (a two-peak mountain where one is slightly higher but looks similar). They proved that if you get unlucky, GP-TS can get stuck climbing the wrong peak for a long time.

The Analogy: Imagine a gambler at a casino. GP-UCB is like a player who knows the odds and never loses more than a few dollars. GP-TS is like a player who usually wins big, but if they hit a specific "bad streak," they could lose a fortune. The paper proves that this "bad streak" isn't just a fluke; it's a real possibility where the loss grows much faster than the cautious planner's.
The Result: You can't promise GP-TS will always be safe in the same way you can with GP-UCB. It has a "heavy tail" of risk.

2. The "Double-Check" Safety Net

The Problem:
Since we know GP-TS can have those "bad streaks," how do we make the math safer?

The Discovery:
The authors didn't just look at the average performance; they also looked at how widely the results swing. They bounded the "second moment" (the average of the squared regret, which captures the spread of outcomes).

The Analogy: Think of a weather forecast.
- Average Regret: "On average, it will rain 2 inches."
- Second Moment: "But sometimes it pours 10 inches, and sometimes it's dry."
By measuring the "pouring" potential, the authors found a way to tighten the safety net. They proved that while GP-TS can have bad streaks, they aren't as catastrophic as previously feared.
The Result: They improved the math so that the guaranteed loss grows more slowly as you demand a smaller chance of "bad luck" (a square-root blow-up rather than a linear one). It's still not as strong as the cautious planner's guarantee, but it's much better than we thought.

3. The "Good Enough" Metric (Lenient Regret)

The Problem:
Sometimes, finding the absolute highest peak is impossible or takes too long. What if we just want to find a peak that is "good enough" (within 5% of the top)?

The Discovery:
The authors analyzed GP-TS under a more forgiving measure of success called Lenient Regret, which had previously been studied only for GP-UCB. Instead of punishing the explorer for every tiny step away from the top, it only counts a "mistake" if the explorer is significantly far off.

The Analogy: Imagine a treasure hunt.
- Old Way: You get a point deducted for every inch you are away from the gold.
- Lenient Way: You only get a point deducted if you are in the wrong city. If you are in the right town, you're doing great!
The Result: They proved that GP-TS is very efficient at finding "good enough" solutions: the expected number of clearly bad picks grows only very slowly (polylogarithmically) with the number of tries. This is the first time this has been proven for GP-TS.

4. The Long-Term Marathon

The Problem:
How does the explorer perform over a very long time (a marathon, not a sprint)?

The Discovery:
The authors combined their new "Lenient" findings with some advanced math tricks to show that over a long period, GP-TS performs just as well as the cautious planner (GP-UCB).

The Analogy: In a 100-meter dash, the cautious planner might win because they never make a mistake. But in a 100-mile race, the adventurous explorer catches up and runs just as efficiently, provided the terrain (the math of the mountain) isn't too jagged.
The Result: They relaxed some of the strict rules about how "smooth" the mountain needs to be. This means GP-TS works well on a wider variety of real-world problems than previously thought.

Summary: What Does This Mean for You?

GP-TS is a great tool: It's not "flawed" just because it's riskier in extreme "bad luck" scenarios. In the real world, where we often just need a "good enough" solution, it is highly effective.
We now understand the risks: The paper gives us a better map of when GP-TS might struggle (the "bad luck" scenarios) and how to mathematically account for it.
Better guarantees: The authors have tightened the math, giving us more confidence to use GP-TS in critical applications like drug discovery or designing new materials, knowing exactly how it behaves under pressure.

In short, the paper says: "GP-TS is a brave explorer. It might get lost in a storm occasionally, but if you know how to read the weather (the new math), it's an excellent guide for finding the best solutions."

Here is a detailed technical summary of the paper "On Regret Bounds of Thompson Sampling for Bayesian Optimization" by Shion Takeno and Shogo Iwazaki.

1. Problem Statement

The paper addresses the theoretical analysis of Gaussian Process Thompson Sampling (GP-TS), a popular algorithm for Bayesian Optimization (BO). While GP-TS is widely used, its theoretical guarantees lag behind those of the Gaussian Process Upper Confidence Bound (GP-UCB) algorithm.

Specifically, the authors identify four critical gaps in the existing literature regarding GP-TS under the Bayesian setting (where the objective function is a sample path from a Gaussian Process):

High-Probability Regret Dependence: Existing analyses of GP-TS only provide high-probability regret bounds that are a direct consequence of expected regret bounds (via Markov's inequality), resulting in a poor polynomial dependence on the failure probability $\delta$ (i.e., $O(1/\delta)$ ). In contrast, GP-UCB achieves a logarithmic dependence ( $O(\log(1/\delta))$ ). It is unclear if GP-TS can achieve better dependence.
Lenient Regret: High-probability bounds for "lenient regret" (counting only errors exceeding a tolerance $\Delta$ ) have been established for GP-UCB but not for GP-TS.
Improved Cumulative Regret: Recent improvements to the cumulative regret upper bound for GP-UCB (achieving $\tilde{O}(\sqrt{T})$ ) have not been successfully extended to GP-TS.
Kernel Conditions: Previous tight bounds for Matérn kernels required restrictive conditions on the smoothness parameter $\nu$ and dimension $d$ .
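To get a feel for the first gap, the short Python sketch below tabulates the $\delta$-dependent factors discussed in this summary for a few failure probabilities. The function name `delta_factors` is hypothetical, and the numbers are only the raw factors $1/\delta$, $1/\sqrt{\delta}$, and $\log(1/\delta)$, not full regret bounds.

```python
import math

def delta_factors(delta):
    """Return the delta-dependent factors for a failure probability delta.

    These are only the multiplicative factors named in the text,
    not complete regret bounds (constants and T-dependence are omitted).
    """
    return {
        "1/delta": 1.0 / delta,
        "1/sqrt(delta)": 1.0 / math.sqrt(delta),
        "log(1/delta)": math.log(1.0 / delta),
    }

for delta in (0.1, 0.01, 0.001):
    f = delta_factors(delta)
    print(
        f"delta={delta:<6} "
        f"1/delta={f['1/delta']:9.1f}  "
        f"1/sqrt(delta)={f['1/sqrt(delta)']:7.2f}  "
        f"log(1/delta)={f['log(1/delta)']:5.2f}"
    )
```

As $\delta$ shrinks, the polynomial factors grow far faster than the logarithmic one, which is why the type of $\delta$-dependence matters for high-confidence guarantees.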

2. Methodology and Framework

The authors analyze GP-TS under the standard Bayesian BO framework:

Setting: Sequential decision-making to maximize an unknown black-box function $f: \mathcal{X} \to \mathbb{R}$ .
Assumptions: The objective function $f$ is a sample path from a zero-mean Gaussian Process with a kernel $k$ (Linear, Squared Exponential, or Matérn). Observations are noisy ( $y_t = f(x_t) + \epsilon_t$ ).
Algorithm: GP-TS selects $x_t = \arg\max_{x \in \mathcal{X}} g_t(x)$ , where $g_t$ is a sample path drawn from the posterior distribution $p(f | \mathcal{D}_{t-1})$ .
Key Analytical Tools:
- Maximum Information Gain (MIG): Used to quantify the complexity of the kernel.
- Second Moment Analysis: Deriving bounds on $E[R_T^2]$ to improve high-probability bounds.
- Lenient Regret Analysis: Using a novel proof technique inspired by the "elliptical potential count lemma" to bound the number of suboptimal actions.
- Refined Decomposition: Adapting recent GP-UCB analysis techniques (Iwazaki, 2025b) to decompose regret based on the distance to the global maximizer, relaxing conditions on kernel smoothness.
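The selection rule above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes a finite 1-D grid as the input domain, a squared exponential kernel with a hand-picked lengthscale, and a toy objective instead of a true GP sample path. The names `se_kernel`, `gp_posterior`, `gp_ts`, and `objective` are hypothetical.

```python
import numpy as np

def se_kernel(a, b, lengthscale=0.2):
    """Squared exponential kernel matrix between 1-D point sets a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(grid, X, y, noise_var):
    """Posterior mean and covariance of a zero-mean GP on `grid` given (X, y)."""
    if len(X) == 0:
        return np.zeros(len(grid)), se_kernel(grid, grid)
    K = se_kernel(X, X) + noise_var * np.eye(len(X))
    Ks = se_kernel(grid, X)
    mean = Ks @ np.linalg.solve(K, y)
    cov = se_kernel(grid, grid) - Ks @ np.linalg.solve(K, Ks.T)
    return mean, cov

def gp_ts(f, grid, T=30, noise_var=1e-2, seed=0):
    """Run T rounds of GP-TS on a finite grid (illustrative assumption)."""
    rng = np.random.default_rng(seed)
    X, y = np.array([]), np.array([])
    for _ in range(T):
        mean, cov = gp_posterior(grid, X, y, noise_var)
        # Symmetrize and add jitter so the covariance is numerically PSD.
        cov = 0.5 * (cov + cov.T) + 1e-6 * np.eye(len(grid))
        g = rng.multivariate_normal(mean, cov)  # g_t ~ p(f | D_{t-1})
        x_t = grid[np.argmax(g)]                # x_t = argmax_x g_t(x)
        y_t = f(x_t) + rng.normal(0.0, np.sqrt(noise_var))  # noisy observation
        X, y = np.append(X, x_t), np.append(y, y_t)
    return X, y

grid = np.linspace(0.0, 1.0, 100)
objective = lambda x: np.sin(6.0 * x) * x  # toy stand-in, not a GP sample path
X, y = gp_ts(objective, grid, T=30)
regret = objective(grid).max() - objective(X)  # per-round regret on the grid
print("cumulative regret:", regret.sum())
```

Note that there is no confidence-width parameter anywhere in the loop: the randomness of the posterior sample alone drives exploration.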

3. Key Contributions and Results

A. Regret Lower Bound (Impossibility of Logarithmic $\delta$ )

Theorem 3.1: The authors construct a specific two-armed bandit problem instance where GP-TS incurs a regret of $\Omega(T/2)$ with probability at least $c/T^c$ .

Implication: This proves that GP-TS cannot generally achieve an $O(\log(1/\delta))$ high-probability regret bound. The dependence on $\delta$ must be polynomial (specifically, at least $\Omega(1/\delta^c)$ for some constant $c > 0$), distinguishing it fundamentally from GP-UCB.

B. Improved High-Probability Regret via Second Moment

Theorem 3.2: The authors derive an upper bound on the second moment of the cumulative regret: $E[R_T^2] = O(T \gamma_T \log T)$ .

Result: By applying Markov's inequality to the second moment, they obtain a bound that holds with probability at least $1 - \delta$:
$R_T = O\left( \sqrt{\frac{T \gamma_T \log T}{\delta}} \right)$
Significance: This improves the dependence on $\delta$ from $O(1/\delta)$ to $O(1/\sqrt{\delta})$ , tightening the bound significantly compared to previous results.
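The step from a second-moment bound to a $1/\sqrt{\delta}$ dependence is short enough to spell out. The following is a sketch of the standard argument, using only the fact that cumulative regret is non-negative; the threshold $a$ is an auxiliary symbol introduced here.

```latex
% For any threshold a > 0, Markov's inequality applied to R_T^2 gives
\Pr\left( R_T \ge a \right)
  = \Pr\left( R_T^2 \ge a^2 \right)
  \le \frac{E[R_T^2]}{a^2}.

% Choosing a = \sqrt{E[R_T^2] / \delta} makes the right-hand side equal to \delta, so
\Pr\left( R_T < \sqrt{\frac{E[R_T^2]}{\delta}} \right) \ge 1 - \delta,
\qquad
\sqrt{\frac{E[R_T^2]}{\delta}} = O\left( \sqrt{\frac{T \gamma_T \log T}{\delta}} \right).

% For comparison, Markov's inequality on the first moment only yields
\Pr\left( R_T \ge \frac{E[R_T]}{\delta} \right) \le \delta,
% which is the source of the earlier 1/\delta dependence.
```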

C. Expected Lenient Regret Bounds

Theorem 3.3: The paper provides the first expected lenient regret upper bounds for GP-TS.

Result: The expected number of times the algorithm selects an action with regret $>\Delta$ is polylogarithmic in $T$ . Consequently, the expected lenient regret is also polylogarithmic.
Methodology: The proof uses a different technique than previous works (Cai et al., Iwazaki), leveraging properties of the posterior variance and the elliptical potential lemma. This approach suggests that similar bounds could be derived for GP-UCB.
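To make the quantity being bounded concrete, here is a small Python sketch that counts the rounds whose regret exceeds a tolerance $\Delta$ and sums the regret over those rounds only. This indicator-style form is an assumption chosen for illustration; the paper's exact definition of lenient regret may differ. The function name `lenient_regret` and the example gaps are hypothetical.

```python
def lenient_regret(gaps, tolerance):
    """Count and sum the per-round regrets that exceed `tolerance`.

    gaps: per-round regret values f(x*) - f(x_t), each non-negative.
    Returns (number of rounds with gap > tolerance, sum of those gaps).
    Rounds within the tolerance contribute nothing (assumed indicator form).
    """
    bad = [g for g in gaps if g > tolerance]
    return len(bad), sum(bad)

# Hypothetical per-round regrets of a run that improves over time.
gaps = [0.9, 0.4, 0.05, 0.3, 0.02, 0.01]
count, lenient = lenient_regret(gaps, tolerance=0.1)
print("rounds above tolerance:", count)
print("lenient regret:", lenient)
print("standard cumulative regret:", sum(gaps))
```

Under this form, the small gaps in later rounds still add to the standard cumulative regret but are ignored by the lenient version, which is why a polylogarithmic count of bad rounds translates into a polylogarithmic lenient regret.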

D. Improved Cumulative Regret on Time Horizon $T$

Theorem 3.5: By adapting the refined analysis of GP-UCB (Iwazaki, 2025b) and combining it with their lenient regret results, the authors establish improved cumulative regret bounds for GP-TS.

Squared Exponential (SE) Kernels: $R_T = O(\sqrt{T} \log T)$ .
Matérn Kernels: $R_T = \tilde{O}(\sqrt{T})$ for $\nu > 2$ .
Key Relaxation: The analysis relaxes the condition on Matérn kernels from the previously required $2\nu + d \leq \nu^2$ (or similar restrictive conditions) to simply $\nu > 2$. This matches the sufficient condition required for the existence of a unique global maximizer with a quadratic gap (Lemma 2.4).
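A quick way to see what the relaxation buys is to evaluate both conditions for a few common smoothness values and dimensions. The sketch below takes the inequality $2\nu + d \leq \nu^2$ exactly as quoted above as the "old" condition, which is an assumption since earlier works may state it in a slightly different form. The function names are hypothetical.

```python
def old_condition(nu, d):
    """Previously required condition as quoted in the text: 2*nu + d <= nu**2."""
    return 2 * nu + d <= nu ** 2

def new_condition(nu):
    """Relaxed condition from the paper: nu > 2, independent of dimension d."""
    return nu > 2

for nu in (0.5, 1.5, 2.5, 3.5):
    for d in (1, 2, 5):
        print(
            f"nu={nu:<4} d={d}  "
            f"old: {str(old_condition(nu, d)):<5}  "
            f"new: {new_condition(nu)}"
        )
```

For example, with $\nu = 5/2$ the quoted old condition holds only for $d = 1$, while the relaxed condition holds in every dimension.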

4. Significance and Impact

Theoretical Closure: The paper bridges the theoretical gap between GP-UCB and GP-TS. It clarifies that while GP-TS cannot match the logarithmic $\delta$ -dependence of GP-UCB, it can achieve nearly optimal time-horizon ( $T$ ) dependence and significantly improved $\delta$ -dependence via second-moment analysis.
Practical Relevance: GP-TS is often preferred in practice because it does not require tuning the confidence width parameter $\beta_t$ (which is crucial for GP-UCB but difficult to set optimally). This paper provides the rigorous theoretical backing needed to trust GP-TS's empirical performance.
Methodological Advancement: The introduction of second-moment bounds for GP-TS and the novel proof technique for expected lenient regret offer new tools for analyzing other randomized BO algorithms.
Relaxed Kernel Conditions: The relaxation of the Matérn kernel condition to $\nu > 2$ broadens the applicability of tight regret bounds to a wider range of smooth functions commonly used in real-world BO applications.

5. Limitations and Future Work

The authors acknowledge several open problems:

Optimality of $\delta$ : While improved to $1/\sqrt{\delta}$, the tightness of this bound is not proven.
Variance Inflation: The analysis does not yet cover GP-TS with variance inflation (a technique used to handle the frequentist setting), which might be necessary to achieve $O(\sqrt{T \gamma_T \log(T/\delta)})$ bounds.
Matérn Smoothness: The condition $\nu > 2$ still excludes common kernels such as Matérn-1/2 and Matérn-3/2. Whether it can be relaxed further remains an open question.
Extensions: The analysis is currently limited to vanilla BO; extending these results to multi-objective, constrained, or multi-fidelity settings is a future direction.

In summary, this paper significantly advances the theoretical understanding of GP-TS, providing tighter regret bounds, clarifying the limitations regarding probability dependence, and offering a more refined analysis of kernel smoothness requirements.
