Sharper Guarantees for Misspecified Kernelized Bandit Optimization

This paper proves that the misspecification penalty in kernelized bandit optimization — both offline (simple regret) and online (cumulative regret) — can be reduced from a square-root-of-complexity factor to a logarithmic or polylogarithmic one, by exploiting spectral localization in the offline setting and spatial domain-splitting in the online setting.

Original authors: Davide Maran, Csaba Szepesvári

Published 2026-05-08✓ Author reviewed
📖 4 min read☕ Coffee break read

Original authors: Davide Maran, Csaba Szepesvári

Original paper licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). This is an AI-generated explanation of the paper below. It is not written by the authors. For technical accuracy, refer to the original paper. Read full disclaimer

The Big Picture: The "Mountain in the Clouds" Problem

Imagine you are an explorer flying in a helicopter above a vast mountain range. Your job is simple: find the single highest peak. The catch is that the entire mountain range is wrapped in thick clouds — while you are flying, you cannot really see the mountains at all.

What you can do is point at any spot on the map and tell the pilot to fly there. Once you arrive, you take a single height measurement at that spot, which refines your map a little. Then you point at the next spot, fly there, measure, and so on. Every measurement costs time and fuel, so you cannot just measure everywhere.

The one thing you know in advance — and this matters — is that the mountain range is not too jagged: heights vary smoothly across the map, up to some bounded misspecification error. (This is the plain-language version of the paper's spectral / RKHS regularity assumption: the true height function lives in a structured family, but only approximately.)

This is the setup for kernelized bandit optimization. The "helicopter explorer" is the algorithm, the "mountain" is the unknown function the algorithm is trying to maximize, and each "flight + measurement" is one query of the function.

Two Ways to Get Paid

The paper studies two different ways of judging how good the explorer is.

Offline scenario — paid for the final guess

You are given a fixed budget of helicopter trips. You take all your measurements, build the best map you can, and at the very end you point at one spot — your final guess for where the highest peak is.

You are paid by how much you miss the tallest peak:

  • true height of the actual highest peak − height of your final guess = your miss
  • smaller miss = better pay.

This is what the paper calls simple regret. The earlier measurements only matter to the extent that they help your final guess.

Online scenario — paid for every trip

Now imagine the explorer is paid round by round. Every time they fly somewhere, the height of that spot goes into a running total. After all the trips, the running total is compared to the total they would have collected if they had known the location of the tallest peak from the very beginning and simply flown there every single time.

The gap between those two totals is the cumulative regret: how much height the explorer left on the table over the whole exploration.

The online problem is harder, because every poorly chosen measurement directly hurts your pay — you cannot "spend" some early flights purely on exploration with no consequence.

What the Paper Actually Does

For decades, theory has said: when your map model is misspecified (the true mountain is only approximately the kind of function the model expects, with some error ε\varepsilon), this misspecification gets amplified by a factor that grows with the complexity of the kernel. In offline guarantees that factor was deff\sqrt{d_\mathrm{eff}} (the kernel's effective dimension); in online guarantees it was γn\sqrt{\gamma_n} (the maximum information gain after nn rounds).

This paper proves that for a large class of kernels, that amplification can be reduced from a square-root-of-complexity dependence down to a logarithmic or polylogarithmic one. In other words: the cost of being slightly wrong about the model grows much more slowly than people thought.

The trick is localization:

  • Spectral localization in the offline setting controls a quantity called the Lebesgue constant of the approximation operator, which is what really governs how badly misspecification hurts you. The paper proves logarithmic amplification for one-dimensional monotone spectra and polylogarithmic amplification for multivariate Fourier-diagonal product kernels.
  • Domain splitting in the online setting is the spatial counterpart: chop the map into regions, run the algorithm on each, and prevent local errors from being amplified globally. This removes the extra γn\sqrt{\gamma_n} factor from the online misspecification term, giving a cumulative regret bound of O~(γnn+nε)\widetilde{\mathcal O}(\sqrt{\gamma_n n} + n\varepsilon).

The One-Sentence Punchline

By being more careful about where the misspecification error can compound — spectrally for the offline problem, spatially for the online one — the explorer's pay (in both regret notions) becomes far more robust to small modelling errors than previous results suggested.

Drowning in papers in your field?

Get daily digests of the most novel papers matching your research keywords — with technical summaries, in your language.

Try Digest →