Stability and Robustness via Regularization: Bandit Inference via Regularized Stochastic Mirror Descent
This paper establishes a general stability criterion for stochastic mirror descent algorithms that enables valid statistical inference in adaptive bandit settings. It then introduces regularized-EXP3 variants that simultaneously achieve minimax-optimal regret, nominal confidence-interval coverage, and robustness to adversarial corruptions.
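
For orientation, the sketch below shows the standard, unregularized EXP3 algorithm, which can be read as stochastic mirror descent over the probability simplex with a negative-entropy mirror map and importance-weighted loss estimates. It is not the paper's regularized variant: the specific regularization is not described in this summary, so none is attempted here. The names `exp3`, `loss_fn`, and `eta` are hypothetical and chosen only for illustration, and losses are assumed to lie in [0, 1].

```python
import math
import random


def exp3(loss_fn, n_arms, horizon, eta, seed=0):
    """Baseline EXP3 (assumption: losses in [0, 1]); not the regularized variant.

    loss_fn(arm, rng) is a hypothetical callback returning the loss of `arm`.
    Returns the last sampling distribution used and the per-arm pull counts.
    """
    rng = random.Random(seed)
    cum_loss_est = [0.0] * n_arms  # importance-weighted cumulative loss estimates
    pulls = [0] * n_arms
    probs = [1.0 / n_arms] * n_arms

    for _ in range(horizon):
        # Mirror-descent step with negative entropy:
        # p_i is proportional to exp(-eta * estimated cumulative loss of arm i).
        shift = min(cum_loss_est)  # subtract the minimum for numerical stability
        weights = [math.exp(-eta * (est - shift)) for est in cum_loss_est]
        total = sum(weights)
        probs = [w / total for w in weights]

        # Sample an arm from the current distribution and observe only its loss.
        arm = rng.choices(range(n_arms), weights=probs)[0]
        loss = loss_fn(arm, rng)

        # Importance weighting keeps the loss estimate unbiased for every arm.
        cum_loss_est[arm] += loss / probs[arm]
        pulls[arm] += 1

    return probs, pulls


if __name__ == "__main__":
    # Two arms with fixed losses 0.2 and 0.8 (illustrative values only).
    final_probs, counts = exp3(
        lambda arm, rng: 0.2 if arm == 0 else 0.8,
        n_arms=2,
        horizon=2000,
        eta=0.02,
    )
    print(final_probs, counts)
```

Because the sampling probabilities depend on past observations, the per-arm sample counts returned here are themselves random and data-dependent, which is the adaptivity that makes inference in this setting delicate.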