Invariance-Based Dynamic Regret Minimization
The paper proposes ISD-linUCB, an algorithm for stochastic non-stationary linear bandits. It decomposes the reward model into a stationary and a non-stationary component and learns the invariant (stationary) part from historical data. This reduces the effective dimensionality of the online learning problem and significantly improves dynamic regret in fast-changing environments.
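The core idea described above, splitting the reward parameter into a stationary part learned from historical data and a non-stationary residual tracked online, can be sketched as follows. This is only an illustrative sketch, not the paper's actual ISD-linUCB: the class name `ResidualLinUCB`, the sliding-window ridge estimator, and all hyperparameters are assumptions made for the example.

```python
import numpy as np

class ResidualLinUCB:
    """LinUCB-style learner that freezes a stationary parameter learned
    offline and estimates only the non-stationary residual online.

    Minimal sketch of the stationary/non-stationary decomposition idea;
    the sliding window and hyperparameters are illustrative choices.
    """

    def __init__(self, theta_stationary, lam=1.0, alpha=1.0, window=100):
        self.theta_s = np.asarray(theta_stationary, dtype=float)
        self.dim = self.theta_s.shape[0]
        self.lam, self.alpha, self.window = lam, alpha, window
        self.history = []  # recent (context, residual reward) pairs

    def _estimate_residual(self):
        # Ridge regression on residual rewards r - <x, theta_s>,
        # restricted to the most recent `window` observations.
        V = self.lam * np.eye(self.dim)
        b = np.zeros(self.dim)
        for x, res in self.history[-self.window:]:
            V += np.outer(x, x)
            b += res * x
        V_inv = np.linalg.inv(V)
        return V_inv @ b, V_inv

    def select(self, arms):
        # arms: (K, d) array of contexts; return the index with highest UCB.
        arms = np.asarray(arms, dtype=float)
        theta_r, V_inv = self._estimate_residual()
        means = arms @ (self.theta_s + theta_r)
        # The exploration bonus covers only the residual's uncertainty:
        # the stationary part is treated as known.
        bonus = self.alpha * np.sqrt(np.einsum('ij,jk,ik->i', arms, V_inv, arms))
        return int(np.argmax(means + bonus))

    def update(self, x, reward):
        # Store only the residual; the stationary part is already explained.
        x = np.asarray(x, dtype=float)
        self.history.append((x, reward - x @ self.theta_s))
```

Because only the residual is estimated online, the confidence ellipsoid lives in the (potentially much smaller) non-stationary subspace, which is what drives the dimensionality reduction the abstract refers to.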