Overlap-Adaptive Regularization for Conditional Average Treatment Effect Estimation

This paper introduces Overlap-Adaptive Regularization (OAR), a method that improves existing CATE meta-learners in low-overlap regions by increasing regularization in proportion to overlap weights, and offers flexible, debiased variants that preserve Neyman-orthogonality for robust inference.

Valentyn Melnychuk, Dennis Frauen, Jonas Schweisthal, Stefan Feuerriegel · Tue, 10 Ma… · cs.LG
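The core idea in the summary above, penalizing the learner more heavily where treated/control overlap is poor, can be sketched with a toy per-sample penalty. This is a hypothetical illustration of overlap-weighted regularization, not the paper's OAR estimator; the function name and the inverse-overlap scaling are assumptions.

```python
import numpy as np

def overlap_adaptive_penalty(propensity, base_lambda=1.0, eps=1e-6):
    """Per-sample regularization strength that grows as overlap shrinks.

    The overlap weight w(x) = e(x) * (1 - e(x)) is largest where treated
    and control groups overlap well (e(x) near 0.5); scaling the penalty
    by its inverse regularizes low-overlap regions more strongly.
    Illustrative sketch only.
    """
    w = propensity * (1.0 - propensity)
    return base_lambda * (w.max() / (w + eps))

e = np.array([0.5, 0.9, 0.99])      # estimated propensity scores
lam = overlap_adaptive_penalty(e)
# penalty is smallest at e = 0.5 (best overlap), largest at e = 0.99
```

A learner would then add `lam[i]` times its usual penalty term for sample `i`, so estimates in poorly identified regions are shrunk harder toward the regularized solution.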

Online Decision-Focused Learning

This paper introduces the first provably convergent online algorithms for decision-focused learning in dynamic environments by regularizing non-differentiable objectives and employing perturbation techniques to handle non-convexity, thereby establishing static and dynamic regret bounds and demonstrating superior performance over standard benchmarks.

Aymeric Capitaine, Maxime Haddouche, Eric Moulines, Michael I. Jordan, Etienne Boursier, Alain Durmus · Tue, 10 Ma… · cs.LG
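The perturbation technique the summary refers to, smoothing a non-differentiable decision map by injecting random noise, can be sketched with a Monte-Carlo gradient of a perturbed argmax (in the spirit of perturbed optimizers). This is a generic illustration of the smoothing trick, not the paper's online algorithm; the function name and constants are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def perturbed_argmax_grad(theta, n_samples=5000, sigma=0.1):
    """Monte-Carlo gradient of E[max_i (theta + sigma * Z)_i] w.r.t. theta.

    Gaussian perturbation turns the piecewise-constant argmax into a
    smooth expectation whose gradient is the mean one-hot winner.
    Illustrative sketch of the smoothing technique only.
    """
    z = rng.normal(size=(n_samples, theta.size))
    winners = np.argmax(theta + sigma * z, axis=1)
    onehot = np.eye(theta.size)[winners]
    return onehot.mean(axis=0)   # empirical win probabilities = gradient

theta = np.array([1.0, 1.05, 0.2])
g = perturbed_argmax_grad(theta)
# gradient mass concentrates on the two near-optimal coordinates
```

The resulting gradient is a probability vector over decisions, which is what makes first-order online updates (and hence regret analysis) tractable for otherwise non-differentiable objectives.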

Active Advantage-Aligned Online Reinforcement Learning with Offline Data

This paper introduces A3RL, a novel framework that integrates offline and online reinforcement learning through a confidence-aware active advantage-aligned sampling strategy to dynamically prioritize high-value data, thereby overcoming challenges like catastrophic forgetting and improving sample efficiency to outperform existing methods.

Xuefeng Liu, Hung T. C. Le, Siyu Chen, Rick Stevens, Zhuoran Yang, Matthew R. Walter, Yuxin Chen · Tue, 10 Ma… · cs.LG
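A confidence-aware, advantage-aligned sampling distribution over replay data, the ingredient the summary highlights, can be sketched as a softmax over advantage estimates scaled by a confidence score. This is a hypothetical sketch, not the A3RL formula from the paper; the function name, the multiplicative confidence weighting, and the temperature are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def advantage_aligned_probs(advantages, confidence, temperature=1.0):
    """Sampling distribution over replay transitions that up-weights
    high-advantage samples, attenuated by a confidence score in [0, 1]
    so uncertain advantage estimates matter less. Illustrative only."""
    logits = confidence * advantages / temperature
    logits = logits - logits.max()           # numerical stability
    p = np.exp(logits)
    return p / p.sum()

adv = np.array([0.1, 2.0, -1.0, 0.5])       # advantage estimates
conf = np.array([0.9, 0.8, 0.3, 1.0])       # confidence per transition
p = advantage_aligned_probs(adv, conf)
idx = rng.choice(len(adv), size=1000, p=p)  # prioritized minibatch draws
```

Mixing offline and online transitions in one buffer and drawing minibatches from such a distribution is what lets high-value data dominate updates instead of uniform replay.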

Variational Learning of Gaussian Process Latent Variable Models through Stochastic Gradient Annealed Importance Sampling

This paper proposes a novel variational learning framework for Gaussian Process Latent Variable Models that uses Stochastic Gradient Annealed Importance Sampling to overcome proposal-distribution challenges in high-dimensional spaces, achieving tighter variational bounds and superior performance compared to state-of-the-art methods.

Jian Xu, Shian Du, Junmei Yang, Qianli Ma, Delu Zeng, John Paisley · Tue, 10 Ma… · cs.LG
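The annealed importance sampling component named in the summary can be illustrated on a toy one-dimensional problem: estimating the normalizing constant of an unnormalized Gaussian target by annealing from a broad, normalized proposal. This is a textbook AIS sketch (geometric annealing path plus one Metropolis step per temperature), not the paper's stochastic-gradient variant or its GPLVM setting.

```python
import numpy as np

rng = np.random.default_rng(0)

def annealed_importance_weights(n_chains=2000, n_temps=50, sigma0=3.0):
    """Toy AIS: estimate Z of the unnormalized target exp(-x^2 / 2)
    (true Z = sqrt(2*pi) ~ 2.507) starting from a normalized N(0, sigma0^2)
    proposal, with a geometric annealing path. Illustrative only."""
    betas = np.linspace(0.0, 1.0, n_temps)
    x = rng.normal(0.0, sigma0, size=n_chains)
    log_w = np.zeros(n_chains)

    def log_p0(x):   # log density of the (normalized) proposal
        return -0.5 * (x / sigma0) ** 2 - np.log(sigma0 * np.sqrt(2 * np.pi))

    def log_pT(x):   # log of the *unnormalized* target
        return -0.5 * x ** 2

    def log_gamma(x, b):   # geometric bridge between proposal and target
        return (1.0 - b) * log_p0(x) + b * log_pT(x)

    for b0, b1 in zip(betas[:-1], betas[1:]):
        # incremental importance weight for the temperature step b0 -> b1
        log_w += log_gamma(x, b1) - log_gamma(x, b0)
        # one Metropolis step leaving the intermediate distribution invariant
        prop = x + rng.normal(0.0, 0.5, size=n_chains)
        accept = np.log(rng.random(n_chains)) < log_gamma(prop, b1) - log_gamma(x, b1)
        x = np.where(accept, prop, x)

    return log_w

log_w = annealed_importance_weights()
Z_hat = np.exp(log_w).mean()   # unbiased estimate of the target's Z
```

Because the proposal is normalized, the mean importance weight is an unbiased estimator of the target's normalizing constant; the annealing path keeps the weight variance manageable where a single-step importance sampler would collapse.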

Posterior Sampling Reinforcement Learning with Gaussian Processes for Continuous Control: Sublinear Regret Bounds for Unbounded State Spaces

This paper establishes a tight Bayesian regret bound of $\widetilde{\mathcal{O}}(H^{3/2}\sqrt{\gamma_{T/H}\, T})$ for Gaussian Process Posterior Sampling Reinforcement Learning in continuous control with unbounded state spaces by proving that visited states remain within a near-constant radius and applying the chaining method to control regret.

Hamish Flynn, Joe Watson, Ingmar Posner, Jan Peters · Tue, 10 Ma… · cs.LG