Active Advantage-Aligned Online Reinforcement Learning with Offline Data

This paper introduces A3RL, a novel framework that integrates offline and online reinforcement learning through a confidence-aware, active advantage-aligned sampling strategy that dynamically prioritizes high-value data, mitigating catastrophic forgetting, improving sample efficiency, and outperforming existing methods.

Xuefeng Liu, Hung T. C. Le, Siyu Chen, Rick Stevens, Zhuoran Yang, Matthew R. Walter, Yuxin Chen · 2026-03-10 · cs.LG
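The advantage-aligned prioritization can be pictured as weighted sampling over a mixed replay buffer, where a transition's sampling weight grows with its estimated advantage. The snippet below is a minimal sketch under that reading; the field names (`adv`, `src`), the exp(advantage/temperature) weighting, and the buffer sizes are illustrative assumptions, and A3RL's actual priority additionally folds in a confidence term not modeled here.

```python
import math
import random

random.seed(0)

# Toy mixed buffer: offline transitions plus freshly collected online ones.
# Online data is given a slightly higher mean advantage for illustration.
offline = [{"adv": random.gauss(0.0, 1.0), "src": "offline"} for _ in range(900)]
online = [{"adv": random.gauss(0.5, 1.0), "src": "online"} for _ in range(100)]
buffer = offline + online

def advantage_aligned_sample(buffer, batch_size, temperature=1.0):
    """Sample transitions with probability proportional to exp(advantage / T),
    so high-advantage data (often fresh online experience) dominates updates."""
    weights = [math.exp(t["adv"] / temperature) for t in buffer]
    return random.choices(buffer, weights=weights, k=batch_size)

batch = advantage_aligned_sample(buffer, batch_size=256)
online_frac = sum(t["src"] == "online" for t in batch) / len(batch)
```

Because online transitions make up 10% of the buffer but carry higher advantage on average, they end up overrepresented in the sampled batch, which is the intended prioritization effect.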

Language in the Flow of Time: Time-Series-Paired Texts Weaved into a Unified Temporal Narrative

This paper introduces Texts as Time Series (TaTS), a novel framework that leverages the periodic alignment between paired texts and time series data to enhance multimodal forecasting and imputation performance in existing numerical-only models without requiring architectural changes.

Zihao Li, Xiao Lin, Zhining Liu, Jiaru Zou, Ziwei Wu, Lecheng Zheng, Dongqi Fu, Yada Zhu, Hendrik Hamann, Hanghang Tong, Jingrui He · 2026-03-10 · cs.LG

IMPACT: Intelligent Motion Planning with Acceptable Contact Trajectories via Vision-Language Models

The paper proposes IMPACT, a novel motion planning framework that leverages Vision-Language Models to infer environment semantics and generate anisotropic cost maps, enabling a contact-aware A* planner to safely navigate cluttered environments by distinguishing between acceptable and dangerous object contacts.

Yiyang Ling, Karan Owalekar, Oluwatobiloba Adesanya, Erdem Bıyık, Daniel Seita · 2026-03-10 · cs.LG
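A contact-aware planner of the kind this summary describes can be sketched as A* over a grid whose cells carry semantic contact penalties, so the planner prefers brushing past acceptable objects over touching dangerous ones. The grid, costs, and cell semantics below are toy assumptions for illustration; they are not IMPACT's VLM-derived anisotropic cost maps.

```python
import heapq

def contact_aware_astar(cost_map, start, goal):
    """A* over a grid where each cell carries a semantic contact cost
    (low for acceptable contact, prohibitive for dangerous contact)."""
    rows, cols = len(cost_map), len(cost_map[0])
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # Manhattan heuristic
    frontier = [(h(start), 0.0, start, [start])]
    best = {}
    while frontier:
        f, g, pos, path = heapq.heappop(frontier)
        if pos == goal:
            return path, g
        if best.get(pos, float("inf")) <= g:
            continue
        best[pos] = g
        r, c = pos
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                ng = g + 1.0 + cost_map[nr][nc]  # step cost + contact penalty
                heapq.heappush(frontier,
                               (ng + h((nr, nc)), ng, (nr, nc), path + [(nr, nc)]))
    return None, float("inf")

# 0 = free, 0.5 = acceptable contact (e.g. a curtain), 100 = dangerous (e.g. glass).
grid = [
    [0, 100, 0],
    [0, 0.5, 0],
    [0, 100, 0],
]
path, cost = contact_aware_astar(grid, (1, 0), (1, 2))  # crosses the curtain cell
```

Since each step costs at least 1, the Manhattan heuristic is admissible and the first goal expansion is optimal; the planner routes through the cheap "acceptable contact" cell rather than either dangerous one.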

More Bang for the Buck: Process Reward Modeling with Entropy-Driven Uncertainty

The paper introduces EDU-PRM, an entropy-driven process reward model that automatically identifies reasoning step boundaries using predictive entropy to eliminate manual annotations, achieving state-of-the-art performance with only 1.5% of the training data while significantly improving accuracy and reducing token usage.

Lang Cao, Renhong Chen, Yingtian Zou, Chao Peng, Huacong Xu, Yuxian Wang, Wu Ning, Qian Chen, Mofan Peng, Zijie Chen, Peishuo Su, Yitong Li · 2026-03-10 · cs.LG
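The core idea of entropy-driven boundary detection can be illustrated in a few lines: compute the predictive entropy of each next-token distribution and treat high-entropy positions as candidate reasoning-step boundaries. This is a toy stand-in for EDU-PRM's segmentation; the threshold value and the hand-built distributions below are made-up assumptions, not the paper's hyperparameters.

```python
import math

def token_entropy(probs):
    """Shannon entropy (nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def entropy_step_boundaries(token_dists, threshold=1.0):
    """Mark positions where predictive entropy exceeds a threshold as
    candidate reasoning-step boundaries (illustrative threshold)."""
    return [i for i, dist in enumerate(token_dists)
            if token_entropy(dist) > threshold]

# Toy stream: mostly confident predictions, with two uncertain positions
# where the model hesitates between continuations.
confident = [0.97, 0.01, 0.01, 0.01]   # entropy ~0.17 nats
uncertain = [0.25, 0.25, 0.25, 0.25]   # entropy = ln(4) ~1.39 nats
dists = [confident, confident, uncertain, confident, uncertain, confident]
boundaries = entropy_step_boundaries(dists, threshold=1.0)  # → [2, 4]
```

The appeal of this signal is that it needs no manual step annotations: boundary candidates fall out of quantities the model already produces during decoding.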

A Champion-level Vision-based Reinforcement Learning Agent for Competitive Racing in Gran Turismo 7

This paper introduces a vision-based reinforcement learning agent that achieves champion-level performance in Gran Turismo 7 using an asymmetric actor-critic framework, allowing the agent to rely solely on ego-centric camera views and onboard sensors; this eliminates the need for external global localization while still outperforming the game's built-in drivers.

Hojoon Lee, Takuma Seno, Jun Jet Tai, Kaushik Subramanian, Kenta Kawamoto, Peter Stone, Peter R. Wurman · 2026-03-10 · cs.LG
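The asymmetric actor-critic idea is that the critic may consume privileged state that exists only at training time (e.g. ground-truth poses from the simulator), while the actor sees only what the deployed agent will have onboard. The sketch below uses tiny linear "networks" and made-up dimensions purely to show the input asymmetry; none of these names or shapes come from the paper.

```python
import math
import random

random.seed(2)

# Illustrative dimensions: onboard observation features, privileged
# training-only state, and a 2-D action (e.g. steering, throttle).
OBS_DIM, PRIV_DIM, ACT_DIM = 16, 8, 2

def linear_tanh(in_dim, out_dim):
    """A toy one-layer network standing in for a real policy/value net."""
    W = [[random.gauss(0, 0.1) for _ in range(out_dim)] for _ in range(in_dim)]
    return lambda x: [math.tanh(sum(xi * W[i][j] for i, xi in enumerate(x)))
                      for j in range(out_dim)]

actor = linear_tanh(OBS_DIM, ACT_DIM)         # deployable: onboard inputs only
critic = linear_tanh(OBS_DIM + PRIV_DIM, 1)   # training-only: privileged state

obs = [random.gauss(0, 1) for _ in range(OBS_DIM)]    # ego-camera features etc.
priv = [random.gauss(0, 1) for _ in range(PRIV_DIM)]  # e.g. global localization

action = actor(obs)            # acting never requires the privileged state
value = critic(obs + priv)     # the critic's estimate can use it freely
```

At deployment only the actor runs, which is why the trained agent needs no external global localization even though training exploited it.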

StablePCA: Distributionally Robust Learning of Shared Representations from Multi-Source Data

This paper introduces StablePCA, a distributionally robust framework for extracting shared low-dimensional representations from multi-source data by maximizing worst-case explained variance, and addresses its inherent nonconvexity through a convex relaxation solved by an efficient Mirror-Prox algorithm with global convergence guarantees and a data-dependent certificate for solution tightness.

Zhenyu Wang, Molei Liu, Jing Lei, Francis Bach, Zijian Guo · 2026-03-10 · cs.LG
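The worst-case objective at the heart of this framework is easy to state: for a candidate projection, score it by the minimum explained variance across sources, then maximize that score. The toy below evaluates that objective for diagonal covariances and two candidate unit vectors; it is a sketch of the criterion only, and does not implement the paper's convex relaxation or Mirror-Prox solver.

```python
def explained_variance(cov_diag, v):
    """Variance captured by unit vector v for a diagonal covariance (v' C v)."""
    return sum(c * x * x for c, x in zip(cov_diag, v))

def worst_case_explained_variance(cov_diags, v):
    """The distributionally robust objective, sketched: the minimum
    explained variance across all sources for projection v."""
    return min(explained_variance(c, v) for c in cov_diags)

# Two toy sources whose leading principal directions disagree.
cov_a = [3.0, 1.0, 0.1]
cov_b = [0.1, 1.0, 3.0]

e0 = [1.0, 0.0, 0.0]   # top direction of source A, but worst for source B
e1 = [0.0, 1.0, 0.0]   # a balanced direction shared by both sources

score_e0 = worst_case_explained_variance([cov_a, cov_b], e0)  # min(3.0, 0.1) = 0.1
score_e1 = worst_case_explained_variance([cov_a, cov_b], e1)  # min(1.0, 1.0) = 1.0
```

Ordinary PCA on pooled data could happily pick a direction like `e0` that serves one source and fails another; the worst-case criterion prefers `e1`, which is the stability the method is after.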