Active Advantage-Aligned Online Reinforcement Learning with Offline Data
This paper introduces A3RL, a framework that integrates offline and online reinforcement learning. Its core component is a confidence-aware, active, advantage-aligned sampling strategy that dynamically prioritizes high-value data during training. The authors report that this approach mitigates problems such as catastrophic forgetting, improves sample efficiency, and outperforms existing methods.
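The summary does not spell out the sampling rule. As a rough illustration only, the sketch below shows one way confidence-weighted, advantage-aligned prioritized sampling over a mixed offline/online buffer could look. The `Transition` record, the per-transition `confidence` score, and the priority formula (positive part of the advantage times confidence, plus a small `eps`, raised to `alpha`) are hypothetical assumptions, not A3RL's actual definition.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Transition:
    # Hypothetical record; "source" marks offline vs. online data.
    source: str
    advantage: float   # estimated A(s, a) = Q(s, a) - V(s), assumed precomputed
    confidence: float  # hypothetical confidence in the estimate, in [0, 1]


def priorities(buffer: Sequence[Transition], alpha: float = 1.0,
               eps: float = 1e-6) -> List[float]:
    """Illustrative priorities: clipped advantage scaled by confidence."""
    scores = []
    for t in buffer:
        # Only positive advantages raise priority; low confidence shrinks it.
        aligned = max(t.advantage, 0.0) * t.confidence
        # eps keeps every transition sampleable; alpha sharpens or flattens.
        scores.append((aligned + eps) ** alpha)
    total = sum(scores)
    return [s / total for s in scores]


def sample_batch(buffer: Sequence[Transition], batch_size: int,
                 rng: random.Random, alpha: float = 1.0) -> List[Transition]:
    """Draw a batch with replacement, proportional to the priorities."""
    probs = priorities(buffer, alpha)
    return rng.choices(list(buffer), weights=probs, k=batch_size)
```

With this hypothetical rule, an offline transition with a large, confidently estimated advantage is drawn far more often than one with a negative advantage, regardless of whether it came from the offline dataset or from online interaction.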