Reward-Conditioned Reinforcement Learning
This paper introduces Reward-Conditioned Reinforcement Learning (RCRL), a framework that trains a single agent to optimize a family of reward specifications from a shared off-policy dataset, enabling robust and efficient adaptation to changing task preferences without sacrificing the simplicity of single-task training.
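The core idea of training one agent for many reward specifications from a single off-policy dataset can be illustrated with a toy example. What follows is a minimal sketch under stated assumptions, not the RCRL algorithm itself. It assumes a five-state deterministic chain. It also assumes that each reward specification is a weight vector `w` over two hand-picked features (reaching the left end, reaching the right end), and that the conditioned agent is a tabular Q-function keyed by `(w, s, a)`. The names `step`, `features`, `reward`, `train`, and `greedy_action` are hypothetical. The dataset stores transitions without rewards; each stored transition is relabeled with the reward under every specification in the family during training.

```python
N_STATES = 5
ACTIONS = (-1, +1)  # move left, move right
GAMMA = 0.9

def step(s, a):
    # Deterministic chain dynamics, clamped at both ends (toy assumption).
    return min(max(s + a, 0), N_STATES - 1)

def features(s_next):
    # Hypothetical 2-d reward features: [reached left end, reached right end].
    return (1.0 if s_next == 0 else 0.0, 1.0 if s_next == N_STATES - 1 else 0.0)

def reward(w, s_next):
    # Assumption: a reward specification is a linear weighting of the features.
    return sum(wi * fi for wi, fi in zip(w, features(s_next)))

# Shared off-policy dataset: every (s, a, s') transition, collected once,
# with no reward labels attached.
DATASET = [(s, a, step(s, a)) for s in range(N_STATES) for a in ACTIONS]

def train(reward_family, sweeps=200):
    # One Q-table conditioned on (w, s, a) serves every specification in the family.
    q = {(w, s, a): 0.0
         for w in reward_family for s in range(N_STATES) for a in ACTIONS}
    for _ in range(sweeps):
        for w in reward_family:
            for s, a, s_next in DATASET:
                # Relabel the stored transition with its reward under w,
                # then apply a Q-learning backup (learning rate 1 is safe
                # here because the dynamics are deterministic).
                target = reward(w, s_next) + GAMMA * max(
                    q[(w, s_next, b)] for b in ACTIONS)
                q[(w, s, a)] = target
    return q

def greedy_action(q, w, s):
    # Act greedily with respect to the Q-values conditioned on w.
    return max(ACTIONS, key=lambda a: q[(w, s, a)])
```

Training once on `[(1.0, 0.0), (0.0, 1.0)]` yields a policy that moves left in every state under the first specification and right in every state under the second. Both policies come from the same dataset and the same conditioned table. Switching task preference is then just a matter of querying the table with a different `w`.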