Pretraining in Actor-Critic Reinforcement Learning for Robot Locomotion
This paper proposes a pretraining-finetuning paradigm for robot locomotion. A task-agnostic exploration strategy collects the data used to train a Proprioceptive Inverse Dynamics Model (PIDM), which is then used to warm-start actor-critic algorithms such as PPO. The authors report significant improvements in sample efficiency and task performance across diverse robot embodiments.
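
The pretrain-then-warm-start idea can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the linear toy dynamics, the uniform random exploration, the one-hidden-layer network, and the weight-copying scheme in the hypothetical `warm_start` helper are all assumptions made for illustration. In particular, the sketch assumes the actor observes two stacked consecutive proprioceptive states, so that its input shape matches the inverse dynamics model, and that the critic reuses only the hidden layer.

```python
import numpy as np

STATE_DIM, ACTION_DIM, HIDDEN = 4, 2, 32
rng = np.random.default_rng(0)

# Hypothetical linear dynamics s' = A s + B a, standing in for a robot simulator.
A = np.eye(STATE_DIM) + 0.05 * rng.standard_normal((STATE_DIM, STATE_DIM))
B = rng.standard_normal((STATE_DIM, ACTION_DIM))


def collect_transitions(n):
    """Task-agnostic exploration, assumed here to be uniform random actions."""
    s = rng.standard_normal((n, STATE_DIM))
    a = rng.uniform(-1.0, 1.0, size=(n, ACTION_DIM))
    return s, a, s @ A.T + a @ B.T


def init_mlp(in_dim, out_dim):
    """One-hidden-layer tanh MLP with scaled Gaussian initialization."""
    return {
        "W1": rng.standard_normal((in_dim, HIDDEN)) / np.sqrt(in_dim),
        "b1": np.zeros(HIDDEN),
        "W2": rng.standard_normal((HIDDEN, out_dim)) / np.sqrt(HIDDEN),
        "b2": np.zeros(out_dim),
    }


def forward(p, x):
    """Return the network output and the hidden activations."""
    h = np.tanh(x @ p["W1"] + p["b1"])
    return h @ p["W2"] + p["b2"], h


def idm_loss(p, s, a, s_next):
    """Half mean squared error between predicted and executed actions."""
    pred, _ = forward(p, np.concatenate([s, s_next], axis=1))
    return 0.5 * np.mean(np.sum((pred - a) ** 2, axis=1))


def train_idm(p, s, a, s_next, lr=0.02, epochs=500):
    """Inverse dynamics pretraining: regress a from (s, s') by gradient descent."""
    x = np.concatenate([s, s_next], axis=1)
    n = len(x)
    for _ in range(epochs):
        pred, h = forward(p, x)
        err = (pred - a) / n                     # gradient of the loss w.r.t. pred
        dh = (err @ p["W2"].T) * (1.0 - h ** 2)  # backpropagate through tanh
        grads = {"W1": x.T @ dh, "b1": dh.sum(0),
                 "W2": h.T @ err, "b2": err.sum(0)}
        for k in p:
            p[k] -= lr * grads[k]
    return p


def warm_start(idm):
    """Assumed transfer scheme: the actor copies every IDM layer, the critic
    copies the hidden layer and keeps a freshly initialized scalar value head."""
    actor = {k: v.copy() for k, v in idm.items()}
    critic = init_mlp(2 * STATE_DIM, 1)
    critic["W1"], critic["b1"] = idm["W1"].copy(), idm["b1"].copy()
    return actor, critic


s, a, s_next = collect_transitions(2048)
idm = init_mlp(2 * STATE_DIM, ACTION_DIM)
initial_loss = idm_loss(idm, s, a, s_next)
idm = train_idm(idm, s, a, s_next)
final_loss = idm_loss(idm, s, a, s_next)
actor, critic = warm_start(idm)  # starting point for PPO-style finetuning
```

In this sketch the pretraining stage needs no reward signal, only transitions, which is what makes the exploration task-agnostic. The finetuning stage (not shown) would update `actor` and `critic` with an actor-critic algorithm such as PPO instead of starting from random weights.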