A-3PO: Accelerating Asynchronous LLM Training with Staleness-aware Proximal Policy Approximation
The paper introduces A-3PO, a method that accelerates asynchronous LLM training by 1.8x. Rather than computing the proximal policy in Decoupled PPO with extra forward passes, A-3PO approximates it through simple interpolation, eliminating that cost while maintaining comparable performance.
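The summary does not spell out the interpolation, but the idea can be sketched as blending behavior-policy and current-policy log-probabilities with a staleness-dependent weight, so no separate proximal-policy forward pass is needed. The function names, the weighting form, and `max_staleness` below are illustrative assumptions, not details from the paper:

```python
import math

def approx_proximal_logprob(logp_behavior: float, logp_current: float,
                            staleness: int, max_staleness: int = 8) -> float:
    """Approximate the proximal policy's log-prob by interpolation.

    Assumed scheme: the weight alpha grows with staleness, so stale
    samples pull the estimate toward the behavior policy while fresh
    samples collapse to the current policy.
    """
    alpha = min(staleness / max_staleness, 1.0)
    return alpha * logp_behavior + (1.0 - alpha) * logp_current

def ppo_ratio(logp_current: float, logp_prox: float) -> float:
    # Importance ratio against the approximated proximal policy,
    # computed without an extra forward pass through a third model.
    return math.exp(logp_current - logp_prox)

# Fresh sample (staleness 0): the proxy equals the current policy, ratio = 1.
lp_fresh = approx_proximal_logprob(-2.0, -1.5, staleness=0)
# Maximally stale sample: the proxy equals the behavior policy.
lp_stale = approx_proximal_logprob(-2.0, -1.5, staleness=8)
print(lp_fresh, lp_stale, ppo_ratio(-1.5, lp_fresh))
```

Under this sketch, the clipping ratio in the PPO objective is formed against the interpolated proxy instead of a separately evaluated proximal policy.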