Taming Preference Mode Collapse via Directional Decoupling Alignment in Diffusion Reinforcement Learning
This paper introduces Directional Decoupling Alignment (D-Align), a novel framework that mitigates Preference Mode Collapse in diffusion reinforcement learning. By applying directional corrections to reward signals, D-Align preserves generative diversity while achieving stronger alignment with human preferences.
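As a rough intuition for what a "directional correction" to a reward signal could look like, here is a minimal sketch under an assumed reading: the reward gradient is split into a component along an estimated mode-collapse direction and an orthogonal remainder, and the collapse-aligned component is down-weighted. All names here (`decouple_reward_grad`, `collapse_dir`, `alpha`) are illustrative assumptions, not the paper's actual algorithm or API.

```python
import numpy as np

def decouple_reward_grad(grad, collapse_dir, alpha=0.3):
    """Illustrative directional decoupling of a reward gradient.

    Splits `grad` into the component parallel to `collapse_dir`
    (assumed here to drive preference mode collapse) and the
    orthogonal remainder, then down-weights the parallel part
    by `alpha` so the orthogonal, diversity-preserving part of
    the signal is left intact.
    """
    d = collapse_dir / (np.linalg.norm(collapse_dir) + 1e-8)  # unit direction
    parallel = np.dot(grad, d) * d        # component along the collapse direction
    orthogonal = grad - parallel          # diversity-preserving remainder
    return alpha * parallel + orthogonal  # corrected reward signal

# Toy usage: a raw reward gradient and a hypothetical collapse direction.
g = np.array([1.0, 2.0, 0.5])
d = np.array([1.0, 0.0, 0.0])
print(decouple_reward_grad(g, d))  # collapse-aligned component scaled by alpha
```

With `alpha < 1`, the update still moves toward higher reward but is damped along the single direction that would otherwise dominate, which is one plausible way a directional correction could trade off preference alignment against diversity.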