PrismAudio: Decomposed Chain-of-Thoughts and Multi-dimensional Rewards for Video-to-Audio Generation
PrismAudio is a video-to-audio generation framework that addresses objective entanglement and human preference alignment. It combines a decomposed Chain-of-Thought reasoning structure with multi-dimensional rewards and a computationally efficient Fast-GRPO algorithm, and it reports state-of-the-art performance across semantic, temporal, aesthetic, and spatial dimensions.
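
To make the idea of multi-dimensional rewards feeding a GRPO-style update more concrete, the following is a minimal sketch, not the paper's method. It assumes the four per-dimension rewards are merged by a weighted sum and then standardized within a group of sampled outputs, which is the usual group-relative advantage in generic GRPO. The weights, the function names `combine_rewards` and `group_relative_advantages`, and the sample values are all hypothetical, and none of the Fast-GRPO efficiency changes are modeled here.

```python
from statistics import mean, pstdev

# Hypothetical equal weights over the four dimensions named in the summary.
WEIGHTS = {"semantic": 0.25, "temporal": 0.25, "aesthetic": 0.25, "spatial": 0.25}


def combine_rewards(rewards, weights=WEIGHTS):
    """Collapse per-dimension rewards for one sample into a scalar (weighted sum)."""
    return sum(weights[name] * rewards[name] for name in weights)


def group_relative_advantages(group_rewards, eps=1e-8):
    """Standardize scalar rewards within one group of samples.

    This is the generic GRPO-style advantage: subtract the group mean and
    divide by the group standard deviation (population form assumed here).
    """
    mu = mean(group_rewards)
    sigma = pstdev(group_rewards)
    return [(r - mu) / (sigma + eps) for r in group_rewards]


# Made-up rewards for four audio candidates generated from the same video.
group = [
    {"semantic": 0.9, "temporal": 0.8, "aesthetic": 0.7, "spatial": 0.6},
    {"semantic": 0.5, "temporal": 0.6, "aesthetic": 0.4, "spatial": 0.5},
    {"semantic": 0.7, "temporal": 0.7, "aesthetic": 0.6, "spatial": 0.8},
    {"semantic": 0.3, "temporal": 0.4, "aesthetic": 0.5, "spatial": 0.2},
]

scalars = [combine_rewards(r) for r in group]
advantages = group_relative_advantages(scalars)
```

Candidates that score above the group average receive positive advantages and those below receive negative ones, so no separate value network is needed to form a baseline. Keeping the per-dimension rewards separate until the final weighted sum is one simple way to inspect which dimension drives a given advantage.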