ReMoT: Reinforcement Learning with Motion Contrast Triplets
This paper introduces ReMoT, a unified training paradigm for improving the spatio-temporal consistency and reasoning capabilities of vision-language models (VLMs). It combines a rule-based framework for generating a large-scale motion-contrast dataset with Group Relative Policy Optimization (GRPO), and achieves state-of-the-art performance on both new and standard benchmarks.
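
The summary does not describe how ReMoT configures Group Relative Policy Optimization (GRPO). As a rough illustration of the general idea behind GRPO only, the sketch below scores each sampled response relative to the other responses drawn for the same prompt, by subtracting the group mean reward and dividing by the group standard deviation. The function name, the `eps` term, the use of the population standard deviation, and the reward values are assumptions made for this example and are not taken from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the group it was sampled with.

    rewards: scalar rewards for several responses to the same prompt.
    eps: small constant (an assumption here) that avoids division by
         zero when every response in the group gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations may differ
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled answers to one prompt,
# e.g. 1.0 if the answer is judged correct and 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Responses that score above their group's average receive a positive advantage and those below receive a negative one, so no separate learned value function is needed to provide a baseline.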