CARE What Fails: Contrastive Anchored-REflection for Verifiable Multimodal
CARE (Contrastive Anchored REflection) is a failure-centric post-training framework for multimodal reasoning. It extends group-relative Reinforcement Learning with Verifiable Rewards (RLVR) with two components, an anchored-contrastive objective and Reflection-Guided Resampling, which together turn erroneous rollouts into supervision signals. The framework is reported to improve both accuracy and training stability on visual-reasoning benchmarks.
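
For context, group-relative RLVR methods typically score each rollout against the other rollouts sampled for the same prompt. The sketch below shows a generic GRPO-style reward normalization under that assumption; it is not CARE's anchored-contrastive objective, and the function name `group_relative_advantages` and the `eps` parameter are illustrative choices rather than anything taken from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each rollout's reward against its own group (GRPO-style).

    rewards: verifier scores for rollouts sampled from one prompt.
    eps: small constant (assumed here) to avoid division by zero.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation within the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# One prompt, four rollouts scored by a verifier (1.0 = correct, 0.0 = wrong).
mixed = group_relative_advantages([1.0, 0.0, 0.0, 0.0])

# A group in which every rollout fails.
all_fail = group_relative_advantages([0.0, 0.0, 0.0, 0.0])
```

In the mixed group the single correct rollout receives a positive advantage and the failures receive negative ones. In the all-fail group every advantage is zero, so plain group-relative normalization extracts no learning signal from those rollouts. That may be one reason a failure-centric approach is useful.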