Through the Lens of Contrast: Self-Improving Visual Reasoning in VLMs
This paper proposes VC-STaR, a self-improving framework that leverages visual contrastive pairs to mitigate hallucinations in model-generated rationales. The framework yields the VisCoR-55K dataset, which significantly enhances the visual reasoning capabilities of Vision-Language Models.