Annotation-Free Visual Reasoning for High-Resolution Large Multimodal Models via Reinforcement Learning
This paper proposes HART, an annotation-free framework that enables Large Multimodal Models (LMMs) to autonomously identify and verify the key regions of a high-resolution image. HART uses a novel Advantage Preference Group Relative Policy Optimization (AP-GRPO) algorithm and improves reasoning performance without requiring costly human grounding labels.
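For orientation, AP-GRPO builds on Group Relative Policy Optimization, in which each sampled response is scored relative to the other responses drawn for the same prompt. The sketch below shows only that standard group-relative advantage step, not the advantage-preference component, which this summary does not specify. The function name, the `eps` constant, the binary rewards, and the use of the population standard deviation are illustrative assumptions rather than details taken from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize each reward against the other samples in its group.

    Hypothetical helper illustrating plain GRPO; AP-GRPO's additional
    preference term is not reproduced here.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std over the group
    # eps keeps the division finite when every reward in the group is equal
    return [(r - mu) / (sigma + eps) for r in rewards]


# One prompt, four sampled responses, with made-up binary correctness rewards.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Responses that beat the group average receive a positive advantage and the rest a negative one, so no learned value function or region-level annotation is needed to produce a training signal.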