From Narrow to Panoramic Vision: Attention-Guided Cold-Start Reshapes Multimodal Reasoning
This paper identifies "Lazy Attention Localization" as a key bottleneck in multimodal cold-start training, in which models fail to increase their visual attention. It proposes the Attention-Guided Visual Anchoring and Reflection (AVAR) framework to reshape attention distributions, achieving a 7.0% performance gain on multimodal reasoning benchmarks.
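
To make the idea of "visual attention" concrete, the sketch below computes the fraction of one query token's attention mass that falls on image tokens. This is only an illustrative proxy, not the paper's metric: the function name `visual_attention_share`, the boolean-mask layout, and the toy weights are all assumptions introduced here. Under this proxy, a training stage that leaves the share nearly unchanged would be one way to picture attention that stays "lazy".

```python
def visual_attention_share(attn_row, visual_mask):
    """Fraction of one query token's attention mass that lands on visual tokens.

    attn_row: non-negative attention weights over the context tokens.
    visual_mask: booleans of the same length, True where the token is an image token.
    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(attn_row) != len(visual_mask):
        raise ValueError("attn_row and visual_mask must have the same length")
    total = sum(attn_row)
    if total <= 0:
        raise ValueError("attention weights must have positive total mass")
    visual = sum(w for w, is_visual in zip(attn_row, visual_mask) if is_visual)
    return visual / total


# Toy example with made-up numbers: 4 image tokens followed by 4 text tokens.
mask = [True] * 4 + [False] * 4
before = [0.05, 0.05, 0.05, 0.05, 0.2, 0.2, 0.2, 0.2]   # hypothetical base model
after = [0.05, 0.05, 0.05, 0.06, 0.2, 0.2, 0.2, 0.19]   # hypothetical model after cold-start

share_before = visual_attention_share(before, mask)
share_after = visual_attention_share(after, mask)
print(f"visual share before: {share_before:.2f}, after: {share_after:.2f}")
```

In practice such a share would be averaged over heads, layers, and generated tokens; the single-row version here only shows the basic ratio.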