Thinking with Gaze: Sequential Eye-Tracking as Visual Reasoning Supervision for Medical VLMs
This paper improves medical vision-language models (VLMs) by using sequential eye-tracking data as supervision for dedicated gaze tokens, so that the models learn to mimic radiologists' visual search patterns. The authors report state-of-the-art performance on both in-domain and out-of-domain medical reasoning tasks.