Dr. Seg: Revisiting GRPO Training for Visual Large Language Models through Perception-Oriented Design
Dr. Seg challenges the assumption that GRPO training designed for language tasks transfers seamlessly to visual perception. It introduces a plug-and-play framework that combines a Look-to-Confirm mechanism with a Distribution-Ranked Reward module, significantly improving performance in complex visual scenarios without requiring architectural modifications.