Let's Reward Step-by-Step: Step-Aware Contrastive Alignment for Vision-Language Navigation in Continuous Environments
This paper introduces Step-Aware Contrastive Alignment (SACA), a framework for Vision-Language Navigation in Continuous Environments. SACA uses a perception-grounded auditor to extract dense, step-level supervision from imperfect trajectories. This addresses two limitations of existing training approaches, compounding errors in supervised fine-tuning and sparse rewards in reinforcement fine-tuning, and the paper reports state-of-the-art performance.