Restoring Linguistic Grounding in VLA Models via Training-Free Attention Recalibration
This paper identifies a "linguistic blindness" failure mode in Vision-Language-Action (VLA) models, in which the model ignores instructions that contradict its visual priors, and proposes IGAR, a training-free attention-recalibration method that restores language grounding and prevents the resulting erroneous actions without retraining the model.
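The summary names attention recalibration but not its mechanics. As a rough, hypothetical illustration only (not the paper's IGAR procedure), the sketch below shows one generic way to recalibrate attention at inference time: boosting the attention logits that point at instruction tokens before the softmax, so language keys are not drowned out by visual ones. The function name, the additive `boost` hyperparameter, and the token mask are all assumptions made for this example.

```python
# A minimal, hypothetical sketch of inference-time attention recalibration,
# NOT the paper's IGAR implementation: it upweights attention logits that
# point at instruction (language) tokens before the softmax.
import torch

def recalibrate_attention(logits: torch.Tensor,
                          text_token_mask: torch.Tensor,
                          boost: float = 1.0) -> torch.Tensor:
    """logits: (..., num_queries, num_keys) raw attention scores.
    text_token_mask: (num_keys,) bool, True where the key is an instruction token.
    boost: additive bonus (assumed hyperparameter) applied to text-token logits."""
    bonus = boost * text_token_mask.to(logits.dtype)   # 0 for visual keys, `boost` for text keys
    return torch.softmax(logits + bonus, dim=-1)       # renormalized attention weights

# Toy usage: 2 queries over 5 keys; the last 2 keys are instruction tokens.
scores = torch.randn(2, 5)
mask = torch.tensor([False, False, False, True, True])
attn = recalibrate_attention(scores, mask, boost=2.0)
print(attn.sum(dim=-1))  # each row still sums to 1 after recalibration
```

Because the adjustment happens only on the attention logits at inference time, a scheme like this needs no gradient updates or fine-tuning, which is consistent with the training-free claim in the summary.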