ViLAM: Distilling Vision-Language Reasoning into Attention Maps for Social Robot Navigation
ViLAM distills the vision-language reasoning of large Vision-Language Models into spatial attention maps that guide socially compliant robot navigation; real-world experiments show marked gains in navigation success rate.