CMMR-VLN: Vision-and-Language Navigation via Continual Multimodal Memory Retrieval
The paper proposes CMMR-VLN, a vision-and-language navigation framework that enhances large language model agents with structured multimodal memory retrieval and reflection-based updates to selectively leverage prior experiences, significantly improving performance in long-horizon and unfamiliar scenarios compared to existing methods.