ExpReS-VLA: Specializing Vision-Language-Action Models Through Experience Replay and Retrieval
ExpReS-VLA is a Vision-Language-Action (VLA) model specialized for rapid, memory-efficient on-device adaptation to specific robotic tasks. It combines compressed experience replay, retrieval-augmented generation, and a novel contrastive loss that prevents catastrophic forgetting, yielding significant performance gains on both spatial and long-horizon benchmarks.
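To make the two memory-related ingredients concrete, below is a minimal, illustrative sketch of a compressed experience buffer with similarity-based retrieval. The class name, storage scheme (zlib-compressed pickled experiences), and cosine-similarity lookup are all assumptions for illustration, not the paper's actual implementation:

```python
import pickle
import zlib

import numpy as np


class CompressedReplayBuffer:
    """Illustrative buffer: stores experiences compressed, retrieves the
    most similar past experiences for a query embedding (hypothetical design)."""

    def __init__(self, capacity: int = 1000):
        self.capacity = capacity
        self.embeddings: list[np.ndarray] = []  # one vector per experience
        self.payloads: list[bytes] = []         # zlib-compressed pickles

    def add(self, embedding, experience) -> None:
        # Evict the oldest entry once capacity is reached (FIFO policy).
        if len(self.payloads) >= self.capacity:
            self.embeddings.pop(0)
            self.payloads.pop(0)
        self.embeddings.append(np.asarray(embedding, dtype=np.float32))
        self.payloads.append(zlib.compress(pickle.dumps(experience)))

    def retrieve(self, query_embedding, k: int = 3):
        # Rank stored experiences by cosine similarity to the query.
        if not self.payloads:
            return []
        q = np.asarray(query_embedding, dtype=np.float32)
        E = np.stack(self.embeddings)
        sims = E @ q / (np.linalg.norm(E, axis=1) * np.linalg.norm(q) + 1e-8)
        top = np.argsort(-sims)[:k]
        return [pickle.loads(zlib.decompress(self.payloads[i])) for i in top]
```

In a replay-based adaptation loop, retrieved experiences would be mixed into each fine-tuning batch so the model continues to see earlier task data, which is the usual mechanism for mitigating catastrophic forgetting.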