GameVerse: Can Vision-Language Models Learn from Video-based Reflection?
The paper introduces GameVerse, a comprehensive benchmark featuring a novel reflect-and-retry paradigm and a hierarchical taxonomy across 15 games, demonstrating that Vision-Language Models can effectively improve their gameplay policies through video-based reflection by combining failure trajectories with expert tutorials.