MoMaStage: Skill-State Graph Guided Planning and Closed-Loop Execution for Long-Horizon Indoor Mobile Manipulation
MoMaStage is a structured vision-language framework for robust long-horizon indoor mobile manipulation. It guides task planning with a topology-aware Skill-State Graph and ensures execution reliability through a closed-loop mechanism that triggers semantic replanning when physical deviations are detected, all without requiring explicit scene mapping.
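To make the planning-and-replanning loop concrete, here is a minimal, hypothetical sketch of the idea: skills with symbolic preconditions and effects form a graph over states, a planner searches that graph, and an execution loop replans from the observed state whenever a skill's expected effects fail to materialize. All names (`Skill`, `plan`, the example skills, the failure model) are illustrative assumptions, not the framework's actual API.

```python
# Illustrative sketch of closed-loop execution over a skill-state graph.
# All identifiers here are hypothetical, not MoMaStage's real interface.
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    name: str
    pre: frozenset     # symbolic facts required before execution
    add: frozenset     # facts expected to hold after success
    delete: frozenset  # facts removed on success


SKILLS = [
    Skill("navigate_to_cup", frozenset(), frozenset({"at_cup"}), frozenset({"at_table"})),
    Skill("grasp_cup", frozenset({"at_cup"}), frozenset({"holding_cup"}), frozenset()),
    Skill("navigate_to_table", frozenset({"holding_cup"}),
          frozenset({"at_table"}), frozenset({"at_cup"})),
    Skill("place_cup", frozenset({"at_table", "holding_cup"}),
          frozenset({"cup_on_table"}), frozenset({"holding_cup"})),
]


def plan(state, goal, skills):
    """BFS over the skill-state graph to a state that covers the goal facts."""
    queue, seen = deque([(state, [])]), {state}
    while queue:
        s, path = queue.popleft()
        if goal <= s:
            return path
        for sk in skills:
            if sk.pre <= s:
                ns = (s - sk.delete) | sk.add
                if ns not in seen:
                    seen.add(ns)
                    queue.append((ns, path + [sk]))
    return None


def execute_closed_loop(state, goal, skills, world):
    """Execute the plan, replanning from the observed state on deviation."""
    todo, trace = plan(state, goal, skills), []
    while not goal <= state:
        sk = todo.pop(0)
        state = world(sk, state)       # observed post-execution state
        trace.append(sk.name)
        if not sk.add <= state:        # expected effects missing: deviation
            todo = plan(state, goal, skills)  # semantic replanning
    return trace


# Simulated world: the first grasp attempt fails (the cup slips).
failures = {"grasp_cup": 1}


def world(sk, s):
    if failures.get(sk.name, 0) > 0:
        failures[sk.name] -= 1
        return s                       # state unchanged: physical deviation
    return (s - sk.delete) | sk.add


trace = execute_closed_loop(frozenset(), frozenset({"cup_on_table"}), SKILLS, world)
print(trace)
# The failed grasp is detected and retried via replanning.
```

In this toy run the executor notices that `holding_cup` never appeared after the first grasp, replans from the observed state, and retries, yielding the trace `navigate_to_cup, grasp_cup, grasp_cup, navigate_to_table, place_cup`. The paper's mechanism additionally grounds deviation detection in vision-language perception rather than symbolic facts.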