Compose by Focus: Scene Graph-based Atomic Skills
This paper introduces a scene graph-based framework for improving the compositional generalization of generalist robots. Atomic skills are learned on focused scene graph representations using graph neural networks and diffusion models, and a vision-language model planner then orchestrates these skills, which the paper reports yields stronger performance on complex, long-horizon tasks.
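
To make the idea of "focus" concrete, the following is a minimal sketch, not the paper's implementation. It assumes a scene graph is a set of named object nodes plus (subject, relation, object) edges, and that the planner emits a list of skill steps, each naming the objects it involves. The names `SceneGraph`, `focus`, and the hard-coded `plan` are hypothetical stand-ins; the graph neural network encoder, the diffusion policy, and the vision-language model planner are not modeled here.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, str]  # (subject, relation, object)


@dataclass
class SceneGraph:
    """Hypothetical scene graph: object nodes with attributes, plus relations."""
    nodes: Dict[str, dict]
    edges: Set[Edge]


def focus(graph: SceneGraph, relevant: List[str]) -> SceneGraph:
    """Return the subgraph induced by the task-relevant objects.

    Objects outside `relevant` (distractors, background) are dropped, along
    with every edge that touches them.
    """
    keep = {name for name in relevant if name in graph.nodes}
    nodes = {name: graph.nodes[name] for name in keep}
    edges = {(s, r, o) for (s, r, o) in graph.edges if s in keep and o in keep}
    return SceneGraph(nodes, edges)


# A toy scene with one distractor object ("book").
scene = SceneGraph(
    nodes={
        "gripper": {"type": "robot"},
        "cup": {"type": "object"},
        "plate": {"type": "object"},
        "book": {"type": "object"},
        "table": {"type": "surface"},
    },
    edges={
        ("gripper", "near", "cup"),
        ("cup", "on", "table"),
        ("plate", "on", "table"),
        ("book", "on", "table"),
    },
)

# Assumption: stands in for the planner's output, written by hand here.
plan = [
    ("pick", ["gripper", "cup"]),
    ("place", ["gripper", "cup", "plate"]),
]

# Each atomic skill would receive only its own focused subgraph as input.
focused = [(skill, focus(scene, objects)) for skill, objects in plan]

for skill, sub in focused:
    print(skill, sorted(sub.nodes), sorted(sub.edges))
```

In this sketch, each skill sees only the objects its step names, so the distractor never enters a skill's input. That is one plausible reading of why focused inputs could help skills compose in unseen scenes.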