Act-Observe-Rewrite: Multimodal Coding Agents as In-Context Policy Learners for Robot Manipulation
This paper introduces Act-Observe-Rewrite (AOR), a framework in which a multimodal language model iteratively improves a robot manipulation policy by synthesizing and rewriting executable Python controller code. Each rewrite is driven by visual feedback and failure analysis from the preceding attempts, and the approach achieves high success rates across tasks without demonstrations, reward engineering, or gradient updates.
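
The loop implied by the name can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation. The names `EpisodeResult`, `act_observe_rewrite`, `toy_run_episode`, and `toy_rewrite` are hypothetical. The environment is a toy 1-D reaching task rather than a robot simulator, and the rewrite step is a hard-coded stub standing in for a call to a multimodal model that would inspect images and failure logs.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class EpisodeResult:
    """Outcome of one rollout (hypothetical container, not from the paper)."""
    success: bool
    observations: List[str]  # stand-in for key-frame images and execution logs


def act_observe_rewrite(
    initial_code: str,
    run_episode: Callable[[str], EpisodeResult],
    rewrite: Callable[[str, EpisodeResult], str],
    max_iterations: int = 5,
) -> Tuple[str, List[bool]]:
    """Alternate between running controller code and rewriting it on failure."""
    code = initial_code
    history: List[bool] = []
    for _ in range(max_iterations):
        result = run_episode(code)        # Act: execute the current controller
        history.append(result.success)    # Observe: record the outcome
        if result.success:
            break
        code = rewrite(code, result)      # Rewrite: produce new controller source
    return code, history


def toy_run_episode(code: str) -> EpisodeResult:
    """Assumed toy task: move a 1-D position from 0.0 to a target of 1.0."""
    namespace: dict = {}
    exec(code, namespace)                 # load the controller source as Python
    policy = namespace["policy"]
    position, target = 0.0, 1.0
    log: List[str] = []
    for step in range(20):
        position += policy(position, target)
        log.append(f"step={step} position={position:.3f}")
    return EpisodeResult(abs(position - target) < 0.01, log)


def toy_rewrite(code: str, result: EpisodeResult) -> str:
    """Stub for the model call: raise a gain that was too small to converge."""
    return code.replace("0.05", "0.5")


INITIAL_CODE = (
    "def policy(position, target):\n"
    "    return 0.05 * (target - position)\n"
)

final_code, history = act_observe_rewrite(
    INITIAL_CODE, toy_run_episode, toy_rewrite
)
print(history)
print(final_code)
```

The policy lives entirely in the source string passed between iterations, so improvement happens by editing code in context rather than by updating model weights. In a real system the `rewrite` callable would be where the multimodal model receives the observations and the previous controller.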