Act-Observe-Rewrite: Multimodal Coding Agents as In-Context Policy Learners for Robot Manipulation

This paper introduces Act-Observe-Rewrite (AOR), a framework enabling multimodal language models to iteratively improve robot manipulation policies by synthesizing and rewriting executable Python controller code based on visual feedback and failure analysis, achieving high success rates across tasks without demonstrations, reward engineering, or gradient updates.

Vaishak Kumar

Published 2026-03-06

Imagine you are teaching a robot to build a tower of blocks. In the old days, you had two main options:

  1. The "Drill Sergeant" Method: You show the robot a video of a human doing it perfectly 10,000 times. The robot memorizes the movements by rote. If you change the lighting or the table, the robot gets confused and fails.
  2. The "Trial and Error" Method: You let the robot try, fail, and try again, but you have to manually tweak its "brain" (its neural network) after every mistake. This takes a long time and requires a supercomputer.

This paper introduces a third way: The "Self-Taught Programmer."

The authors call this Act–Observe–Rewrite (AOR). Here is how it works, explained with a simple story and some analogies.

The Story: The Robot and the "Ghost Writer"

Imagine a robot arm (the Actor) trying to pick up a red cube and put it on a table.

  1. Act: The robot tries to do the task. It moves its arm, grabs the cube, and places it.
  2. Observe: The robot fails. Maybe it missed the cube, or it dropped it. But instead of just saying "I failed," a special AI assistant (a Multimodal LLM) watches the video of the failure. It looks at the robot's code (the instructions telling the robot how to move) and the video of the mistake side-by-side.
  3. Rewrite: The AI assistant doesn't just say, "Try harder." It acts like a Ghost Writer for the robot's brain. It opens the robot's source code, finds the exact line causing the error, and rewrites it.

Then the robot runs this new code and tries again. No human touched the code. No massive training data was used. The robot literally wrote a better version of itself after every failure.
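The three-step story above is, at its core, a simple loop. Here is a minimal sketch of that loop in Python. The paper doesn't publish this exact interface, so `run_episode` (execute the controller on the robot, return success plus a failure video) and `rewrite` (query the multimodal LLM with the code and the video) are stand-in placeholders you'd wire up yourself:

```python
from typing import Callable, Tuple

def act_observe_rewrite(
    initial_code: str,
    run_episode: Callable[[str], Tuple[bool, object]],
    rewrite: Callable[[str, object], str],
    max_attempts: int = 10,
) -> str:
    """Run a controller; on failure, let an editor rewrite it and retry.

    run_episode and rewrite are injected stand-ins for the robot
    executor and the multimodal LLM described in the paper.
    """
    code = initial_code
    for _ in range(max_attempts):
        success, video = run_episode(code)  # Act: execute the current controller
        if success:
            return code                     # Observe: it worked, stop editing
        code = rewrite(code, video)         # Rewrite: patch the code from the video
    return code                             # best effort once the budget runs out
```

Note that nothing in the loop is a gradient update or a reward signal; the only thing that changes between attempts is the source code itself.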

The Magic Analogy: The "Code vs. The Dial"

To understand why this is special, imagine a radio.

  • Old Methods (Parameter Tuning): Imagine the robot is a radio with a dial. If the signal is bad, you just turn the dial slightly left or right. You are guessing. If the radio is broken because the antenna is in the wrong place, turning the dial won't help.
  • This Paper's Method (Code Rewriting): Imagine the robot is a radio, but the AI assistant is an engineer. If the signal is bad, the engineer doesn't just turn a dial. They open the radio, look at the circuit board, and realize, "Ah, the antenna is wired backward!" They solder a new wire (rewrite the code) to fix the root cause.

Why is this a big deal?
Most robot learning methods only "turn the dial." They tweak numbers. But this paper shows that if you let the AI rewrite the actual instructions, it can fix deep, structural problems that numbers can't solve.

The Three Challenges They Solved

The researchers tested this on three tasks, and the "Ghost Writer" solved them in very human-like ways:

  1. The "Wrong Map" Problem (Lift Task):

    • The Mistake: The robot kept hovering 8 inches above the cube. It thought the cube was floating in the air.
    • The Old Way: You'd have to guess, "Maybe the camera is too sensitive?"
    • The AOR Way: The AI looked at the code and the video. It said, "The camera uses a different coordinate system (like a map where North is Down). The code is reading the map upside down." It rewrote the math to flip the image. Success.
  2. The "Colorblind" Problem (Pick & Place Can):

    • The Mistake: The robot was looking for a "silver" soda can, but in the camera's view, the can looked red. The robot couldn't find it.
    • The AOR Way: The AI saw the robot staring at empty space. It looked at the code: search for silver. It realized, "Wait, the lighting makes it look red!" It changed the code to search for red. Success.
  3. The "Clumsy Hand" Problem (Stack Task):

    • The Mistake: The robot could pick up the first cube, but when it tried to stack a second one, its fingers kept bumping the bottom cube and knocking it over.
    • The AOR Way: The AI saw the bump in the video. It rewrote the approach path to be more careful. It got the robot to 91% success.
    • The Limit: The robot eventually got stuck. It knew why it was failing (it was bumping the block), but it couldn't figure out how to move its fingers differently to avoid it. It hit a wall. This shows the system isn't perfect yet, but it's honest about its limits.
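To make the "Wrong Map" fix concrete, here is a hypothetical before/after of the kind of one-line structural rewrite described above. The function names, the camera model, and the intrinsics dictionary are illustrative assumptions, not the paper's actual controller code; the point is that the fix is a sign flip in the math, not a tuned parameter:

```python
def cube_position_buggy(pixel_uv, depth, intrinsics):
    # Buggy version: treats the image's y axis as pointing up,
    # so the estimated cube height is flipped and the arm hovers
    # above where the cube really is.
    x = (pixel_uv[0] - intrinsics["cx"]) * depth / intrinsics["fx"]
    y = (pixel_uv[1] - intrinsics["cy"]) * depth / intrinsics["fy"]
    return (x, y, depth)

def cube_position_fixed(pixel_uv, depth, intrinsics):
    # Rewritten version: image coordinates grow downward, so the
    # y term must be negated. This is the kind of root-cause fix
    # that "turning the dial" on a parameter could never express.
    x = (pixel_uv[0] - intrinsics["cx"]) * depth / intrinsics["fx"]
    y = -(pixel_uv[1] - intrinsics["cy"]) * depth / intrinsics["fy"]
    return (x, y, depth)
```

The "Colorblind" fix has the same flavor: a rewrite might change a literal like `target_color = "silver"` to `target_color = "red"` after seeing the lighting in the video, something no amount of numeric tuning would discover.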

The "No-Go" Zone (What Makes This Unique)

Usually, to get a robot to learn, you need:

  • Thousands of videos of humans doing the task.
  • Complex math to reward the robot for good moves.
  • Supercomputers to train the brain.

AOR needs none of that.

  • No Videos: It learns from its own mistakes.
  • No Rewards: It doesn't need a score; it just needs to see the code didn't work.
  • No Training: It doesn't "learn" in the background; it literally rewrites its own software between attempts.

The Bottom Line

This paper proposes a new way to build robots: Don't train the robot to be smart; give it a smart editor that rewrites its own manual.

It's like giving a robot a "Ctrl+Z" (Undo) button, but instead of just undoing the move, it rewrites the instructions so the mistake never happens again. It turns robot learning from a "black box" (where we don't know why it works) into a transparent process where we can read the code and say, "Ah, that's why it failed, and here is the fix."

While it's not perfect yet (it sometimes gets stuck on very tricky physical problems), it proves that robots can learn to fix their own code just by watching themselves fail and thinking about it.