Learning from Trials and Errors: Reflective Test-Time Planning for Embodied LLMs

This paper introduces Reflective Test-Time Planning, a framework that enhances embodied LLMs by integrating reflection-in-action, reflection-on-action, and retrospective reflection to transform repetitive trial-and-error into cumulative experience, thereby significantly improving long-horizon task performance and behavioral correction.

Yining Hong, Huang Huang, Manling Li, Li Fei-Fei, Jiajun Wu, Yejin Choi

Published 2026-02-25

Imagine you are teaching a robot to clean a messy house. In the past, if you told the robot, "Put the toy car in the green box," and it tried to shove a giant teddy bear in there first, the robot would get stuck. It would say, "Oh no, the bear is in the way!" and then repeat the exact same mistake again and again, forever. It had no "memory" of what went wrong, only a rigid set of instructions.

This paper introduces a new way to teach robots called Reflective Test-Time Planning. Think of it as giving the robot the ability to pause, think, learn from mistakes, and change its strategy on the fly while it's working.

Here is how it works, broken down into three simple concepts:

1. The "Mental Sandbox" (Reflection-in-Action)

Before the robot actually moves its arm, it doesn't just pick the first idea that pops into its head. Instead, it runs a mental simulation.

  • The Analogy: Imagine you are packing for a trip. Instead of just shoving your biggest suitcase into the car trunk, you pause and think: "If I put the big suitcase here, will I be able to fit the golf clubs later? Maybe I should put the golf clubs in first."
  • How the Robot Does It: The robot generates several different ideas (e.g., "Put the car in the green box," "Put the car in the orange box," "Put the car on the shelf"). It uses a "judge" inside its brain to score each idea. It asks, "If I do this, will it work?" It picks the highest-scoring idea before it ever touches anything. This prevents it from making obvious mistakes right out of the gate.
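The "generate several ideas, score them, pick the best" step can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: `select_action`, `toy_judge`, and the scores in `TOY_SCORES` are hypothetical stand-ins, and a real system would presumably use a learned model as the judge rather than a lookup table.

```python
from typing import Callable, List, Tuple


def select_action(
    candidates: List[str], judge: Callable[[str], float]
) -> Tuple[str, float]:
    """Score every candidate with the judge and return the best (action, score)."""
    if not candidates:
        raise ValueError("need at least one candidate action")
    scored = [(action, judge(action)) for action in candidates]
    return max(scored, key=lambda pair: pair[1])


# Hypothetical scores: a fixed lookup table standing in for a learned judge.
TOY_SCORES = {
    "put the car in the green box": 0.9,
    "put the car in the orange box": 0.4,
    "put the car on the shelf": 0.2,
}


def toy_judge(action: str) -> float:
    """Return the assumed success score for an action (0.0 if unknown)."""
    return TOY_SCORES.get(action, 0.0)


best_action, best_score = select_action(list(TOY_SCORES), toy_judge)
print(best_action, best_score)  # put the car in the green box 0.9
```

The key design point is that nothing is executed until every candidate has been scored, so an obviously bad idea is filtered out before the robot moves.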

2. The "Post-Game Review" (Reflection-on-Action)

Once the robot tries an action, it doesn't just move on. It stops and asks, "How did that actually go?"

  • The Analogy: Think of a coach watching a soccer player miss a penalty kick. The coach doesn't just say, "Okay, next time." The coach says, "You kicked too hard, and you aimed at the wrong corner. Next time, aim for the bottom left."
  • How the Robot Does It: After the robot tries to put an object in a box, it gets a "score" and a verbal explanation of why it succeeded or failed. It stores this lesson. If it tried to put a toy car in a box that was too small, it learns: "Oh, that box is too small. I won't try that again."
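One simple way to picture "store the lesson and don't try that again" is a small memory of past attempts. The sketch below is an assumption for illustration only: the `Reflection` and `ReflectionMemory` classes, the 0.5 failure threshold, and the example lesson text are all hypothetical, and the paper may use the stored feedback quite differently.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Reflection:
    action: str   # what the robot tried
    score: float  # how well it went (assumed range: 0.0 to 1.0)
    lesson: str   # verbal explanation of the outcome


@dataclass
class ReflectionMemory:
    entries: List[Reflection] = field(default_factory=list)

    def record(self, action: str, score: float, lesson: str) -> None:
        """Store the outcome of one executed action."""
        self.entries.append(Reflection(action, score, lesson))

    def failed_actions(self, threshold: float = 0.5) -> Set[str]:
        """Actions whose recorded score fell below the (assumed) threshold."""
        return {e.action for e in self.entries if e.score < threshold}

    def filter_candidates(
        self, candidates: List[str], threshold: float = 0.5
    ) -> List[str]:
        """Drop candidates that already failed, keeping the original order."""
        blocked = self.failed_actions(threshold)
        return [c for c in candidates if c not in blocked]


memory = ReflectionMemory()
memory.record(
    "put the car in the small box", 0.0, "The box is too small for the car."
)
remaining = memory.filter_candidates(
    ["put the car in the small box", "put the car in the green box"]
)
print(remaining)  # ['put the car in the green box']
```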

3. The "Hindsight Lookback" (Retrospective Reflection)

This is the magic part. Sometimes, a mistake doesn't show up immediately. You might do something that seems fine at first, but it causes a disaster three steps later.

  • The Analogy: Imagine you are playing a video game. You pick up a shiny sword early on because it looks cool. Three levels later, you realize that sword is so heavy you can't jump over a wall, and you're stuck. A normal player would just keep trying to jump and fail. A reflective player looks back and says, "Wait, if I hadn't picked up that heavy sword, I would have made it. I need to change my strategy."
  • How the Robot Does It: The robot periodically looks back at its recent history. It asks, "Looking at where I am now, was that decision I made a few steps ago actually a good idea?" If the answer is "No," it re-evaluates that old decision and updates itself to avoid that specific mistake in the future.
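The lookback can be sketched as re-scoring earlier steps in light of what happened afterwards. Everything here is a hypothetical illustration built on the video-game analogy: `Step`, `retrospect`, `toy_hindsight_judge`, and the window size of 3 are assumptions, and the rule "blame the sword if a later step failed" is a hand-written stand-in for whatever hindsight evaluation the paper actually uses.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Step:
    action: str
    score: float  # score assigned right after the action was taken


def retrospect(
    history: List[Step],
    hindsight_judge: Callable[[Step, Sequence[Step]], float],
    window: int = 3,
) -> List[int]:
    """Re-score the last `window` steps using the steps that followed each one.

    Updates scores in place and returns the indices whose score changed.
    """
    revised = []
    start = max(0, len(history) - window)
    for i in range(start, len(history)):
        new_score = hindsight_judge(history[i], history[i + 1:])
        if new_score != history[i].score:
            history[i].score = new_score
            revised.append(i)
    return revised


def toy_hindsight_judge(step: Step, later: Sequence[Step]) -> float:
    """Hypothetical rule: blame the heavy sword if any later step failed."""
    later_failure = any(s.score == 0.0 for s in later)
    if step.action == "pick up the heavy sword" and later_failure:
        return 0.0
    return step.score


history = [
    Step("pick up the heavy sword", 1.0),  # looked fine at the time
    Step("walk to the wall", 1.0),
    Step("jump over the wall", 0.0),       # the delayed failure
]
changed = retrospect(history, toy_hindsight_judge)
print(changed, history[0].score)  # [0] 0.0
```

The first step scored well when it was taken; only the lookback, which sees the later failure, downgrades it.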

The Big Result: Learning While Doing

Most robots are like frozen statues: they are trained once and then simply act out what they learned. When they fail, they fail the same way every time.

This new method turns the robot into a fluid learner. It is like a student taking a test who is allowed to:

  1. Think of three answers before writing one down.
  2. Check their work immediately after writing.
  3. Realize, halfway through the test, that they misunderstood the first question, and adjust their strategy for the rest of the exam.
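The contrast between a "frozen" robot and one that learns while doing can be shown with a toy loop. This is a deliberately simplified sketch: `run_episode`, the `reflective` flag, and the toy judge and environment are hypothetical, and the loop only models avoiding actions that already failed, not the hindsight lookback or any update to the underlying model.

```python
from typing import Callable, List


def run_episode(
    candidates: List[str],
    judge: Callable[[str], float],
    execute: Callable[[str], bool],
    reflective: bool,
    max_steps: int = 3,
) -> List[str]:
    """Try actions until one succeeds; return the sequence of actions tried."""
    failed = set()
    trace = []
    for _ in range(max_steps):
        # A reflective agent skips actions it has already seen fail.
        options = [a for a in candidates if not (reflective and a in failed)]
        if not options:
            break
        action = max(options, key=judge)
        trace.append(action)
        if execute(action):
            break
        failed.add(action)
    return trace


CANDIDATES = ["put the car in the small box", "put the car in the green box"]


def overconfident_judge(action: str) -> float:
    """Toy judge that wrongly prefers the small box."""
    return 0.9 if "small" in action else 0.6


def toy_execute(action: str) -> bool:
    """Toy environment: only the green box is big enough."""
    return action.endswith("green box")


frozen = run_episode(CANDIDATES, overconfident_judge, toy_execute, reflective=False)
learner = run_episode(CANDIDATES, overconfident_judge, toy_execute, reflective=True)
print(frozen)   # the small box, three times in a row
print(learner)  # the small box once, then the green box
```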

In short: This paper teaches robots to stop repeating their mistakes. By giving them the ability to simulate, critique, and look back at their own actions while they are working, they can solve complex, messy real-world problems much better than before. They don't just "do"; they "learn how to do" as they go.
