Localizing and Correcting Errors for LLM-based Planners

This paper proposes Localized In-Context Learning (L-ICL), a technique that iteratively augments instructions with targeted corrections for specific failing steps, significantly improving the ability of large language models to generate valid plans in symbolic classical planning tasks compared to traditional methods.

Aditya Kumar, William W. Cohen

Published 2026-03-09

Imagine you hire a brilliant but slightly clumsy architect to design a path through a maze. This architect (the Large Language Model, or LLM) is incredibly smart; they can write poetry, solve complex math problems, and understand human language better than anyone else. However, when it comes to following strict physical rules—like "you can't walk through walls" or "you can't push a box into a corner where it gets stuck"—they often make silly mistakes. They might suggest walking straight through a brick wall or pushing a box into a dead end, even though you told them not to.

This paper introduces a new way to teach this architect how to follow the rules, called L-ICL (Localized In-Context Learning).

The Problem: The "Whole Story" vs. The "Specific Mistake"

Traditionally, if you wanted to teach an AI how to navigate a maze, you would show it a few examples of perfect, complete solutions. You'd say, "Look, here is a person who started at the door and walked all the way to the treasure without hitting any walls."

The problem is that the AI gets overwhelmed. It sees the whole story but misses the tiny, specific rules that made the story work. It's like trying to learn how to drive a car by watching a 2-hour movie of a perfect road trip. You see the destination, but you don't learn exactly why the driver didn't hit that specific pothole or why they stopped at that specific red light. The AI tries to guess the rules, and it often guesses wrong, walking through walls or getting stuck.

The Solution: The "Spot-Check" Method (L-ICL)

The authors realized that instead of showing the AI the whole movie, they should just show it the exact moment it made a mistake and correct it immediately.

Think of it like a video game coach or a software debugger:

  1. The Attempt: The AI tries to plan a path.
  2. The Glitch: It suggests a move that breaks a rule (e.g., "Move East" into a wall).
  3. The Localized Fix: Instead of restarting the whole game, the coach pauses right there and says, "Hey, look at this specific spot. If you are at (3,4) and there is a wall to the East, you cannot move East. You can only move North or South."
  4. The Lesson: The AI adds this tiny, specific rule to its memory and tries again.

This is L-ICL. It doesn't show the AI the whole journey; it shows the AI one specific "doctest" (a tiny example of input and correct output) for the exact step where it failed.
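As a rough sketch, the loop above might look like the following Python. Everything here — the grid size, the wall location, and the `toy_planner` stand-in for the LLM call — is invented for illustration; the paper's actual prompts and planning domains are more elaborate.

```python
# A toy sketch of the L-ICL loop on a 3x2 grid. The grid, the wall,
# and `toy_planner` are made-up stand-ins for the LLM-driven setup.

WALLS = {(1, 0)}                       # one wall between start and goal
WIDTH, HEIGHT = 3, 2
DELTAS = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}

def step(pos, move):
    dx, dy = DELTAS[move]
    return (pos[0] + dx, pos[1] + dy)

def legal(pos, move):
    """One symbolic rule: stay in bounds and out of walls."""
    x, y = step(pos, move)
    return 0 <= x < WIDTH and 0 <= y < HEIGHT and (x, y) not in WALLS

def first_violation(start, plan):
    """Simulate the plan; return (index, state, move) of the first illegal step."""
    pos = start
    for i, move in enumerate(plan):
        if not legal(pos, move):
            return i, pos, move
        pos = step(pos, move)
    return None

def l_icl(start, propose_plan, max_rounds=5):
    """Retry planning, feeding back one localized correction per failure."""
    corrections = []
    for _ in range(max_rounds):
        plan = propose_plan(corrections)      # the "LLM call"
        bad = first_violation(start, plan)
        if bad is None:
            return plan, corrections
        i, pos, move = bad
        # The localized "doctest": the exact failing state, move, and reason.
        corrections.append(f"At {pos}, move {move} is illegal: wall ahead.")
    return None, corrections

def toy_planner(corrections):
    """Stand-in for the LLM: ignores the wall until corrected."""
    if corrections:
        return ["N", "E", "E", "S"]           # detour over the wall
    return ["E", "E"]                          # walks straight into it
```

Calling `l_icl((0, 0), toy_planner)` returns the legal detour along with the single correction that fixed it — the feedback names the exact failing state and move, never the whole trajectory.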

Why This is a Game-Changer

The paper compares this "Spot-Check" method to the old "Whole Story" method (called RAG-ICL) and found some amazing results:

  • Efficiency is King: The "Whole Story" method needed to show the AI 20,000 characters of text (many pages of examples) to get a 9% success rate. The "Spot-Check" method (L-ICL) achieved an 89% success rate using only 2,000 characters of text.

    • Analogy: It's like teaching someone to bake a cake. The old way is to hand them a 500-page cookbook. The new way is to hand them a single sticky note that says, "If the oven is at 350, don't leave the cake in longer than 10 minutes, or it burns." The sticky note is tiny, but it fixes the biggest problem.
  • It Works Everywhere: They tested this on mazes, box-pushing puzzles (Sokoban), and block-stacking worlds. In almost every case, the AI got much better at following the rules.

  • It Doesn't Need a Map: Surprisingly, the AI didn't even need to see a picture of the maze (the ASCII grid) to learn. It learned the rules just by seeing the "Spot-Check" examples. It learned the logic of the walls, not just the look of the walls.

The "Unit Test" Analogy

The authors use a great analogy from software engineering.

  • Traditional Learning (Showing full paths) is like running an End-to-End Test. You run the whole program to see if it works. If it fails, you don't know which line of code broke.
  • L-ICL is like a Unit Test. You test one tiny function at a time. "Does this specific button work?" "Yes." "Does this specific wall block movement?" "Yes." By hardening these tiny individual steps, the whole system becomes reliable.
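In Python, this "unit test" flavor can literally be written as a doctest, matching the article's own terminology. The function and coordinates below simply reuse the wall example from earlier — they are illustrative, not the paper's actual test suite.

```python
import doctest

def can_move_east(x, y, walls):
    """Return True if moving East from (x, y) is legal.

    One tiny rule, hardened by one tiny test:

    >>> can_move_east(3, 4, walls={(4, 4)})   # wall to the East
    False
    >>> can_move_east(3, 4, walls=set())      # open corridor
    True
    """
    return (x + 1, y) not in walls

if __name__ == "__main__":
    doctest.testmod()   # runs the embedded examples as unit tests
```

Each such check exercises exactly one transition rule, so when it fails, you know precisely which rule the model got wrong.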

The Limitation: Knowing the Rules vs. Being Smart

There is one catch. L-ICL makes the AI very good at following the rules (not walking through walls). But it doesn't necessarily make the AI a genius strategist.

  • Analogy: L-ICL teaches the AI how to drive without hitting other cars. But it doesn't necessarily teach the AI how to win a race or find the fastest route through traffic.
  • In complex puzzles (like Sokoban), the AI learned not to push boxes into corners (a rule), but it still sometimes struggled to plan the best sequence of moves to win the game. It became a rule-follower, but not always a master planner.

The Bottom Line

This paper solves a major headache for AI researchers: Why do smart AIs keep breaking simple rules?

The answer is that they were being taught with too much noise (whole stories) instead of clear, targeted feedback. By switching to L-ICL—which acts like a strict teacher correcting a student's specific math error on the spot rather than re-teaching the whole chapter—we can make AI planners significantly more reliable, using far less data and time.

It turns a clumsy, rule-breaking AI into a disciplined one, one tiny correction at a time.
