Model Space Reasoning as Search in Feedback Space for Planning Domain Generation

This paper proposes an agentic language-model framework that generates high-quality planning domains from natural-language descriptions. It treats model-space reasoning as a search process in feedback space, using symbolic feedback such as landmarks and the outputs of the VAL plan validator to iteratively improve domain quality.

James Oswald, Daniel Oblinsky, Volodymyr Varha, Vasilije Dragovic, Harsha Kokel, Kavitha Srinivas, Michael Katz, Shirin Sohrabi

Published 2026-04-13

Imagine you are trying to teach a very smart, but slightly confused, robot how to play a new board game. You describe the rules to the robot in plain English: "You can move your piece forward if the square is empty," or "If you land on a red square, you lose."

The robot (an AI) tries to write down the official rulebook (called a PDDL domain, short for Planning Domain Definition Language) based on your description. But because the robot is still learning, it often makes mistakes. Maybe it forgets a rule, or it invents a rule that doesn't make sense, like "You can jump over walls." If you just let the robot write the rulebook once and stop, the game will be broken.

This paper is about a new way to help the robot fix its rulebook until it's perfect. The authors call their method "Model Space Reasoning as Search in Feedback Space." That's a mouthful, so let's break it down with some analogies.

The Problem: The Robot's First Draft is Messy

When Large Language Models (LLMs) try to turn your English description into a strict computer rulebook, they often get the syntax right (every rule is written in valid form) but the semantics wrong (the logic is broken). It's like writing a grammatically flawless sentence that makes no logical sense.
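To see the syntax/semantics gap concretely, here is a toy "rulebook" sketched as Python data (a stand-in for a real PDDL domain; the action and fact names are made up for this illustration). The second action parses fine but is logically dead, because nothing can ever make its precondition true:

```python
# Hypothetical toy rulebook: each action lists preconditions that must hold
# before it runs, and facts it adds to the state afterward.
RULEBOOK = {
    "move-forward": {
        "preconditions": {"square-ahead-empty"},
        "adds": {"moved"},
    },
    # Syntactically valid entry, but semantically broken: no action in the
    # rulebook ever makes "piece-can-fly" true, so this action can never fire.
    "jump-over-wall": {
        "preconditions": {"piece-can-fly"},
        "adds": {"wall-crossed"},
    },
}

def applicable(action: str, state: set) -> bool:
    """An action is legal only if all its preconditions hold in the state."""
    return RULEBOOK[action]["preconditions"] <= state

state = {"square-ahead-empty"}
print(applicable("move-forward", state))    # the precondition holds
print(applicable("jump-over-wall", state))  # unreachable precondition
```

A syntax checker would accept both actions; only a semantic check (like the feedback signals below) catches the broken one.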

The Solution: The "Feedback Loop"

Instead of just asking the robot to "try again," the researchers give it specific clues about what is wrong. They use two main types of clues:

  1. The "Landmark" Clue (The GPS Waypoints):
    Imagine you are giving the robot directions to a treasure. You tell it, "You must pass the old oak tree before you reach the cave."

    • In the paper: These are called Landmarks. They are critical steps that must happen in any valid plan. If the robot's rulebook allows a path that skips the oak tree, the system says, "Hey! You missed a mandatory stop!"
    • Analogy: It's like a GPS telling you, "You missed a turn; you can't get to the destination without passing this specific intersection."
  2. The "Plan Validator" Clue (The Trial Run):
    Imagine you ask the robot to actually play a round of the game using its new rulebook.

    • In the paper: They use a tool called VAL to try and run a plan. If the plan crashes (e.g., the robot tries to move a piece that doesn't exist), the system says, "This move is illegal because your rules are wrong."
    • Analogy: It's like a test drive. If the car stalls, the mechanic knows something is wrong with the engine, not just the driver.

The Secret Sauce: "Search in Feedback Space"

Here is the clever part. The robot doesn't just get one clue and fix it. It gets many possible clues.

Imagine the robot is in a dark room trying to find the light switch.

  • Random Walk (The Old Way): The robot just picks a random wall, pokes it, and if it's not the switch, it picks another random wall. This is slow and inefficient.
  • Heuristic Search (The New Way): The robot uses a "smart compass." It looks at all the possible clues (feedback messages) it could receive. It asks, "Which clue is most likely to get me closer to the perfect rulebook?" It picks the best clue, fixes the rulebook, and repeats.
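The "smart compass" idea can be sketched as a best-first search over repair candidates. This is a simplified illustration, not the paper's actual algorithm: the heuristic here just counts remaining errors, and `expand` stands in for "apply a feedback fix and collect the new feedback it produces":

```python
import heapq

def heuristic(candidate):
    # Illustrative score: fewer remaining errors = closer to a perfect rulebook.
    return candidate["errors_left"]

def best_first_repair(initial_candidates, expand, max_steps=10):
    """Repeatedly pop the most promising candidate, apply its fix, and
    push the successors, until a candidate with zero errors appears."""
    frontier = [(heuristic(c), i, c) for i, c in enumerate(initial_candidates)]
    heapq.heapify(frontier)
    counter = len(frontier)  # tie-breaker so the heap never compares dicts
    for _ in range(max_steps):
        if not frontier:
            break
        _, _, best = heapq.heappop(frontier)
        if best["errors_left"] == 0:
            return best  # perfect rulebook found
        for child in expand(best):
            heapq.heappush(frontier, (heuristic(child), counter, child))
            counter += 1
    return None

# Hypothetical expand: fixing the chosen error removes it from the rulebook.
def expand(candidate):
    return [{"errors_left": candidate["errors_left"] - 1}]

result = best_first_repair([{"errors_left": 2}], expand)
```

The random-walk baseline would pop candidates in arbitrary order instead of by `heuristic`; swapping the heap for a random choice is the only change needed to compare the two strategies.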

The researchers tested this by letting the AI try different combinations:

  • Just Landmarks.
  • Just Plan Validation.
  • Both together.
  • Randomly picking clues vs. using the "smart compass" to pick the best clues.

The Results: What Did They Find?

  1. Feedback is Magic: Giving the robot clues (feedback) made the rulebooks much better than just letting it guess once.
  2. The "Smart Compass" Works: Using a search strategy to pick the best clues generally worked better than just picking clues at random.
  3. It's Not One-Size-Fits-All: Sometimes "Landmarks" were the best clue; sometimes "Plan Validation" was better. It depends on the specific game (domain) the robot is trying to learn.
  4. The Winner: The best combination was using both types of clues and using the smart search to pick the best ones. With this method, they were able to generate a perfect rulebook (100% correct) for every single game they tested.

Why Does This Matter?

Currently, making these computer rulebooks requires expensive human experts who know complex coding languages. This paper shows that we can use AI to do the heavy lifting, as long as we give it the right kind of "corrections" and a smart way to choose which corrections to listen to.

In short: They taught an AI to write perfect game rules by letting it play, checking where it failed, and using a smart strategy to figure out exactly which rule to fix next. It's like having a tireless, super-smart editor who knows exactly how to turn a messy draft into a masterpiece.
