GenePlan: Evolving Better Generalized PDDL Plans using Large Language Models

GenePlan is a novel framework that leverages large language models to evolve domain-dependent generalized planners in Python, achieving state-of-the-art performance on PDDL benchmarks while significantly outperforming other LLM-based approaches in both accuracy and efficiency.

Andrew Murray, Danial Dervovic, Alberto Pozanco, Michael Cashmore

Published Wed, 11 Ma

Imagine you are trying to teach a very smart, but slightly chaotic, robot how to play a complex board game. You don't want to write the rules for every single possible game setup (which would take forever). Instead, you want the robot to learn a general strategy that works for any setup of that game.

This is exactly what the GenePlan paper is about. It's a new method for getting Large Language Models (LLMs), like the AI behind ChatGPT or Claude, to write general-purpose solver programs for "PDDL" planning problems. PDDL, the Planning Domain Definition Language, is a formal language used to describe logic puzzles, robot tasks, and logistics.

Here is the breakdown of how it works, using some everyday analogies.

1. The Problem: The "Lazy" Genius

Current AI models are like brilliant students who can write a great essay on a specific topic if you give them a prompt. But when you ask them to solve a logic puzzle (like moving blocks, delivering newspapers, or organizing a warehouse), they often get stuck. They might:

  • Make up rules that don't exist.
  • Get lost in the middle of the plan.
  • Give you a solution that works but takes 100 steps when it could be done in 10.

Even when they succeed, they are "satisficing": they find a solution, not necessarily the best one.

2. The Solution: GenePlan (The Evolutionary Coach)

The authors created GenePlan. Think of GenePlan not as a single teacher, but as a coach running a training camp for a team of AI students.

Instead of asking the AI to "write the perfect plan" once, GenePlan sets up an evolutionary tournament. Here is how the camp works:

Step 1: The Initial Drafts (The "Seed" Population)

The coach asks the AI to write a few different Python code scripts (strategies) to solve the puzzle. Some are terrible, some are okay, and maybe one is decent.

  • Analogy: Imagine asking 10 people to draw a map to a treasure. Most maps are wrong, but one is close.
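
To make the idea of a "strategy script" concrete, here is a minimal sketch of what one candidate might look like for a toy block-stacking puzzle. The state encoding (a dictionary mapping each block to what it sits on), the function name solve, and the two-phase strategy are assumptions made for illustration. They are not taken from the paper, and a real candidate would work from PDDL problem files rather than Python dictionaries.

```python
def solve(on, goal_on):
    """Toy generalized planner for a block-stacking puzzle (illustrative only).

    `on` maps each block to its current support ("table" or another block).
    `goal_on` maps each block to its desired support.
    Returns a list of (block, destination) moves.
    """
    on = dict(on)  # work on a copy so the caller's state is untouched
    plan = []

    def is_clear(block):
        # A block is clear when nothing rests on it.
        return block not in on.values()

    # Phase 1: flatten every tower by moving clear, stacked blocks to the table.
    while any(support != "table" for support in on.values()):
        for block, support in on.items():
            if support != "table" and is_clear(block):
                on[block] = "table"
                plan.append((block, "table"))
                break

    # Phase 2: rebuild the goal towers from the bottom up.
    placed = {b for b, s in goal_on.items() if s == "table"}
    while len(placed) < len(goal_on):
        for block, support in goal_on.items():
            if block not in placed and support in placed:
                on[block] = support
                plan.append((block, support))
                placed.add(block)
                break
    return plan


# The same function handles any number of blocks, which is the point of a
# "general strategy": no per-instance rules are written.
print(solve({"a": "b", "b": "table", "c": "table"},
            {"a": "table", "b": "a", "c": "b"}))
```

This strategy always works for this toy puzzle but often wastes moves, which is exactly the kind of candidate the later steps are meant to improve.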

Step 2: The Test Run (Fitness Evaluation)

The coach takes these 10 maps and tests them on 5 or 10 different versions of the treasure hunt.

  • If a map leads to a dead end, it gets a "failure" score.
  • If a map finds the treasure but takes a long, winding path, it gets a "slow" score.
  • If a map finds the treasure quickly, it gets a "gold star."
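
The three outcomes above can be folded into a single number per candidate. The sketch below is one way to do that, not the paper's actual fitness function: the names score_plan and fitness, the use of a reference plan length, and the ratio-based scoring are all assumptions chosen for illustration.

```python
from typing import Optional, Sequence, Tuple


def score_plan(plan_length: Optional[int], reference_length: int) -> float:
    """Score one test problem from 0.0 (failure) to 1.0 (as short as the reference)."""
    if plan_length is None:  # dead end: the candidate produced no valid plan
        return 0.0
    # Longer plans earn proportionally less; cap at 1.0 if the plan beats the reference.
    return min(1.0, reference_length / max(plan_length, 1))


def fitness(results: Sequence[Tuple[Optional[int], int]]) -> float:
    """Average the per-problem scores; an empty test set scores 0.0."""
    if not results:
        return 0.0
    return sum(score_plan(p, r) for p, r in results) / len(results)


# Hypothetical results for one candidate on three test problems:
# (plan length or None for failure, reference plan length).
results = [(None, 10), (100, 10), (10, 10)]
print(fitness(results))
```

Averaging over several problems matters here: a candidate that only works on one treasure hunt is not a general strategy, and its score should reflect that.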

Step 3: The "Survival of the Fittest" (Evolution)

This is the magic part. The coach doesn't just pick the winner. The coach takes the best maps and mixes them together.

  • Crossover: Imagine taking the "turn left at the oak tree" part from Map A and the "cross the river at the bridge" part from Map B to create a new, super Map C.
  • Mutation: The coach makes tiny, random tweaks to Map C (e.g., "What if we skip the bridge and swim?"). Maybe this new idea is even better!
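
Because the "maps" are Python scripts, one plausible way to mix and tweak them is to ask the LLM to do it. The sketch below only builds the prompt text. The function names and the prompt wording are hypothetical, and the paper may implement these operators differently.

```python
def crossover_prompt(parent_a: str, parent_b: str) -> str:
    """Build a prompt asking the LLM to merge two candidate planners (hypothetical wording)."""
    return (
        "Here are two Python planners for the same planning domain.\n\n"
        f"Planner A:\n{parent_a}\n\n"
        f"Planner B:\n{parent_b}\n\n"
        "Write one new planner that combines the strongest ideas of both."
    )


def mutation_prompt(parent: str) -> str:
    """Build a prompt asking the LLM for a small variation of one planner (hypothetical wording)."""
    return (
        "Here is a Python planner for a planning domain.\n\n"
        f"{parent}\n\n"
        "Make one small change to its strategy that might produce shorter plans."
    )


# Stand-in planner source code; real candidates would be full scripts.
print(crossover_prompt("def solve_a(problem): ...", "def solve_b(problem): ..."))
```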

Step 4: The Loop

The coach discards the worst maps, keeps the new "hybrid" maps, and asks the AI to refine them again. This happens over and over (generations).

  • The Twist: The AI isn't just guessing; it's being told, "Hey, your last map failed here because you forgot the bridge. Try to fix that."
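
Put together, the loop looks roughly like the sketch below. It is a generic evolutionary loop under stated assumptions, not GenePlan's implementation: evolve and its parameters are hypothetical, and propose stands in for the LLM call that would receive parent scripts plus feedback about where they failed. To keep the example runnable without an LLM, the demo evolves plain integers toward a target number instead of Python planners.

```python
import random
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def evolve(
    seed: Sequence[T],
    fitness: Callable[[T], float],
    propose: Callable[[T, T], T],
    generations: int = 10,
    keep: int = 2,
    children: int = 4,
    rng: Optional[random.Random] = None,
) -> T:
    """Keep the best candidates each generation and refill with new proposals."""
    rng = rng or random.Random(0)
    population: List[T] = list(seed)
    for _ in range(generations):
        # Rank candidates, discard the worst, and keep the top `keep` unchanged.
        elite = sorted(population, key=fitness, reverse=True)[:keep]
        # Ask for new candidates derived from randomly chosen survivors.
        offspring = [
            propose(rng.choice(elite), rng.choice(elite)) for _ in range(children)
        ]
        population = elite + offspring
    return max(population, key=fitness)


# Demo with integers standing in for planner scripts.
demo_rng = random.Random(1)
target = 42
seed = [0, 100, 7]
best = evolve(
    seed,
    fitness=lambda x: -abs(x - target),
    propose=lambda a, b: (a + b) // 2 + demo_rng.randint(-3, 3),
)
print(best)
```

Keeping the top candidates unchanged means the best score can never get worse from one generation to the next, which is a common design choice when each new candidate is expensive to produce.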

3. The Result: A Master Strategist

After a few hours of this "training camp," GenePlan produces a single, highly optimized Python script.

  • It's Fast: Once this script is written, it can solve new puzzles in less than half a second.
  • It's Cheap: The whole process costs about $1.82 per domain (a specific type of puzzle).
  • It's Smart: In tests, the evolved planner performed about as well as the world's best traditional planning software (which has been refined for decades), but it got there by learning the strategy rather than being hard-coded.

Why is this a big deal?

Think of it like this:

  • Old Way: You hire a human to write a specific instruction manual for every single new warehouse layout.
  • GenePlan Way: You hire a coach to train a robot to invent its own instruction manual that works for any warehouse layout, with the manual getting faster and smarter over each round of training.

The "Gotcha"

The paper admits that this doesn't work for every problem. If a puzzle is like Sokoban (a game where you push boxes and can get stuck in a corner with no way out), a simple "general strategy" doesn't exist. In those cases, the AI tries to build a complex search engine (like a GPS recalculating the route every second), which is slower. But for most standard logistics and planning problems, GenePlan is a game-changer.
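
For a sense of what "building a search engine" means in contrast to a fixed strategy, here is a minimal breadth-first search sketch. It is a textbook technique shown on a toy number puzzle, not code from the paper. The function bfs_plan and the puzzle itself are illustrative assumptions.

```python
from collections import deque


def bfs_plan(start, is_goal, successors):
    """Return a shortest list of actions from start to a goal state, or None."""
    frontier = deque([start])
    parent = {start: None}  # state -> (previous state, action taken)
    while frontier:
        state = frontier.popleft()
        if is_goal(state):
            plan = []
            while parent[state] is not None:
                state, action = parent[state]
                plan.append(action)
            return plan[::-1]
        for action, nxt in successors(state):
            if nxt not in parent:  # skip states that were already reached
                parent[nxt] = (state, action)
                frontier.append(nxt)
    return None  # every reachable state was explored without finding a goal


def toy_successors(n):
    """Toy puzzle: from a number, either add 1 or double it, staying at or below 20."""
    for action, nxt in (("+1", n + 1), ("*2", n * 2)):
        if nxt <= 20:
            yield action, nxt


print(bfs_plan(1, lambda n: n == 10, toy_successors))
```

Search like this explores many states before committing to a plan, so it copes with dead ends, but the amount of work grows quickly with puzzle size. That is why it is slower than a strategy that simply knows what to do next.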

Summary

GenePlan is a framework that uses Large Language Models as a "breeding ground" for code. It treats planning as a game of evolution: generate many strategies, test them, keep the best ones, mix them, and repeat. The result is a fast, cheap planner that solves complex logic puzzles better than directly prompting an LLM and almost as well as the most powerful traditional planning software.