FactorSmith: Agentic Simulation Generation via Markov Decision Process Decomposition with Planner-Designer-Critic Refinement

FactorSmith is a novel framework that generates executable game simulations from natural language by combining factored POMDP decomposition to manage context complexity with a hierarchical planner-designer-critic agentic workflow for iterative code refinement, resulting in improved prompt alignment, reduced errors, and higher code quality.

Ali Shamsaddinlou, Morteza NourelahiAlamdari

Published 2026-03-24

Imagine you want to build a complex video game just by describing it in plain English. You say, "Make a game where a bird flies through pipes, avoiding obstacles."

In the past, asking a Large Language Model (LLM) to do this was like asking a single, brilliant architect to design, build, and inspect an entire skyscraper in one breath. The AI would get overwhelmed, forget details, invent blueprints that don't exist, or accidentally knock down a wall it wasn't supposed to touch.

FactorSmith is a new framework that solves this by changing how the AI works. It combines two powerful ideas: breaking the job into tiny pieces and using a team of specialized workers to check the work.

Here is how it works, using a simple analogy:

1. The Problem: The Overwhelmed Architect

If you ask a single AI to write the code for a whole game at once, it gets confused. It's like trying to read a 1,000-page instruction manual while trying to write a new chapter. It starts hallucinating (making things up) or missing crucial steps.

2. The Solution: The "Factory Line" Approach

FactorSmith treats game creation like a high-tech assembly line with two main strategies:

Strategy A: The "Focus Lens" (Factored Decomposition)

Instead of showing the AI the entire game code at once, FactorSmith breaks the game down into tiny, manageable modules.

  • The Analogy: Imagine you are building a house. Instead of handing the carpenter the blueprints for the roof, the plumbing, the electrical wiring, and the landscaping all at once, you give them only the blueprint for the kitchen cabinets.
  • How it works: The system looks at the game and says, "Okay, for this specific step, we only need to worry about the 'gravity' variable and the 'ball' object." It hides everything else. This keeps the AI's "attention span" focused on exactly what it needs to do right now, preventing confusion.
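To make the "Focus Lens" concrete, here is a minimal Python sketch of context selection. It is an illustration, not the paper's implementation: the `Step` class, the `build_context` helper, and the example state are all made up for this post, and the list of needed variables is written by hand here, whereas in the real system choosing them is part of the AI's job.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One small unit of work (hypothetical structure, for illustration only)."""
    instruction: str
    needed_vars: list[str]  # names of the state variables this step depends on


def build_context(game_state: dict, step: Step) -> dict:
    """Return only the state variables this step depends on; hide the rest."""
    missing = [name for name in step.needed_vars if name not in game_state]
    if missing:
        raise KeyError(f"unknown state variables: {missing}")
    return {name: game_state[name] for name in step.needed_vars}


# Full game state: everything the finished game keeps track of (made-up values).
game_state = {
    "gravity": 0.5,
    "ball": {"x": 10, "y": 20, "vy": 0.0},
    "score": 0,
    "pipes": [],
    "background_color": (0, 0, 0),
}

step = Step("Make the ball fall under gravity", ["gravity", "ball"])
context = build_context(game_state, step)
print(sorted(context))  # only the two relevant names reach the prompt
```

The prompt for this step would be built from `context` rather than `game_state`, so the score, pipes, and background never compete for the AI's attention.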

Strategy B: The "Three-Person Review Team" (Planner-Designer-Critic)

Once the AI is focused on a tiny piece (like the kitchen cabinets), it doesn't just write the code and move on. It uses a team of three virtual agents to refine the work:

  1. The Designer (The Builder): This agent writes the code for the specific piece.
  2. The Critic (The Inspector): This agent looks at the code and scores it. It doesn't just say "good" or "bad"; it gives a structured report with a score per aspect: "Core logic: 8/10. Edge cases: 4/10, because you forgot to handle the ball hitting the wall at a weird angle."
  3. The Planner (The Manager): This agent listens to the Critic.
    • If the score is high enough, the Manager says, "Great, let's move to the next room."
    • If the score is low, the Manager says, "Go back, Designer. Fix the wall angle issue."
    • The Safety Net: If the Designer tries to fix it but makes it worse, the Manager has a "Time Machine" (Checkpoint Rollback). It reverts the code to the last good version, so a failed fix never leaves the project worse off than its last checkpoint.
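The review loop, including the "Time Machine", can be sketched in a few lines of Python. This is a rough sketch under stated assumptions: `refine`, the score threshold of 8.0, the round limit, and the toy designer and critic are all invented for illustration. In the real system the Designer and Critic would be LLM calls, and the paper's actual acceptance rule may differ.

```python
from typing import Callable


def refine(
    initial_code: str,
    designer: Callable[[str, str], str],
    critic: Callable[[str], tuple[float, str]],
    threshold: float = 8.0,   # assumed acceptance score, not from the paper
    max_rounds: int = 5,      # assumed retry budget, not from the paper
) -> tuple[str, float]:
    """Planner loop: accept good code, retry weak code, roll back failed fixes."""
    best_code = initial_code
    best_score, feedback = critic(initial_code)
    for _ in range(max_rounds):
        if best_score >= threshold:
            break  # good enough: move on to the next piece
        candidate = designer(best_code, feedback)
        score, new_feedback = critic(candidate)
        if score > best_score:
            # The fix helped: this becomes the new checkpoint.
            best_code, best_score, feedback = candidate, score, new_feedback
        # Otherwise roll back: best_code is untouched and the next attempt
        # starts again from the last good checkpoint.
    return best_code, best_score


# Toy stand-ins for the LLM agents (made-up versions and scores).
scores = {"v0": 5.0, "v1_worse": 3.0, "v2_fixed": 9.0}
attempts = iter(["v1_worse", "v2_fixed"])


def toy_designer(code: str, feedback: str) -> str:
    return next(attempts)


def toy_critic(code: str) -> tuple[float, str]:
    return scores[code], f"feedback for {code}"


final_code, final_score = refine("v0", toy_designer, toy_critic)
print(final_code, final_score)
```

In this toy run the first fix scores lower than the starting code, so it is discarded, and the second fix is accepted. The key design choice is that a candidate only replaces the checkpoint when the Critic scores it higher.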

3. Why This is a Game Changer

The paper evaluates FactorSmith on generating 8 different 2D games (like Flappy Bird and Snake) and compares three approaches:

  • Old Way (Single Shot): The AI tries to write the whole game at once. The resulting code often crashes or produces a broken game.
  • Old Way (Just Breaking It Down): FactorSim (the previous method) broke the game into pieces but didn't have the "Inspector" team. If the AI made a mistake in a small piece, that mistake stayed in the final game.
  • FactorSmith (The New Way): It breaks the game down and uses the Inspector team to fix mistakes before moving on.

The Result:

  • Fewer Crashes: The generated games run with fewer errors.
  • Better Alignment: The game matches what you asked for more closely.
  • Smarter Code: The code is higher quality and handles tricky situations (like a ball hitting a wall at a weird angle) better.

The Bottom Line

FactorSmith is like upgrading from a solo artist trying to paint a masterpiece in one sitting, to a specialized construction crew.

  1. They break the house into rooms (Context Reduction).
  2. They build one room at a time.
  3. They have a strict inspector and a manager who make sure every room passes inspection before they move on to the next one.
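Put together, the three steps form an outer loop over the pieces of the game. The sketch below is only an illustration of that ordering: `build_game`, the step names, the scores, and the threshold are hypothetical, and `toy_review` stands in for the whole Designer, Critic, and Planner cycle on one piece.

```python
def build_game(steps, review, threshold=8.0):
    """Process pieces in order; record each piece's code and whether it passed."""
    built = []
    for name in steps:
        code, score = review(name)  # stand-in for the inner review cycle
        built.append((name, code, score >= threshold))
    return built


# Toy reviewer: placeholder code plus a fixed score per piece (made-up values).
toy_scores = {"bird_physics": 9.0, "pipe_spawning": 8.5, "collision": 7.0}


def toy_review(name):
    return f"# code for {name}", toy_scores[name]


result = build_game(["bird_physics", "pipe_spawning", "collision"], toy_review)
for name, _code, passed in result:
    print(name, "passed" if passed else "needs more work")
```

What happens to a piece that still fails after its retries (stop, or keep the best checkpoint and continue) is left open here because this summary does not say.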

This allows us to generate complex, playable video games from simple text descriptions more reliably than either single-shot generation or decomposition alone.