Code World Models for Parameter Control in Evolutionary Algorithms

This paper demonstrates that Code World Models, in which Large Language Models synthesize Python simulators from suboptimal optimizer trajectories, can learn to control evolutionary algorithm parameters, matching near-optimal policies on challenging combinatorial optimization problems and outperforming existing baselines.

Camilo Chacón Sartori, Guillem Rodríguez Corominas

Published 2026-02-27

Imagine you are trying to teach a robot how to solve a maze. Usually, you have two options:

  1. The Rulebook: You give the robot a strict manual on how to turn left or right based on the wall colors.
  2. The Trial-and-Error: You let the robot run into walls thousands of times, hoping it eventually learns the pattern by accident.

This paper introduces a third, smarter option: The "Dream Simulator."

The researchers asked a question: Can we teach an AI (specifically a Large Language Model, or LLM) to watch a robot fail at solving a problem, understand why it failed, and then write its own "dream simulator" to figure out the perfect strategy?

Here is the breakdown of their method, "Code World Models" (CWM), using simple analogies.

1. The Setup: The Robot and the Maze

The "robot" is a standard evolutionary algorithm (a type of AI that evolves solutions). Its job is to find the best answer to a math problem.

  • The Problem: The robot has a "knob" it can turn (the parameter k). This knob controls how wildly it changes its solution.
    • Turn it low: It makes tiny, safe adjustments. Good for fine-tuning.
    • Turn it high: It makes huge, chaotic jumps. Good for escaping dead ends, but risky.
  • The Challenge: The robot doesn't know when to turn the knob up or down. If it turns it the wrong way, it gets stuck in a "deceptive valley"—a trap that looks like the top of a hill but is actually a pit.
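The robot-and-knob setup above can be sketched as a minimal (1+1) evolutionary algorithm on a bitstring problem, where k sets the per-bit mutation strength. This is an illustrative stand-in, not the paper's actual code; the function names and defaults are assumptions.

```python
import random

def onemax(x):
    """OneMax fitness: the number of 1-bits (to be maximised)."""
    return sum(x)

def mutate(x, k, rng):
    """Standard bit mutation: flip each bit independently with probability k/n.
    k is the 'knob' -- low k means tiny safe steps, high k means wild jumps."""
    n = len(x)
    return [b ^ (rng.random() < k / n) for b in x]

def run_ea(fitness, n=50, k=1, budget=5000, seed=0):
    """Minimal (1+1) EA: propose one mutant per step, keep it if it is
    at least as fit as the current solution."""
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n)]
    for _ in range(budget):
        y = mutate(x, k, rng)
        if fitness(y) >= fitness(x):
            x = y
    return x

best = run_ea(onemax)
print(onemax(best))  # a fixed low k climbs smooth hills well
```

A fixed knob setting like k=1 is fine on smooth problems; the whole point of the paper is deciding how to move this knob during the run.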

2. The Old Way vs. The New Way

  • The Old Way (Adaptive Rules): Traditional methods use simple rules like, "If you don't improve, turn the knob down." This works great on smooth hills but fails miserably in deceptive valleys because the robot keeps turning the knob down until it's stuck forever.
  • The New Way (Code World Models):
    1. Watch: The researchers let the robot run with random settings, collecting a bunch of "failed" or "sub-optimal" attempts.
    2. Ask the Oracle: They feed these failed attempts to a super-smart AI (the LLM) and say, "Look at these mistakes. Can you write a Python program that predicts what will happen if we change the knob?"
    3. The Magic: The LLM doesn't just guess; it writes a simulator. It creates a piece of code that acts like a crystal ball.
    4. The Plan: Before the robot makes a move in the real world, it asks the simulator: "If I turn the knob to 5, what happens? If I turn it to 10, what happens?" The simulator answers instantly. The robot then picks the best move.
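The watch/ask/plan loop can be sketched as model-based control with a learned lookup table, roughly the flavor of simulator the paper says the LLM writes for rugged landscapes. Everything here is hypothetical scaffolding: in the paper, the simulator itself is Python code synthesized by the LLM from logged trajectories, not a hand-built table.

```python
from collections import defaultdict

def fit_table_model(trajectories):
    """Watch: learn a lookup-table 'simulator' from logged runs by averaging
    the observed fitness gain for each (state, knob) pair."""
    sums, counts = defaultdict(float), defaultdict(int)
    for state, k, gain in trajectories:
        sums[(state, k)] += gain
        counts[(state, k)] += 1
    return {key: sums[key] / counts[key] for key in sums}

def plan_k(model, state, candidates):
    """Plan: before acting in the real world, query the simulator for each
    candidate knob value and act with the one predicting the largest gain."""
    return max(candidates, key=lambda k: model.get((state, k), 0.0))

# Toy log of sub-optimal attempts: while climbing, a small knob helped;
# when stuck, only a big knob ever produced progress.
log = [("climbing", 1, 0.8), ("climbing", 8, -0.5),
       ("stuck", 1, 0.0), ("stuck", 8, 0.6)]
model = fit_table_model(log)
print(plan_k(model, "climbing", (1, 8)))  # -> 1
print(plan_k(model, "stuck", (1, 8)))     # -> 8
```

The key property is that the simulator is queried instantly and repeatedly before each real (expensive) step, which is what makes the planning cheap.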

3. The Results: Beating the Traps

The researchers tested this on four different types of "mazes":

  • The Smooth Hills (LeadingOnes & OneMax): These are easy problems. The new method learned the perfect strategy just by watching the failures. It performed almost as well as the theoretical "perfect" strategy, even though it never saw the perfect strategy during training.
  • The Deceptive Valley (Jump_k): This is the big win. In this maze, the robot gets stuck in a pit. Traditional rules say, "You're stuck, so be more careful (turn knob down)." This leads to a 0% success rate.
    • The CWM Solution: The simulator realized, "Hey, being careful won't work here. We need to jump hard to get out!" It figured out the exact jump size needed to escape the trap.
    • Result: While other methods failed 100% of the time, this method succeeded 100% of the time.
  • The Rugged Mountain (NK-Landscape): This is a messy, chaotic maze with no clear rules. The LLM couldn't use math formulas here. Instead, it looked at a table of "what happened when we tried X." It learned the pattern purely from data and still beat all other methods.
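The deceptive valley has a classic textbook form. In Jump_k, fitness rewards more 1-bits until you are k bits from the optimum, then punishes every further cautious step, so only a jump of exactly k bits escapes. A sketch using the standard definition (not code from the paper):

```python
def jump(x, k):
    """Jump_k fitness: like OneMax with a valley just before the optimum.
    Strings with between n-k+1 and n-1 ones score *worse* the closer
    they get -- the trap that defeats 'be more careful' rules."""
    n, ones = len(x), sum(x)
    if ones == n or ones <= n - k:
        return k + ones
    return n - ones  # inside the deceptive valley

n, k = 10, 3
local_opt  = [1] * (n - k) + [0] * k            # the rim of the pit
one_step   = [1] * (n - k + 1) + [0] * (k - 1)  # one careful bit closer
global_opt = [1] * n
print(jump(local_opt, k), jump(one_step, k), jump(global_opt, k))  # 10 2 13
```

From the rim, every single-bit improvement looks catastrophic (fitness drops from 10 to 2), so a greedy rule turns the knob down and freezes; flipping all k remaining zero-bits at once jumps straight to 13.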

4. Why This is a Big Deal

  • It's Cheaper: The new method learned from 200 "offline" attempts. A competing AI (a Deep Q-Network, or DQN) needed 500 "online" attempts (where the AI actually has to run the simulation, which is slow and expensive) and still failed more often.
  • It's Transparent: Instead of a "black box" neural network where you don't know why it made a decision, the LLM wrote actual Python code. You can read the code and see exactly how it decided to turn the knob. It's like the AI wrote its own instruction manual.
  • It Generalizes: The robot learned the strategy for a specific maze size, and when they made the maze bigger or changed the rules slightly, the robot still knew what to do. It didn't just memorize; it understood the logic.

The Bottom Line

This paper shows that we don't need to hand-code every rule for AI. Instead, we can show an AI some examples of things going wrong, ask it to write a simulator to understand the physics of the problem, and then let that simulator guide the AI to success.

It's like giving a chess player a book of their own past losses, asking them to write a new rulebook based on those losses, and then having them play a perfect game using that new rulebook. The result? They beat the experts.
