MultiGen: Level-Design for Editable Multiplayer Worlds in Diffusion Game Engines

The paper introduces MultiGen, a novel diffusion-based game engine that incorporates an explicit, persistent external memory to enable user-editable world structures and support coherent, real-time multiplayer interactions, overcoming the limitations of conventional next-frame prediction models.

Ryan Po, David Junhao Zhang, Amir Hertz, Gordon Wetzstein, Neal Wadhwa, Nataniel Ruiz

Published Tue, 10 Ma

Imagine you are playing a video game where the world is being painted in real time, frame by frame, by a magical artist (an AI). In most current "AI game engines," this artist works like a painter with a very short memory: they look at the last few brushstrokes they made and guess what the next one should be.

The Problem: If you play for a long time, the artist forgets the big picture. They might draw a door that leads to a wall, or suddenly change the shape of a room because they "forgot" how it started. If two players are in the same game, they might see completely different worlds because the artist is painting two separate, unconnected stories.

The Solution: MultiGen
The paper "MultiGen" introduces a new way to build these AI games. Instead of relying only on the artist's short-term memory, the authors give the system a permanent notebook (called "External Memory").

Here is how it works, broken down into simple concepts:

1. The Blueprint (The Notebook)

Think of the game world not as a complex 3D city, but as a simple 2D floor plan drawn on a piece of paper. This is the "External Memory."

  • For the Designer: Before the game starts, you can draw this floor plan. You can draw walls, corridors, and rooms. You can even erase a wall and move it while the game is running.
  • For the AI: The AI doesn't have to guess where the walls are. It just looks at the notebook. "Okay, the notebook says there's a wall here, so I will paint a wall here." This means the world stays consistent, even after hours of playing.
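
To make the notebook idea concrete, here is a minimal sketch of a floor plan stored as a small grid. It is only an illustration: the `FloorPlan` class, the `#`/`.` symbols, and the method names are assumptions made up for this example, not the paper's actual data format.

```python
from dataclasses import dataclass

WALL, FLOOR = "#", "."  # hypothetical symbols chosen for this sketch


@dataclass
class FloorPlan:
    """A toy 'notebook': a 2D grid of wall and floor cells."""
    rows: list

    @classmethod
    def from_text(cls, text):
        # Each line of text becomes one row of the grid.
        return cls([list(line) for line in text.strip().splitlines()])

    def is_wall(self, x, y):
        # Anything outside the drawn map is treated as solid.
        if not (0 <= y < len(self.rows) and 0 <= x < len(self.rows[y])):
            return True
        return self.rows[y][x] == WALL

    def set_cell(self, x, y, value):
        # The designer's edit: overwrite one cell, even mid-game.
        self.rows[y][x] = value


# The designer draws a level before the game starts...
plan = FloorPlan.from_text("""
#####
#...#
#.#.#
#...#
#####
""")

before = plan.is_wall(2, 2)   # the pillar in the middle of the room
plan.set_cell(2, 2, FLOOR)    # ...and erases that pillar while the game runs
after = plan.is_wall(2, 2)
```

The point of the sketch is that the generator never has to guess: any later question about the pillar is answered by looking the cell up, so an edit is visible the next time the cell is read.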

2. The Three-Part Team

Instead of one giant AI trying to do everything, MultiGen splits the job into three specialized workers:

  • The Architect (Memory Module): This worker holds the notebook. They know exactly where the walls are and where every player is standing. They don't care what the room looks like (the colors, the lighting); they only care about the structure.
  • The Painter (Observation Module): This is the artist who draws the actual game screen you see. They look at the Architect's notebook ("There is a wall to the left") and the last few frames of the game, then they paint the next frame. Because they have the notebook, they never forget where the walls are.
  • The Runner (Dynamics Module): This worker updates the Architect's notebook. If you press "Forward," the Runner tells the Architect, "The player moved 5 steps forward." The Architect updates the notebook, and the Painter draws the new view.
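
The division of labor above can be sketched as a tiny loop. Everything here is a stand-in: `Memory`, `dynamics_step`, `observe`, and the `MOVES` table are hypothetical names, the Painter is replaced by a function that returns a text description instead of an image, and "forward" is assumed to mean one grid cell upward because this toy player has no facing direction. None of it is the actual MultiGen code.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Memory:
    """The Architect: structure and player positions, no appearance."""
    walls: set
    players: dict = field(default_factory=dict)  # name -> (x, y)


# Assumed action set for this sketch; y decreases when moving "forward".
MOVES = {"forward": (0, -1), "back": (0, 1), "left": (-1, 0), "right": (1, 0)}


def dynamics_step(memory, player, action):
    """The Runner: turn an input into an update of the notebook."""
    x, y = memory.players[player]
    dx, dy = MOVES[action]
    target = (x + dx, y + dy)
    if target not in memory.walls:  # walls block movement
        memory.players[player] = target


def observe(memory, player, recent_frames):
    """The Painter: produce the next frame from the notebook.

    A real painter would also condition on recent_frames; this stand-in
    only appends to that history and describes the view as text.
    """
    x, y = memory.players[player]
    ahead = "wall" if (x, y - 1) in memory.walls else "open"
    frame = f"{player}@({x},{y}) ahead={ahead}"
    recent_frames.append(frame)
    return frame


memory = Memory(walls={(1, 0)}, players={"A": (1, 2)})
frames = deque(maxlen=4)  # short-term frame history

dynamics_step(memory, "A", "forward")  # moves to (1, 1)
f1 = observe(memory, "A", frames)
dynamics_step(memory, "A", "forward")  # blocked by the wall at (1, 0)
f2 = observe(memory, "A", frames)
```

Note that the frame history is deliberately short. The long-term facts (where the wall is, where the player stands) live in `memory`, so they survive no matter how many frames scroll out of the history.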

3. The Multiplayer Magic

This is where it gets really cool. Imagine two people playing the same game, but looking at it from different angles.

  • Without MultiGen: The AI tries to paint two separate stories. Player A might see a door; Player B might see a solid wall because the AI got confused about where the door was.
  • With MultiGen: Both players look at the same notebook.
    • Player A shoots Player B.
    • The "Runner" updates the notebook to say, "Player B is now dead."
    • The "Painter" for Player A sees the dead body.
    • The "Painter" for Player B (who is now dead) sees the world fade out.
    • Crucially, if Player A walks around a corner, Player B (or a third player, or anyone watching a replay) sees Player A appear exactly where the notebook says they are. Everyone shares the same reality.
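
The shooting example can be sketched the same way: one shared notebook, one event written to it, and one view rendered per player from that single source of truth. The names `SharedMemory`, `apply_hit`, and `render_view` are hypothetical, and the "views" are text strings standing in for generated frames.

```python
from dataclasses import dataclass, field


@dataclass
class SharedMemory:
    """One notebook for all players: name -> position and alive flag."""
    players: dict = field(default_factory=dict)


def apply_hit(memory, target):
    # The Runner records the outcome once, in the shared notebook.
    memory.players[target]["alive"] = False


def render_view(memory, viewer):
    # Each player's Painter reads the same notebook from its own viewpoint.
    me = memory.players[viewer]
    if not me["alive"]:
        return f"{viewer}: screen fades out"
    others = [
        f"{name} {'alive' if p['alive'] else 'down'} at {p['pos']}"
        for name, p in sorted(memory.players.items())
        if name != viewer
    ]
    return f"{viewer}: sees " + ", ".join(others)


shared = SharedMemory({
    "A": {"pos": (0, 0), "alive": True},
    "B": {"pos": (3, 0), "alive": True},
})

apply_hit(shared, "B")             # Player A shoots Player B
view_a = render_view(shared, "A")  # "A: sees B down at (3, 0)"
view_b = render_view(shared, "B")  # "B: screen fades out"
```

Because both views are derived from the same `shared` object, they cannot disagree about what happened. That is the design choice the sketch is meant to highlight, under the assumption that each player has their own rendering pass over a common state.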

The Analogy: The Stage Play vs. The Improv Show

  • Old AI Games (Improv Show): The actors (AI) are making it up as they go. They remember the last few lines, but if the play goes on for an hour, they might forget the plot, walk off the stage, or argue about where the set pieces are.
  • MultiGen (Stage Play with a Script): There is a director (the Notebook) holding the script and the stage map. The actors (the AI) still improvise the details (the lighting, the textures, the specific expressions), but they must follow the script. If the script says "The door is on the left," the door is on the left. If two actors are on stage, they both know exactly where the other one is because they are reading from the same script.

Why This Matters

  1. You are the Director: You can draw a level on a napkin (a simple map), and the AI will build a playable 3D world based on it. You can edit the map mid-game, and the world will instantly adjust.
  2. No More Glitchy Worlds: The game won't hallucinate new walls or make corridors vanish, because the "notebook" keeps the truth safe.
  3. True Multiplayer: Multiple players can interact in a shared, consistent world, just like in traditional video games, but generated by AI in real time.

In short, MultiGen gives AI games a "long-term memory" and a "shared reality," turning chaotic, forgetful generators into reliable, editable, and multiplayer-ready game engines.