Imagine a City: CityGenAgent for Procedural 3D City Generation

This paper introduces CityGenAgent, a natural language-driven framework that utilizes a two-stage learning strategy of Supervised Fine-Tuning and Reinforcement Learning to hierarchically generate high-quality, editable, and semantically aligned 3D cities through interpretable Block and Building programs.

Zishan Liu, Zecong Tang, RuoCheng Wu, Xinzhe Zheng, Jingyu Hu, Ka-Hei Hui, Haoran Xie, Bo Dai, Zhengzhe Liu

Published 2026-03-02

Imagine you want to build a massive, realistic 3D city for a video game, a self-driving car simulator, or a virtual reality world. In the past, doing this was like trying to build a city out of LEGOs by hand, one brick at a time. It took forever, required a team of experts, and if you wanted to change the color of a building or move a park, you had to tear it down and start over.

Other modern AI methods are like hiring a magic painter. You tell them, "Paint a city," and they spray a beautiful picture. But here's the catch: it's just a flat painting. If you try to walk into it, you hit a wall. You can't move the buildings, and the roads don't actually connect.

CityGenAgent is a new framework that solves this by acting like a super-smart City Planner and Architect who speaks your language. Instead of just painting a picture or building by hand, it writes a recipe (a computer program) that tells a robot exactly how to build the city, piece by piece.

Here is how it works, broken down into simple steps:

1. The Two-Step Recipe (The "Programs")

Instead of trying to describe the whole city in one giant paragraph, CityGenAgent breaks the job into two distinct recipes:

  • The "Block Program" (The City Planner):
    Imagine you are drawing a map on a piece of paper. You say, "I want a big park here, a tall office building there, and a school over there."

    • The AI takes your words and draws a precise map. It makes sure the buildings don't overlap (like two cars trying to park in the same spot) and that the roads make sense.
    • Analogy: This is like the City Zoning Department. They decide where things go and how big the lots are.
  • The "Building Program" (The Architect):
    Once the map is set, the AI looks at each building spot and asks, "What does this building look like?" You might say, "Make it a modern glass skyscraper with blue windows."

    • The AI writes a detailed list of instructions for that specific building: "Use glass for the walls, blue for the windows, and a flat roof."
    • Analogy: This is like the Architect and Exterior Designer. They decide the style, color, and materials of each individual building.
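
To make the two recipes concrete, here is a minimal sketch of what a Block Program and a Building Program could look like as data. The class names, field names, and coordinates are illustrative assumptions, not the actual program format used by CityGenAgent.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import List, Tuple

# Hypothetical schema: every name below is an assumption made for illustration.

@dataclass
class Lot:
    """One rectangular lot on the city map (units are arbitrary, e.g. metres)."""
    name: str
    x: float      # lower-left corner
    y: float
    width: float
    depth: float

@dataclass
class BlockProgram:
    """The 'City Planner' recipe: where things go and how big the lots are."""
    lots: List[Lot] = field(default_factory=list)

@dataclass
class BuildingProgram:
    """The 'Architect' recipe: what one building looks like."""
    lot: str
    style: str
    wall_material: str
    window_color: str
    roof: str

def overlaps(a: Lot, b: Lot) -> bool:
    """True if two axis-aligned lots share any interior area."""
    return (a.x < b.x + b.width and b.x < a.x + a.width and
            a.y < b.y + b.depth and b.y < a.y + a.depth)

def find_overlaps(block: BlockProgram) -> List[Tuple[str, str]]:
    """Return the names of every pair of lots that collide."""
    return [(a.name, b.name)
            for a, b in combinations(block.lots, 2) if overlaps(a, b)]

# "A big park here, a tall office building there, and a school over there."
block = BlockProgram(lots=[
    Lot("park", 0, 0, 40, 30),
    Lot("office", 45, 0, 20, 20),
    Lot("school", 0, 35, 30, 25),
])

# "Make it a modern glass skyscraper with blue windows."
office = BuildingProgram(lot="office", style="modern",
                         wall_material="glass", window_color="blue", roof="flat")
```

Calling `find_overlaps(block)` on this layout returns an empty list, which is the "no two cars in the same parking spot" check from the Block Program step.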

2. Learning to Be Perfect (The Training)

The AI didn't start out perfect. It learned in two stages, kind of like a student:

  • Stage 1: The Classroom (Supervised Fine-Tuning):
    The AI was shown thousands of examples of a "City Description" paired with the "Correct Map/Blueprint." It learned the rules: "If I say 'park', I must draw green space," and "Buildings cannot float in the air." This taught it the basics of program grammar and geometry.

  • Stage 2: The Coach (Reinforcement Learning):
    Just knowing the rules isn't enough; the city needs to look right and feel right. The AI started generating cities, and a "Coach" (a smart computer judge) gave it scores.

    • The "No-Collision" Reward: If the AI tried to put two buildings in the same spot, the Coach gave it a bad score.
    • The "Looks Like the Prompt" Reward: If you asked for a "red brick house" and it made a "glass tower," the Coach gave it a bad score.
    • The AI kept trying and adjusting to earn higher scores, gradually learning to "think" more carefully about space and style.
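
The Coach's two scores can be sketched as plain functions. This is only a toy version under stated assumptions: lots are rectangles given as `(x, y, width, depth)`, the "looks like the prompt" check is a simple attribute comparison standing in for whatever judge the paper actually uses, and the equal weights are arbitrary.

```python
from itertools import combinations

def overlaps(a, b):
    """True if two (x, y, width, depth) rectangles share interior area."""
    ax, ay, aw, ad = a
    bx, by, bw, bd = b
    return ax < bx + bw and bx < ax + aw and ay < by + bd and by < ay + ad

def collision_reward(lots):
    """1.0 when no lots collide, dropping toward 0.0 as more pairs overlap."""
    pairs = list(combinations(lots, 2))
    if not pairs:
        return 1.0
    bad = sum(overlaps(a, b) for a, b in pairs)
    return 1.0 - bad / len(pairs)

def prompt_reward(requested, generated):
    """Fraction of requested attributes that the generated building matches."""
    if not requested:
        return 1.0
    hits = sum(generated.get(key) == value for key, value in requested.items())
    return hits / len(requested)

def total_reward(lots, requested, generated, w_collision=0.5, w_prompt=0.5):
    """Weighted sum of both scores (the weights are illustrative assumptions)."""
    return (w_collision * collision_reward(lots)
            + w_prompt * prompt_reward(requested, generated))

# A clean layout, but the building only half matches "red brick house".
lots = [(0, 0, 10, 10), (20, 0, 10, 10)]
requested = {"color": "red", "material": "brick"}
generated = {"color": "red", "material": "glass"}
score = total_reward(lots, requested, generated)
```

During reinforcement learning, outputs with a higher combined score would be encouraged and outputs with a lower score discouraged.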

3. The Magic of "Talking to Change"

This is the coolest part. Because the AI built the city using a recipe (the program) rather than just a static image, you can talk to it to make changes instantly.

  • Old Way: You want to change a building from "Modern" to "Chinese Style." You have to delete the old 3D model and build a new one from scratch.
  • CityGenAgent Way: You just say, "Hey, change that building to Chinese style." The AI looks at its recipe, finds the line that says "Modern Glass," and swaps it for "Chinese Tile Roof." It instantly updates the 3D model without breaking anything else.
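
Here is a minimal sketch of why editing a recipe is cheaper than rebuilding a model. The city is stored as a dictionary of building recipes, and an edit touches only the named fields of one building. The building names, field names, and the `edit_building` helper are hypothetical, chosen only to illustrate the idea.

```python
import copy

# Hypothetical recipes: one small dictionary per building.
city = {
    "tower_a": {"style": "modern", "wall_material": "glass", "roof": "flat"},
    "library": {"style": "modern", "wall_material": "concrete", "roof": "flat"},
}

def edit_building(city, name, **changes):
    """Return a new city where only the given fields of one building change."""
    if name not in city:
        raise KeyError(f"unknown building: {name}")
    unknown = set(changes) - set(city[name])
    if unknown:
        raise KeyError(f"unknown fields: {sorted(unknown)}")
    updated = copy.deepcopy(city)   # leave the original recipe untouched
    updated[name].update(changes)
    return updated

# "Hey, change that building to Chinese style."
new_city = edit_building(city, "tower_a", style="chinese", roof="tile")
```

Only `tower_a` changes; the `library` recipe is identical before and after, so nothing else in the city needs to be regenerated.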

Why is this a big deal?

  • It's Editable: You aren't stuck with what you see. You can tweak the city like a text document.
  • It's Real: It creates actual 3D shapes (meshes) that cars can drive on and robots can walk through, not just pretty pictures.
  • It's Fast: It can generate a whole city block in less than a minute, whereas a human expert might take an hour or more.

In summary: CityGenAgent is like having a magical construction crew that listens to your ideas, draws a perfect blueprint, builds the city, and then stands by ready to rearrange the furniture or repaint the walls the moment you ask. It turns the complex, messy job of building a city into a simple conversation.
