ReSpace: Text-Driven Autoregressive 3D Indoor Scene Synthesis and Editing

ReSpace is a generative framework for text-driven 3D indoor scene synthesis and editing. It combines autoregressive next-token prediction with supervised fine-tuning and preference alignment, represents room boundaries explicitly to improve spatial reasoning, and is reported to outperform state-of-the-art methods in object manipulation and human-perceived quality.

Martin JJ. Bucher, Iro Armeni

Published 2026-03-24

Imagine you are an interior designer, but instead of sketching on paper or dragging and dropping furniture in a video game, you simply talk to your room.

This paper introduces ReSpace, a new AI system that lets you design and edit 3D indoor rooms using plain English. It's like having a magical assistant that understands not just what you want (e.g., "a cozy sofa"), but exactly where it should go, how big it should be, and how it fits with everything else in the room.

Here is a breakdown of how it works, using some everyday analogies:

1. The Problem: The "Blind" and the "Rigid"

Before ReSpace, existing AI tools for designing rooms had two main flaws:

  • The "Blind" Architect: Some AIs could generate rooms, but they were like architects who couldn't see the walls. They would put a sofa floating in mid-air or a table sticking out of the ceiling because they didn't truly understand the room's boundaries.
  • The "Rigid" Builder: Other tools were like builders who could only work with perfect square boxes. If your room had a weird shape (like an L-shape or a bay window), they couldn't handle it. Also, if you wanted to swap a chair for a table, they often couldn't do it; they usually had to start the whole room from scratch.

2. The Solution: ReSpace (The "Smart Translator")

ReSpace acts as a translator between your natural language and the 3D world. It breaks the process down into three clever steps:

Step A: The "Blueprint" (Structured Scene Representation)

Instead of trying to guess the room from scratch, ReSpace uses a special digital "blueprint" (called SSR). Think of this like a JSON recipe card.

  • It lists the room's exact shape (the walls and floor).
  • It lists every object currently in the room with a description (e.g., "a modern gray sofa"), its size, and its exact coordinates.
  • Why this matters: Because the room is written as a clear list of facts, the AI can easily read it, understand the boundaries, and know exactly where there is empty space.
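
To make the "recipe card" idea concrete, here is a minimal sketch of what such a blueprint could look like. The field names (`bounds_floor`, `desc`, `size`, `pos`) and the helper functions are assumptions made for illustration, not the paper's actual schema. Treating each object's footprint as an unrotated rectangle is also a simplification.

```python
import json

# Hypothetical blueprint; field names are illustrative, not the paper's schema.
scene = {
    "room_type": "livingroom",
    # Floor polygon corners (x, z) in metres; any simple polygon would work here.
    "bounds_floor": [[0.0, 0.0], [4.0, 0.0], [4.0, 3.0], [0.0, 3.0]],
    "height": 2.6,
    "objects": [
        {
            "desc": "a modern gray sofa",
            "size": [2.0, 0.8, 0.9],  # width, height, depth in metres
            "pos": [1.0, 0.0, 0.5],   # position in room coordinates
        }
    ],
}


def polygon_area(corners):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    for i, (x0, z0) in enumerate(corners):
        x1, z1 = corners[(i + 1) % len(corners)]
        total += x0 * z1 - x1 * z0
    return abs(total) / 2.0


def free_floor_area(scene):
    """Floor area minus the (unrotated) footprint of every object."""
    used = sum(o["size"][0] * o["size"][2] for o in scene["objects"])
    return polygon_area(scene["bounds_floor"]) - used


# The whole scene is plain text, so a language model can read it directly.
scene_text = json.dumps(scene, indent=2)
```

Because the room is just structured text, questions like "how much floor is still free?" reduce to simple arithmetic over the listed facts.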

Step B: The "Next Word" Game (Autoregressive Generation)

ReSpace treats designing a room like playing a game of fill-in-the-blanks.

  • You give it a prompt: "Add a dark gray tufted sofa."
  • The AI reads the current "recipe" (the room) together with the prompt, then predicts the next tokens, the word pieces that spell out the new sofa's position and size.
  • It's like a text adventure game where the computer writes the next sentence of the story for you. It does this one object at a time, building the scene piece by piece.
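
The fill-in-the-blanks loop can be sketched roughly as follows. This is a toy under stated assumptions: `toy_next_token` is a stand-in that replays a canned answer one character at a time, where a real system would query a trained language model. The prompt layout and the `<eos>` stop marker are likewise hypothetical.

```python
import json


def build_context(scene, instruction):
    """Concatenate the current scene (as JSON text) and the user's instruction."""
    return json.dumps(scene) + "\n" + instruction + "\n"


def toy_next_token(context, generated):
    """Stand-in for a language model: replays a canned answer character by character."""
    canned = '{"desc": "dark gray tufted sofa", "size": [2.0, 0.8, 0.9], "pos": [1.0, 0.0, 0.5]}'
    return canned[len(generated)] if len(generated) < len(canned) else "<eos>"


def generate_object(scene, instruction, next_token=toy_next_token, max_tokens=512):
    """Grow the answer one token at a time until the model signals it is done."""
    context = build_context(scene, instruction)
    generated = ""
    for _ in range(max_tokens):
        token = next_token(context, generated)
        if token == "<eos>":
            break
        generated += token
    return json.loads(generated)


def add_object(scene, instruction):
    """Return a new scene with one generated object appended."""
    new_obj = generate_object(scene, instruction)
    return {**scene, "objects": scene["objects"] + [new_obj]}


empty_room = {"objects": []}
updated = add_object(empty_room, "Add a dark gray tufted sofa.")
```

Calling `add_object` repeatedly, each time on the updated scene, builds the room one object at a time.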

Step C: The "Furniture Catalog" (Asset Sampling)

Once the AI decides where the sofa goes and how big it is, it needs a real 3D model.

  • ReSpace doesn't just pick the first sofa it finds. It acts like a personal shopper. It looks through a massive catalog of 3D furniture and picks the one that best matches your description ("dark gray," "tufted") and fits the size constraints it just calculated.
  • If you say "remove the plant," the AI simply deletes that item from the recipe card.
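
The "personal shopper" step can be illustrated with a small scoring function. This is only a sketch: the word-overlap text score is a crude substitute for whatever learned similarity a real system would use, and the catalog entries, the weighting, and the choice to take the single best match (rather than sample among good matches) are all assumptions.

```python
def text_score(query, desc):
    """Jaccard overlap between the words of the request and the asset description."""
    q, d = set(query.lower().split()), set(desc.lower().split())
    return len(q & d) / len(q | d)


def size_score(target, size):
    """1.0 for identical dimensions, shrinking as the summed difference grows."""
    diff = sum(abs(t - s) for t, s in zip(target, size))
    return 1.0 / (1.0 + diff)


def pick_asset(catalog, query, target_size, w_text=0.5):
    """Return the catalog entry with the best combined text and size score."""
    def score(asset):
        return (w_text * text_score(query, asset["desc"])
                + (1.0 - w_text) * size_score(target_size, asset["size"]))
    return max(catalog, key=score)


# Hypothetical three-item catalog.
catalog = [
    {"id": "sofa_a", "desc": "dark gray tufted sofa", "size": [2.0, 0.8, 0.9]},
    {"id": "sofa_b", "desc": "bright red leather sofa", "size": [2.0, 0.8, 0.9]},
    {"id": "sofa_c", "desc": "dark gray tufted sofa", "size": [3.5, 1.0, 1.2]},
]

best = pick_asset(catalog, "dark gray tufted sofa", [2.1, 0.8, 0.9])
print(best["id"])  # sofa_a: right description and closest size
```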

3. The Secret Sauce: "Voxel" Checking

One of the biggest innovations in this paper is how ReSpace checks for mistakes.

  • Old Way: Imagine checking if a chair fits under a table by looking at a rough cardboard box around the chair. If the box fits, the AI thinks it's good. But in reality, the chair might still be hitting the table leg.
  • ReSpace's Way (Voxelization): ReSpace turns the room into a giant 3D grid of tiny cubes (like a Minecraft world). It checks every single tiny cube to see if the chair is actually touching the table or the floor.
  • This allows it to catch "fine-grained" errors, like a lamp sticking out of the wall or a chair slightly overlapping a rug, ensuring the final scene looks physically realistic.
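
A toy example shows one way a box check and a cube-by-cube check can disagree. This sketch is not the paper's evaluation code: it assumes a coarse integer grid, builds the furniture by hand out of axis-aligned blocks, and uses plain Python sets to hold the occupied cubes.

```python
def box_voxels(x0, y0, z0, x1, y1, z1):
    """Set of integer grid cells inside the half-open box [x0,x1) x [y0,y1) x [z0,z1)."""
    return {(x, y, z)
            for x in range(x0, x1)
            for y in range(y0, y1)
            for z in range(z0, z1)}


def bounding_box(voxels):
    """Smallest axis-aligned box (half-open) that contains every voxel."""
    xs, ys, zs = zip(*voxels)
    return (min(xs), min(ys), min(zs), max(xs) + 1, max(ys) + 1, max(zs) + 1)


def boxes_overlap(a, b):
    """True if two half-open boxes intersect on all three axes."""
    return all(a[i] < b[i + 3] and b[i] < a[i + 3] for i in range(3))


# A table: thin top at height y=7, standing on four corner legs.
table = box_voxels(0, 7, 0, 10, 8, 10)
for lx, lz in [(0, 0), (9, 0), (0, 9), (9, 9)]:
    table |= box_voxels(lx, 0, lz, lx + 1, 7, lz + 1)

stool_tucked = box_voxels(3, 0, 3, 7, 5, 7)  # sits in the gap under the table
stool_in_leg = box_voxels(0, 0, 0, 4, 5, 4)  # pushed into a corner leg

# The coarse box test cannot tell the two stools apart...
coarse_tucked = boxes_overlap(bounding_box(table), bounding_box(stool_tucked))
coarse_in_leg = boxes_overlap(bounding_box(table), bounding_box(stool_in_leg))

# ...while counting shared voxels can.
shared_tucked = len(table & stool_tucked)
shared_in_leg = len(table & stool_in_leg)
```

Here both box tests report an overlap, but only the second stool actually shares cubes with the table. The finer the grid, the smaller the errors this kind of check can detect, at the cost of more cubes to compare.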

4. Learning from Feedback (The "Taste Test")

The team didn't just teach the AI to follow rules; they taught it to have good taste.

  • They used a technique called Preference Alignment. Imagine showing the AI two different ways to place a lamp. One looks weird (too close to the wall), and one looks great.
  • They told the AI, "The second one is better."
  • Over time, the AI learned to prioritize arrangements that humans find pleasing, not just arrangements that technically fit in the box.
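
The "second one is better" signal is often turned into a number with a pairwise loss. The sketch below shows a generic Bradley-Terry style objective as one common way to do this. It is an assumption for illustration, since this summary does not say which preference-alignment algorithm the authors used, and the scores are made-up values standing in for a model's ratings of two placements.

```python
import math


def preference_loss(score_preferred, score_rejected):
    """-log(sigmoid(margin)): small when the preferred option scores clearly higher."""
    margin = score_preferred - score_rejected
    return math.log1p(math.exp(-margin))


# Hypothetical model scores for two lamp placements.
agrees_with_human = preference_loss(score_preferred=2.0, score_rejected=0.0)
disagrees_with_human = preference_loss(score_preferred=0.0, score_rejected=2.0)
undecided = preference_loss(score_preferred=1.0, score_rejected=1.0)
```

Training to reduce this loss pushes the model to rate the human-preferred arrangement above the rejected one.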

Summary: What Can You Do With It?

With ReSpace, you can:

  • Add: "Put a modern spherical lamp in the corner."
  • Remove: "Take away the plant with the black pot."
  • Swap: "Replace the bookcase with a wooden wardrobe."
  • Handle Complex Shapes: It is designed for rooms with irregular outlines such as L-shapes, not just simple rectangles.
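
Seen as operations on the recipe card, these edits are small list manipulations. The sketch below is a simplification with hypothetical helper names: it matches objects by exact description and takes the new object as given, whereas in ReSpace the new object's placement is predicted by the model. Treating a swap as a removal followed by an addition is an assumption made here for illustration.

```python
def remove_object(scene, desc):
    """Return a new scene without any object whose description matches exactly."""
    kept = [o for o in scene["objects"] if o["desc"] != desc]
    return {**scene, "objects": kept}


def add_object(scene, obj):
    """Return a new scene with one more object appended."""
    return {**scene, "objects": scene["objects"] + [obj]}


def swap_object(scene, old_desc, new_obj):
    """Swap sketched as a removal followed by an addition."""
    return add_object(remove_object(scene, old_desc), new_obj)


room = {
    "objects": [
        {"desc": "plant with black pot", "pos": [0.5, 0.0, 0.5]},
        {"desc": "bookcase", "pos": [3.5, 0.0, 1.0]},
    ]
}

room = remove_object(room, "plant with black pot")
room = swap_object(room, "bookcase", {"desc": "wooden wardrobe", "pos": [3.5, 0.0, 1.0]})
```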

In a nutshell: ReSpace is like having a smart, patient interior designer who reads your text instructions, understands the physical limits of your room, checks for collisions at the level of tiny voxels, and arranges your furniture to look good, all without you ever having to drag a mouse or move a 3D model yourself.
