One2Scene: Geometric Consistent Explorable 3D Scene Generation from a Single Image

One2Scene is a novel framework that generates geometrically consistent, explorable 3D scenes from a single image by decomposing the task into panorama generation, 3D scaffold construction via multi-view stereo matching on sparse anchor views, and novel view synthesis, thereby overcoming the severe distortions and artifacts common in existing methods during large camera motions.

Pengfei Wang, Liyi Chen, Zhiyuan Ma, Yanjun Guo, Guowen Zhang, Lei Zhang

Published 2026-03-02

Imagine you have a single photograph of a room. You want to turn that flat picture into a fully explorable 3D world where you can walk around, look behind the sofa, and peek out the window.

This is a notoriously difficult problem for computers. A single photo contains no information about what lies behind the camera or outside the frame, so a system that simply guesses tends to make errors that get worse the farther the camera moves: walls that stretch like taffy, a floor that disappears, or a hallway that leads nowhere. Existing methods run into exactly this problem. It's like trying to build a house by guessing the blueprint from a single photo of the front door.

The paper "One2Scene" introduces a new method that tackles this by breaking the task into three manageable steps, using a clever "construction crew" approach.

Here is how it works, explained with simple analogies:

Step 1: The "360-Degree Panoramic Map" (The Panorama Generator)

The Problem: A single photo only shows you what's in front of you. It's like looking through a keyhole.
The Solution: The system first uses an AI artist to imagine the rest of the room. It takes your single photo and expands it into a 360-degree panoramic map (like a Google Street View sphere).

  • Analogy: Imagine you are standing in the middle of a room. You can only see the wall in front of you. One2Scene acts like a magical painter who instantly paints the other three walls, the ceiling, and the floor around you, giving you a complete "bubble" of the world.
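
For readers who want the geometry behind that "bubble": a 360-degree panorama is commonly stored as an equirectangular image, where columns correspond to longitude and rows to latitude. The sketch below is not taken from the paper. It assumes that storage format and an axis convention of +z forward, +x right, +y up, and the function name is made up for illustration.

```python
import math

def direction_to_equirect(x, y, z, width, height):
    """Map a 3D viewing direction to (u, v) pixel coordinates in an
    equirectangular panorama (assumed convention: +z forward, +x right, +y up)."""
    norm = math.sqrt(x * x + y * y + z * z)
    lon = math.atan2(x, z)       # longitude in [-pi, pi], 0 means straight ahead
    lat = math.asin(y / norm)    # latitude in [-pi/2, pi/2], positive means up
    u = (lon / (2 * math.pi) + 0.5) * width   # column: wraps around horizontally
    v = (0.5 - lat / math.pi) * height        # row: 0 is the top of the image
    return u, v

# Looking straight ahead lands in the middle of a 2048 x 1024 panorama.
print(direction_to_equirect(0, 0, 1, 2048, 1024))  # (1024.0, 512.0)
```

Under this convention the original photo occupies a patch around the image center, and everything else in the panorama is content the generator had to imagine.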

Step 2: The "Scaffolding Crew" (The 3D Geometric Scaffold)

The Problem: Even though the system now has a 360-degree picture, it's still just a flat painting wrapped around a sphere. It doesn't know how far away the walls are. If you tried to walk through it, you might walk right into a painted wall.
The Solution: This is the core innovation. The system takes that flat 360-degree map and cuts it into six square "anchor views" (like the six faces of a die). It then uses a "feed-forward" network (one that produces its answer in a single fast pass, with no slow per-scene optimization) to turn these flat squares into a 3D geometric scaffold.

  • Analogy: Think of building a house. Before you put up the pretty wallpaper or paint, you need a sturdy wooden frame (scaffolding).
    • Old methods tried to guess the shape of the house while painting the walls, often resulting in crooked, wobbly structures.
    • One2Scene builds a rigid wooden frame first. It estimates where every wall, floor, and ceiling sits in 3D space. It does this by treating the six anchor views like a puzzle, comparing them against one another (the paper calls this multi-view stereo matching) to work out how far away each surface is.
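
To make the "six faces" idea concrete, here is a minimal sketch of cube-face geometry. It is an illustration under stated assumptions, not the paper's implementation: the face layout in `FACES`, the 90-degree field of view per face, the function names, and the depth value are all hypothetical. In One2Scene the distances come from the feed-forward network; here a distance is simply passed in by hand.

```python
import math

# Hypothetical face layout: each face has (forward, right, up) axes,
# in a world where +z is forward, +x is right, and +y is up.
FACES = {
    "front": ((0, 0, 1), (1, 0, 0), (0, 1, 0)),
    "right": ((1, 0, 0), (0, 0, -1), (0, 1, 0)),
    "back": ((0, 0, -1), (-1, 0, 0), (0, 1, 0)),
    "left": ((-1, 0, 0), (0, 0, 1), (0, 1, 0)),
    "up": ((0, 1, 0), (1, 0, 0), (0, 0, -1)),
    "down": ((0, -1, 0), (1, 0, 0), (0, 0, 1)),
}

def face_pixel_to_direction(face, col, row, size):
    """Unit ray direction through the center of pixel (col, row) on a
    size x size cube face with a 90-degree field of view."""
    forward, right, up = FACES[face]
    a = 2.0 * (col + 0.5) / size - 1.0   # -1 at the left edge, +1 at the right
    b = 1.0 - 2.0 * (row + 0.5) / size   # +1 at the top edge, -1 at the bottom
    d = [forward[k] + a * right[k] + b * up[k] for k in range(3)]
    n = math.sqrt(sum(c * c for c in d))
    return tuple(c / n for c in d)

def backproject(face, col, row, size, distance):
    """3D point at a given distance from the camera along the pixel's ray."""
    d = face_pixel_to_direction(face, col, row, size)
    return tuple(distance * c for c in d)

# A surface seen 2 units away through the center of the front face:
print(backproject("front", 0, 0, 1, 2.0))  # (0.0, 0.0, 2.0)
```

Repeating `backproject` for every pixel of every face, with one estimated distance per pixel, yields a point cloud that surrounds the viewer. That is one simple way to picture what a "3D scaffold" contains.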

Step 3: The "Interior Designer" (The Novel View Synthesis)

The Problem: Now you have a sturdy 3D frame, but it's empty. You need to fill it with realistic details (textures, lighting, objects) that look good from any angle you choose to walk to.
The Solution: The system uses the 3D scaffold as a "guide rail." It tells the AI generator: "Hey, we know exactly where the wall is. Now, please paint a realistic image of that wall from this new angle."

  • Analogy: Imagine you are a director filming a movie. You have a rigid set (the scaffold) built by the construction crew. Now, the camera can move anywhere (up, down, left, right) because the set is solid. The "Interior Designer" AI just needs to paint the walls and furniture based on where the camera is pointing. Because the "scaffolding" is already there, the AI has a fixed reference for where the floor is and how big the room is.
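
The "guide rail" idea can be sketched with ordinary pinhole projection: once a point of the scaffold has a known 3D position, its location in any new camera view follows from geometry rather than guesswork. The code below is a simplified illustration, not the paper's rendering pipeline. It assumes a camera that only translates (no rotation) and looks along +z, and the function name and parameter values are hypothetical.

```python
def project_point(point, cam_pos, focal, cx, cy):
    """Project a 3D world point into a pinhole camera located at cam_pos
    and looking along +z. Returns (u, v) in pixels, or None if the point
    is behind the camera."""
    x = point[0] - cam_pos[0]
    y = point[1] - cam_pos[1]
    z = point[2] - cam_pos[2]
    if z <= 0:
        return None              # not visible from this position
    u = cx + focal * x / z       # columns grow to the right
    v = cy - focal * y / z       # rows grow downward while +y points up
    return u, v

wall_point = (0.0, 0.0, 4.0)     # a scaffold point 4 units ahead

# Seen from the original position, it sits at the image center.
print(project_point(wall_point, (0.0, 0.0, 0.0), 256.0, 256.0, 256.0))  # (256.0, 256.0)

# Step one unit to the right and the same point shifts left in the image.
print(project_point(wall_point, (1.0, 0.0, 0.0), 256.0, 256.0, 256.0))  # (192.0, 256.0)
```

Because the shift depends on the point's distance, nearby and faraway surfaces move by different amounts as the camera travels. A generator that is given this information does not have to invent the parallax itself.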

Why is this a big deal?

  1. No More "Wobbly" Worlds: Previous methods often produced distortions and artifacts, with objects stretching or disappearing when you moved the camera. Because One2Scene builds a solid 3D skeleton first, the world stays stable and consistent even under large camera motions.
  2. Speed: It doesn't need to spend hours optimizing every single scene. It builds the scaffold in 0.5 seconds.
  3. Exploration: You can actually "walk" through the generated scene. If you turn around 180 degrees, the system already has geometry for what is behind you, because the 3D scaffold is built from views covering the full 360 degrees.

Summary

Think of One2Scene as a three-step construction project:

  1. The Artist: Draws a complete 360-degree picture of the world.
  2. The Engineer: Builds a perfect, invisible 3D wireframe (scaffold) inside that picture so the geometry is correct.
  3. The Painter: Uses that wireframe to paint realistic, high-quality views from any angle you want.

By separating the "geometry" (the shape) from the "appearance" (the look), One2Scene creates 3D worlds that are not only beautiful but also geometrically consistent, allowing for true, immersive exploration from a single photo.
