RDFC-GAN: RGB-Depth Fusion CycleGAN for Indoor Depth Completion

The paper proposes RDFC-GAN, a novel two-branch end-to-end fusion network that combines a Manhattan world-guided encoder-decoder with an RGB-depth fusion CycleGAN to effectively complete large missing regions in indoor depth maps by leveraging RGB imagery and pseudo-depth supervision.

Haowen Wang, Zhengping Che, Yufan Yang, Mingyuan Wang, Zhiyuan Xu, Xiuquan Qiao, Mengshi Qi, Feifei Feng, Jian Tang

Published 2026-02-24

The Big Problem: The "Ghostly" Room

Imagine you are trying to build a 3D model of your living room using a special camera (like a Kinect or a robot's eye). You expect to see the walls, the sofa, and the coffee table.

But instead, the camera gives you a map full of holes.

  • Glass windows? The camera sees right through them, leaving a blank spot.
  • Shiny mirrors or black velvet? The light bounces away or gets absorbed, so the camera thinks there is nothing there.
  • Far corners? The signal gets too weak to measure.

The result is a "depth map" (a picture of how far away things are) that looks like Swiss cheese. This is a nightmare for robots trying to navigate or for augmented reality apps trying to place a virtual chair in your room. They don't know where the floor ends and the wall begins.
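As a toy illustration (not the paper's code), a holey depth map can be modeled as an array where failed pixels read 0, and the first thing any completion method needs is a mask of which pixels are missing:

```python
import numpy as np

# Toy 4x4 depth map in meters; 0.0 marks pixels where the sensor failed
# (e.g. glass, mirrors, or out-of-range corners).
depth = np.array([
    [2.1, 2.0, 0.0, 0.0],   # a glass window in the top-right corner
    [2.2, 2.1, 0.0, 0.0],
    [1.5, 1.4, 1.3, 1.2],
    [1.1, 1.1, 1.0, 0.9],
])

hole_mask = depth == 0.0
missing_fraction = hole_mask.mean()
print(f"{missing_fraction:.0%} of pixels are missing")  # prints "25% of pixels are missing"
```

Real indoor scans can lose far larger, contiguous regions than this, which is exactly what makes simple nearest-pixel interpolation fail.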

The Solution: The "Two-Chef" Kitchen

The authors of this paper built a new AI system called RDFC-GAN to fix these holes. Think of it as a kitchen with two expert chefs working together to cook the perfect meal (the complete depth map).

Chef 1: The "Architect" (The MCN Branch)

  • Who they are: This chef is a stickler for rules and geometry. They know that most houses are built with straight lines, right angles, and flat surfaces (this is called the Manhattan World Assumption—like a city grid).
  • What they do: They look at the raw, holey data and say, "Okay, this wall must be vertical, and this floor must be flat." They use the RGB image (the color photo) to guess the orientation of the walls.
  • The Result: They produce a depth map that is structurally correct and smooth. It knows where the walls should be, but it might look a bit blurry or lack fine details (like the texture of a brick wall).
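The Manhattan World Assumption can be made concrete with a small sketch. This is our simplification, not the paper's MCN: snap a noisy surface-normal estimate to the nearest of three mutually orthogonal "Manhattan" axes (floor/ceiling plus two wall directions):

```python
import numpy as np

# The three dominant directions of a Manhattan-world room:
MANHATTAN_AXES = np.array([
    [1.0, 0.0, 0.0],   # wall direction A
    [0.0, 1.0, 0.0],   # wall direction B
    [0.0, 0.0, 1.0],   # floor / ceiling
])

def snap_to_manhattan(normal):
    """Replace a noisy unit normal with the closest Manhattan axis (up to sign)."""
    normal = normal / np.linalg.norm(normal)
    scores = np.abs(MANHATTAN_AXES @ normal)          # |cosine| with each axis
    best = MANHATTAN_AXES[np.argmax(scores)]
    return best if normal @ best >= 0 else -best

noisy_wall_normal = np.array([0.95, 0.05, -0.1])
print(snap_to_manhattan(noisy_wall_normal))           # prints [1. 0. 0.]
```

The real branch learns these wall/floor orientations from the RGB image rather than hard-snapping them, but the regularizing effect is the same: surfaces are pulled toward clean, axis-aligned geometry.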

Chef 2: The "Artist" (The RDFC-GAN Branch)

  • Who they are: This chef is a creative genius who loves texture and detail. They are trained using a special technique called a CycleGAN (a type of AI that learns to translate one style of image into another).
  • What they do: They look at the color photo and say, "If I see a wooden door here, the depth map should look like wood, not just a flat gray blob." They try to "paint" the missing depth values by mimicking the textures in the color photo.
  • The Result: They produce a depth map that is rich in detail and looks realistic, but sometimes they might get a little carried away and add "noise" or make things look a bit wobbly.
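The CycleGAN idea in miniature: two translators, G (RGB to depth) and F (depth back to RGB), are trained so that F(G(rgb)) lands back on the original photo. This cycle-consistency loss stops G from inventing depth unrelated to the image. The scalar "generators" below are hypothetical stand-ins for the real deep networks:

```python
import numpy as np

def G(rgb):        # hypothetical RGB -> depth translator (stand-in for a CNN)
    return rgb * 0.5

def F(depth):      # hypothetical depth -> RGB translator (stand-in for a CNN)
    return depth * 2.0

rgb = np.array([0.2, 0.8, 0.5])
# L1 cycle-consistency loss: how far F(G(rgb)) drifts from the input.
cycle_loss = np.abs(F(G(rgb)) - rgb).mean()
print(cycle_loss)  # prints 0.0 — this toy pair forms a perfect cycle
```

In training, this loss is minimized alongside the usual adversarial losses, so the depth "translation" stays faithful to the textures in the color photo.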

The "Taste Tester" (The Fusion Head)

Now, you have two dishes: one is structurally perfect but bland, and the other is flavorful but messy. You need a Taste Tester to combine them.

  • The system uses a special module called W-AdaIN (Weighted Adaptive Instance Normalization) to mix the two chefs' outputs.
  • It acts like a smart editor: "In this area, the Architect is right (it's a flat wall), so I'll use their version. In this area, the Artist is right (it's a complex chair), so I'll use their version."
  • The result is a final depth map that is both structurally sound and full of realistic details.
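A simplified take on the fusion step (the exact W-AdaIN formulation is in the paper; this only shows the flavor): normalize one branch's features, re-scale them with the other branch's statistics, then blend per-pixel with a weight map that says which chef to trust where:

```python
import numpy as np

def adain(content, style, eps=1e-5):
    """Shift/scale `content` to match the mean and std of `style` (AdaIN)."""
    normalized = (content - content.mean()) / (content.std() + eps)
    return normalized * style.std() + style.mean()

structure = np.array([[2.0, 2.0], [1.0, 1.0]])   # Architect: smooth, correct layout
texture   = np.array([[2.2, 1.8], [1.1, 0.7]])   # Artist: detailed but noisy
w         = np.array([[0.9, 0.9], [0.3, 0.3]])   # trust map (hand-picked here; learned in the paper)

# Weighted blend: lean on structure where w is high, on re-styled texture where it is low.
fused = w * structure + (1 - w) * adain(texture, structure)
```

Where `w` is high (the flat wall), the output hugs the Architect's smooth prediction; where it is low (the complex chair), the Artist's detail dominates.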

The Secret Ingredient: "Fake" Training Data

One of the biggest hurdles in training these AI chefs is data: you rarely have paired examples of the same room with realistic holes and with perfect ground-truth depth, and real sensor failures are messy in unpredictable ways.

The authors invented a way to create "Pseudo Depth Maps" (fake holey maps) for training:

  1. The "Highlight" Trick: They look for shiny spots in the color photo and pretend the depth sensor failed there (because shiny things confuse sensors).
  2. The "Dark" Trick: They look for black areas and pretend the sensor failed there (because dark things absorb light).
  3. The "Glass" Trick: They use AI to find windows and mirrors in the photo and erase the depth data there.

By training the chefs on these simulated disasters, the AI learns exactly how to fix the real-world problems it will face later.
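The "Highlight" and "Dark" tricks can be sketched in a few lines. The thresholds and random data below are our assumptions for illustration, not the paper's values: erase ground-truth depth wherever the image is very bright (specular highlights) or very dark (light-absorbing surfaces):

```python
import numpy as np

rng = np.random.default_rng(0)
gray  = rng.uniform(0.0, 1.0, size=(8, 8))     # stand-in grayscale image
depth = rng.uniform(0.5, 4.0, size=(8, 8))     # complete ground-truth depth (meters)

highlight_mask = gray > 0.95    # "Highlight" trick: shiny spots confuse sensors
dark_mask      = gray < 0.05    # "Dark" trick: black surfaces absorb the signal

# Pseudo depth map: pretend the sensor failed at those pixels.
pseudo_depth = depth.copy()
pseudo_depth[highlight_mask | dark_mask] = 0.0
```

The "Glass" trick works the same way, except the mask comes from a separate model that detects windows and mirrors in the photo. The network then trains to recover `depth` from `pseudo_depth` plus the RGB image.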

Why This Matters

Previous methods tried to fix these holes by just "guessing" based on nearby pixels, which often resulted in blurry, smeared images.

RDFC-GAN is special because:

  1. It respects the rules of architecture (straight walls, flat floors).
  2. It respects the art of texture (wood grain, fabric, glass).
  3. It trains on realistic "disasters" rather than random holes.

The Bottom Line

Imagine trying to finish a jigsaw puzzle where half the pieces are missing.

  • Old methods tried to fill the gaps with a blurry marker.
  • RDFC-GAN brings in an Architect to draw the straight lines, an Artist to paint the details, and a Smart Editor to glue them together perfectly.

The result? A robot can finally "see" the room clearly, avoiding glass doors and navigating around furniture without crashing. This makes indoor navigation, robot vacuuming, and augmented reality much safer and more accurate.
