DiffTrans: Differentiable Geometry-Materials Decomposition for Reconstructing Transparent Objects

This paper presents DiffTrans, a differentiable rendering framework that utilizes FlexiCubes for initial geometry and a recursive CUDA-based ray tracer to jointly optimize geometry, refractive index, and absorption, enabling high-quality reconstruction of transparent objects with diverse topologies and complex textures in intricate scenes.

Changpu Li, Shuang Wu, Songlin Tang, Guangming Lu, Jun Yu, Wenjie Pei

Published 2026-03-03

Imagine you are trying to build a perfect 3D model of a wine glass sitting on a table, but all you have is a pile of 2D photos taken from different angles.

This is a nightmare for computers. Why? Because glass is tricky. It doesn't just sit there; it bends light (refraction), it reflects the room around it, and it might have colored swirls inside it that soak up some of the light (absorption). A computer that tries to guess the shape of the glass just by looking at the photos gets confused. Is that dark spot a shadow? Is it a crack in the glass? Is it a reflection of a tree outside?

The paper "DiffTrans" introduces a new AI system designed to solve this puzzle. Here is how it works, explained with some everyday analogies.

The Problem: The "Ghost in the Machine"

Most previous methods for 3D reconstruction are like trying to sculpt a statue while wearing thick, foggy goggles. They are great at handling solid objects (like a wooden chair) or simple glass (like a clear window), but they fail miserably with complex transparent objects (like a jeweled goblet or a resin figurine with internal colors). They can't figure out the shape and the material at the same time.

The Solution: DiffTrans (The "Smart Sculptor")

The authors created a system called DiffTrans. Think of it as a master sculptor who doesn't just look at the object, but also understands how light behaves, bouncing and bending like a ball ricocheting around a room.

The process happens in three main stages:

1. The Rough Sketch (Geometry Initialization)

First, the AI looks at the "silhouette" of the object in the photos (the black-and-white outline).

  • The Analogy: Imagine you have a block of clay and you want to carve a horse. You start by roughly chopping away the clay to get the general shape of the horse's body.
  • What DiffTrans does: It uses a technique called FlexiCubes to quickly carve out a rough 3D shape based on the outlines. To make sure the clay doesn't have weird holes or cracks, it uses a "dilation" trick (like inflating a balloon slightly) to fill in the gaps, ensuring a smooth, solid starting point.
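
The "dilation" idea can be sketched in a few lines. This is only an illustration of plain binary dilation on a 2D silhouette mask, written in pure Python; where exactly DiffTrans applies dilation, and with what settings, is not stated here, so the function name `dilate` and the 3x3 neighborhood are assumptions.

```python
def dilate(mask, iterations=1):
    """Binary dilation with a 3x3 square neighborhood (hypothetical helper).

    mask: list of rows, each a list of 0/1 values. Returns a new mask.
    """
    height, width = len(mask), len(mask[0])
    for _ in range(iterations):
        grown = [row[:] for row in mask]  # copy so the input is not modified
        for y in range(height):
            for x in range(width):
                if mask[y][x]:
                    continue
                # A background pixel turns on if any of its 8 neighbors is on.
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < height and 0 <= nx < width and mask[ny][nx]:
                            grown[y][x] = 1
        mask = grown
    return mask


# A silhouette with a one-pixel hole in the middle.
silhouette = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
filled = dilate(silhouette)
```

After one pass the hole at the center is filled, at the cost of the outline growing by one pixel in every direction. That trade-off is the "inflated balloon": slightly too big, but with no gaps.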

2. Mapping the Room (Environment Recovery)

Before the AI can understand the glass, it needs to know what surrounds it, because the surroundings are what the glass reflects and lets through.

  • The Analogy: If you are holding a shiny spoon, the image you see in the spoon depends entirely on the room behind you. If the room changes, the spoon's reflection changes.
  • What DiffTrans does: It looks at the parts of the photo outside the glass object to build a 3D map of the room (the lighting and background). This is crucial because the glass acts like a mirror and a window combined; the AI needs to know what's being reflected to figure out the shape of the glass.
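
One simple way to picture "only look outside the glass" is an image error that skips every pixel covered by the object. The sketch below assumes grayscale images stored as nested lists and a 0/1 object mask; the name `background_loss` and the mean-squared-error choice are illustrative assumptions, not the loss used in the paper.

```python
def background_loss(rendered, photo, object_mask):
    """Mean squared error over pixels OUTSIDE the object mask (hypothetical helper)."""
    total, count = 0.0, 0
    for r_row, p_row, m_row in zip(rendered, photo, object_mask):
        for r, p, m in zip(r_row, p_row, m_row):
            if m:  # pixel belongs to the glass object, so skip it
                continue
            total += (r - p) ** 2
            count += 1
    return total / count if count else 0.0


photo = [[0.2, 0.9], [0.4, 0.6]]
object_mask = [[0, 1], [0, 0]]      # the top-right pixel shows the glass
guess = [[0.3, 0.0], [0.4, 0.6]]    # wrong at one background pixel and at the glass pixel

loss = background_loss(guess, photo, object_mask)
```

The large error at the glass pixel is ignored entirely; only the small background mismatch contributes to `loss`. An environment model trained against such a loss is never asked to explain the confusing pixels inside the object.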

3. The "Magic Loop" (Recursive Ray Tracing)

This is the secret sauce. Once the AI has a rough shape and a map of the room, it starts a "guess-and-check" loop that is incredibly smart.

  • The Analogy: Imagine you are blindfolded and holding a laser pointer. You shoot a beam of light at the glass.
    1. The beam hits the glass and bounces (reflection).
    2. The beam goes through the glass, bending as it enters (refraction).
    3. Inside the glass, the beam gets slightly dimmer if the glass is colored (absorption).
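    4. The beam exits the far side, bending again, and lands somewhere in the room. Each time it meets a glass surface along the way, it can split into a reflected beam and a refracted beam once more.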
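  • What DiffTrans does: It simulates this entire journey of light inside the computer using a Differentiable Ray Tracer. "Differentiable" means the simulation can also report how the rendered image would change if the shape or material changed slightly, so every tweak is a calculated step rather than a blind guess.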
    • It shoots virtual light rays through the 3D model.
    • It compares the result to the actual photo.
    • If the rendered image differs from the photo, the AI tweaks the shape of the glass, the "bendiness" of the glass (Index of Refraction), and the "darkness" of the glass (Absorption).
    • It repeats this guess-and-check step many times, until the virtual light rays match the real photos as closely as possible.
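
The loop above can be sketched at toy scale. The code below is not the paper's recursive CUDA ray tracer: it is a pure-Python illustration that bends a ray with Snell's law, dims it with the Beer-Lambert law (a standard absorption model, assumed here rather than taken from the paper), and then recovers a single absorption value by gradient descent with a hand-derived gradient. All function names, the learning rate, and the step count are assumptions made for the example.

```python
import math


def refract(direction, normal, eta):
    """Bend a unit direction at a surface using Snell's law.

    eta = n_outside / n_inside; normal is a unit vector facing the incoming ray.
    Returns None when total internal reflection occurs.
    """
    cos_i = -sum(d * n for d, n in zip(direction, normal))
    sin2_t = eta * eta * (1.0 - cos_i * cos_i)
    if sin2_t > 1.0:
        return None
    cos_t = math.sqrt(1.0 - sin2_t)
    return tuple(eta * d + (eta * cos_i - cos_t) * n
                 for d, n in zip(direction, normal))


def transmittance(sigma, distance):
    """Beer-Lambert law: fraction of light that survives `distance` of travel."""
    return math.exp(-sigma * distance)


def fit_absorption(observed, distances, lr=0.5, steps=2000):
    """Guess-and-check: adjust sigma until predicted brightness matches observations."""
    sigma = 0.0
    for _ in range(steps):
        grad = 0.0
        for obs, dist in zip(observed, distances):
            pred = transmittance(sigma, dist)
            # d/dsigma of (pred - obs)^2, using d(pred)/d(sigma) = -dist * pred
            grad += 2.0 * (pred - obs) * (-dist * pred)
        sigma -= lr * grad / len(observed)
    return sigma


# A ray entering glass (index 1.5) from air at 45 degrees.
s = math.sin(math.radians(45))
bent = refract((s, 0.0, -s), (0.0, 0.0, 1.0), 1.0 / 1.5)

# "Photos": brightness seen through three thicknesses of glass with sigma = 0.8.
distances = [0.5, 1.0, 2.0]
observed = [transmittance(0.8, d) for d in distances]
recovered = fit_absorption(observed, distances)
```

Here `bent` leans toward the surface normal, as Snell's law predicts for light entering a denser medium, and `recovered` converges to the value of 0.8 that generated the observations. DiffTrans applies the same principle jointly to geometry, refractive index, and absorption, with gradients flowing through the full ray paths rather than a single scalar.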

Why is this a big deal?

  • It's Fast: Simulating light bouncing around inside glass is computationally expensive. The authors implemented their ray tracer in CUDA (NVIDIA's platform for programming graphics cards), so large numbers of light rays can be traced in parallel on the GPU. It is like switching from a slow horse to a sports car.
  • It Handles Complexity: Previous methods would get confused by a glass object with a complex internal pattern (like a stained-glass window or a gemstone). DiffTrans can separate the "shape" from the "internal color," allowing it to reconstruct intricate objects accurately.
  • It's Editable: Because the AI understands the materials, you can take the finished 3D model and change the lighting. You can make the glass look like it's in a sunny park or a dark cave, and it will look realistic because the AI actually knows how the light interacts with that specific glass.
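
Why does knowing the material make relighting possible? In the toy sketch below, the color seen through the glass is the environment's color multiplied by a per-channel Beer-Lambert attenuation, so the lighting can be swapped while the material stays fixed. The environments `sunny_park` and `dark_cave`, the absorption values, and the function `shade` are all made up for illustration and ignore reflection and refraction.

```python
import math


def shade(env, direction, sigma_rgb, thickness):
    """Color seen through the glass: environment color times per-channel attenuation."""
    incoming = env(direction)
    return tuple(c * math.exp(-s * thickness)
                 for c, s in zip(incoming, sigma_rgb))


def sunny_park(direction):
    # Warm sky above the horizon, green grass below (hypothetical environment).
    return (1.0, 0.95, 0.8) if direction[1] > 0 else (0.3, 0.6, 0.2)


def dark_cave(direction):
    # Dim, slightly blue light from every direction (hypothetical environment).
    return (0.05, 0.05, 0.08)


# Reddish glass: absorbs green and blue much more strongly than red.
sigma_rgb = (0.1, 1.5, 2.0)
up = (0.0, 1.0, 0.0)

in_park = shade(sunny_park, up, sigma_rgb, 0.5)
in_cave = shade(dark_cave, up, sigma_rgb, 0.5)
```

The two results differ in brightness, but the glass removes the same fraction of each color channel in both scenes. A method that bakes lighting and material into a single appearance cannot make that swap.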

The Bottom Line

DiffTrans is like giving a computer a pair of X-ray glasses and a physics textbook. It allows the computer to look at a set of photos of a transparent object and say, "Ah, I see. That's a curved surface, it bends light by this much, and it has a red swirl inside." It then builds a faithful 3D replica that you can rotate, relight, and even use in video games or movies.