DiffTrans: Differentiable Geometry-Materials Decomposition for Reconstructing Transparent Objects

Imagine you are trying to build a perfect 3D model of a wine glass sitting on a table, but you only have a pile of 2D photos taken from different angles.

This is a nightmare for computers. Why? Because glass is tricky. It doesn't just sit there; it bends light (refraction), it reflects the room around it, and it might have colored swirls inside it (absorption). If you try to guess the shape of the glass just by looking at the photos, the computer gets confused. Is that dark spot a shadow? Is it a crack in the glass? Is it a reflection of a tree outside?

The paper "DiffTrans" introduces a new AI system designed to solve this puzzle. Here is how it works, explained with some everyday analogies.

The Problem: The "Ghost in the Machine"

Most previous methods for 3D reconstruction are like trying to sculpt a statue while wearing thick foggy goggles. They are great at handling solid objects (like a wooden chair) or simple glass (like a clear window), but they fail miserably with complex transparent objects (like a jeweled goblet or a resin figurine with internal colors). They can't figure out the shape and the material at the same time.

The Solution: DiffTrans (The "Smart Sculptor")

The authors created a system called DiffTrans. Think of it as a master sculptor who doesn't just look at the object, but also understands how light bounces off glass and bends as it passes through.

The process happens in three main stages:

1. The Rough Sketch (Geometry Initialization)

First, the AI looks at the "silhouette" of the object in the photos (the black-and-white outline).

The Analogy: Imagine you have a block of clay and you want to carve a horse. You start by roughly chopping away the clay to get the general shape of the horse's body.
What DiffTrans does: It uses a technique called FlexiCubes to quickly carve out a rough 3D shape based on the outlines. To make sure the clay doesn't have weird holes or cracks, it uses a "dilation" trick (like inflating a balloon slightly) to fill in the gaps, ensuring a smooth, solid starting point.

2. Mapping the Room (Environment Recovery)

Before the AI can understand the glass, it needs to know what the glass is reflecting.

The Analogy: If you are holding a shiny spoon, the image you see in the spoon depends entirely on the room behind you. If the room changes, the spoon's reflection changes.
What DiffTrans does: It looks at the parts of the photo outside the glass object to build a 3D map of the room (the lighting and background). This is crucial because the glass acts like a mirror and a window combined; the AI needs to know what's being reflected to figure out the shape of the glass.

3. The "Magic Loop" (Recursive Ray Tracing)

This is the secret sauce. Once the AI has a rough shape and a map of the room, it starts a "guess-and-check" loop that is incredibly smart.

The Analogy: Imagine you are blindfolded and holding a laser pointer. You shoot a beam of light at the glass.
1. The beam hits the glass and bounces (reflection).
2. The beam goes through the glass, bending as it enters (refraction).
3. Inside the glass, the beam gets slightly dimmer if the glass is colored (absorption).
4. At the far side of the glass, part of the beam escapes into the room and part bounces back inside, where the same steps repeat.
What DiffTrans does: It simulates this entire journey of light inside the computer using a Differentiable Ray Tracer.
- It shoots virtual light rays through the 3D model.
- It compares the result to the actual photo.
- If the photo looks different, the AI tweaks the shape of the glass, the "bendiness" of the glass (Index of Refraction), and the "darkness" of the glass (Absorption).
- It does this over and over, for many iterations, until the virtual light rays match the real photos as closely as possible.
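For readers who like to see it in code, the guess-and-check loop can be sketched on a toy problem. This is not the DiffTrans implementation: the "renderer" below is a single formula (light dimming through a slab of fixed thickness), and the names `render`, `fit_absorption`, the learning rate, and the step count are all assumptions made for illustration. It only shows the pattern of render, compare, and nudge the parameter along the gradient.

```python
import math

def render(mu, thickness=2.0, background=1.0):
    """Toy 'renderer': background brightness after passing through an absorbing slab."""
    return background * math.exp(-mu * thickness)

def fit_absorption(observed, thickness=2.0, lr=0.5, steps=500):
    """Recover the absorption value mu by gradient descent on a squared error."""
    mu = 0.1  # initial guess (hypothetical starting value)
    for _ in range(steps):
        pred = render(mu, thickness)
        residual = pred - observed
        # Chain rule: d(residual^2)/d(mu) = 2 * residual * d(pred)/d(mu)
        grad = 2.0 * residual * (-thickness * pred)
        mu -= lr * grad  # nudge the guess to reduce the mismatch
    return mu

true_mu = 0.7
observed = render(true_mu)          # stands in for a pixel from a real photo
estimate = fit_absorption(observed)
print(round(estimate, 3))
```

In the real system the single number `mu` is replaced by mesh vertices, the IoR, and an absorption texture, and the one-line formula is replaced by a full ray tracer, but the loop has the same shape.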

Why is this a big deal?

It's Fast: Simulating light bouncing around inside glass is computationally expensive. The authors built their ray tracer with CUDA and OptiX (NVIDIA's toolkits for general-purpose GPU programming and GPU ray tracing), so the heavy lifting runs on the graphics card, like switching from a slow horse to a sports car.
It Handles Complexity: Previous methods would get confused by a glass object with a complex internal pattern (like a stained-glass window or a gemstone). DiffTrans can separate the "shape" from the "internal color," allowing it to reconstruct intricate objects accurately.
It's Editable: Because the AI understands the materials, you can take the finished 3D model and change the lighting. You can make the glass look like it's in a sunny park or a dark cave, and it will look realistic because the AI actually knows how the light interacts with that specific glass.

The Bottom Line

DiffTrans is like giving a computer a pair of X-ray glasses and a physics textbook. It allows the computer to look at a set of photos of a transparent object and say, "Ah, I see. That's a curved surface, it bends light by this much, and it has a red swirl inside." It then builds a faithful 3D replica that you can rotate, light up, and even use in video games or movies.

1. Problem Statement

Reconstructing the geometry and materials of transparent objects from multi-view images is a highly ill-posed problem. Unlike opaque objects, the appearance of transparent objects is heavily influenced by light refraction, reflection, and internal absorption, making the relationship between scene parameters and observed pixels complex and non-linear.

Key Challenges:

Complex Light Propagation: Light rays bend (refract) and reflect multiple times within the object, and the background is distorted.
Limitations of Existing Methods:
- Eikonal-based methods: Struggle to extract reliable meshes due to a lack of surface geometry constraints.
- Surface-based methods (NeRF, SDF, Gaussian Splatting): Often fail to reconstruct objects with complex internal textures or absorption.
- Current State-of-the-Art: Most methods either ignore internal absorption (modeling only surface reflection) or assume ideal transparency, failing to handle real-world objects like jewels, glass decorations, or resin with complex internal structures.

2. Methodology: DiffTrans

The authors propose DiffTrans, a differentiable rendering framework that decomposes and reconstructs both the geometry and materials (Index of Refraction - IoR, and Absorption Rate) of transparent objects in an end-to-end manner. The framework operates in three progressive stages:

A. Geometry and Environment Initialization

Initial Geometry: Instead of relying on complex implicit fields, the method uses FlexiCubes (a differentiable iso-surface representation) to reconstruct an initial coarse mesh.
- Supervision: It is supervised solely by multi-view object silhouettes (masks).
- Regularization: To prevent artifacts and cracks common in mask-only supervision, the authors introduce dilation regularization (to fill gaps) and smoothness regularization (penalizing gradients in depth and normals).
Environment Recovery: Simultaneously, the environment light radiance field is recovered using pixels outside the object mask. This is modeled using a radiance field with a coarse dense grid and fine tri-planes (inspired by MERF), initialized via a NeRF-like approach.
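The summary above does not spell out how the dilation regularization is formulated, so the following is only a sketch of the generic morphological operation the name alludes to, together with a plain mean-squared silhouette loss. The helper names `dilate` and `silhouette_loss`, the 3x3 structuring element, and the toy 7x7 mask are assumptions for illustration, not the authors' formulation.

```python
import numpy as np

def dilate(mask, iterations=1):
    """Binary dilation with a 3x3 square structuring element (pure NumPy)."""
    out = mask.astype(bool)
    h, w = out.shape
    for _ in range(iterations):
        padded = np.pad(out, 1, mode="constant", constant_values=False)
        grown = np.zeros_like(out)
        # A pixel becomes foreground if any pixel in its 3x3 neighbourhood is foreground.
        for dy in range(3):
            for dx in range(3):
                grown |= padded[dy:dy + h, dx:dx + w]
        out = grown
    return out

def silhouette_loss(rendered, target):
    """Mean squared error between a soft rendered silhouette and a binary target mask."""
    return float(np.mean((rendered - target.astype(np.float64)) ** 2))

# Toy mask: a 3x3 square with a one-pixel hole in the middle.
mask = np.zeros((7, 7), dtype=bool)
mask[2:5, 2:5] = True
mask[3, 3] = False

filled = dilate(mask)  # the hole closes and the outline grows by one pixel
loss = silhouette_loss(mask.astype(np.float64), filled)
```

Dilation closes small gaps at the cost of slightly enlarging the outline, which is why it suits a coarse initialization that later stages refine.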

B. Light Interaction Modeling

The framework models light interaction based on three key assumptions to simplify the physics while maintaining realism:

Consistent IoR: The refractive index is uniform within the object, allowing linear ray propagation inside (eliminating the need for eikonal equations).
Material Composition: Materials consist only of Absorption Rate and Index of Refraction.
Specular Surfaces: Surfaces are assumed to be perfectly smooth (specular), ignoring roughness to avoid excessive complexity.
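With a uniform IoR and perfectly specular surfaces, a ray changes direction only at the surface, where Snell's law gives the refracted direction. The function below is a minimal sketch of that standard vector formula; the name `refract` and its argument conventions (unit vectors, normal facing the incoming ray) are choices made here, not taken from the paper.

```python
import numpy as np

def refract(d, n, eta):
    """Refract unit direction d at a surface with unit normal n.

    n must point against the incoming ray (dot(n, d) < 0).
    eta = IoR on the incident side / IoR on the transmitted side.
    Returns the unit refracted direction, or None on total internal reflection.
    """
    cos_i = -float(np.dot(n, d))
    sin2_t = eta * eta * (1.0 - cos_i * cos_i)  # Snell: sin_t = eta * sin_i
    if sin2_t > 1.0:
        return None  # no transmitted ray exists
    cos_t = np.sqrt(1.0 - sin2_t)
    return eta * d + (eta * cos_i - cos_t) * n

normal = np.array([0.0, 0.0, 1.0])
s = np.sqrt(0.5)
incoming = np.array([s, 0.0, -s])            # 45 degrees from the normal
into_glass = refract(incoming, normal, 1.0 / 1.5)

steep = np.array([np.sin(np.pi / 3), 0.0, -np.cos(np.pi / 3)])  # 60 degrees
out_of_glass = refract(steep, normal, 1.5)   # glass to air: total internal reflection
```

Between two surface hits the ray travels in a straight segment, so only the entry and exit points need this computation.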

The light transport is modeled via:

Fresnel Equations: Calculating reflection ($R$) and refraction ($T$) rates at surface intersections.
Beer-Lambert Law: Modeling radiance decay ($L$) as light travels through the absorptive medium: $L(x) = L(x_0) \exp\left(-\int_{x_0}^{x} \mu_t(s)\, ds\right)$, where $\mu_t$ is the absorption rate.
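Both ingredients are short enough to write out. The sketch below uses the standard Fresnel equations for unpolarized light at a dielectric interface and the Beer-Lambert law for a constant absorption rate along the path (the integral form reduces to this when $\mu_t$ does not vary). Function names and the scalar, single-channel treatment are assumptions for illustration.

```python
import math

def fresnel_reflectance(cos_i, n1, n2):
    """Unpolarized Fresnel reflectance R going from IoR n1 into IoR n2.

    cos_i is the cosine of the angle of incidence (0 < cos_i <= 1).
    The transmitted fraction is T = 1 - R for a non-absorbing interface.
    """
    sin2_t = (n1 / n2) ** 2 * (1.0 - cos_i * cos_i)
    if sin2_t >= 1.0:
        return 1.0  # total internal reflection
    cos_t = math.sqrt(1.0 - sin2_t)
    r_s = (n1 * cos_i - n2 * cos_t) / (n1 * cos_i + n2 * cos_t)
    r_p = (n1 * cos_t - n2 * cos_i) / (n1 * cos_t + n2 * cos_i)
    return 0.5 * (r_s * r_s + r_p * r_p)

def beer_lambert(radiance, mu_t, distance):
    """Radiance after travelling `distance` through a medium with constant mu_t."""
    return radiance * math.exp(-mu_t * distance)

R = fresnel_reflectance(1.0, 1.0, 1.5)   # air to glass, head-on
T = 1.0 - R
dimmed = beer_lambert(1.0, 0.5, 2.0)     # mu_t = 0.5 over a path of length 2
```

For air to glass at normal incidence this gives the familiar reflectance of about 4 percent.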

C. Differentiable Recursive Mesh Ray Tracer

The core innovation is a recursive differentiable mesh ray tracer implemented in OptiX and CUDA.

Recursive Tracing: Rays are traced recursively until they reach a maximum depth ($D_{max}$) or exit the scene.
Hybrid Sampling:
- Exterior: Rays sample the environment radiance field.
- Interior: Rays sample the absorption rate field (represented as a 3D texture).
Differentiability: The intersection between rays and the mesh is differentiable. Gradients are backpropagated through the ray tracing process to jointly optimize:
1. Mesh Vertices (Geometry).
2. Index of Refraction (IoR).
3. Absorption Rate ($\mu_t$).
Optimization Loss: The system minimizes a weighted sum of:
- Color Loss ($L_{color}$): L2 loss between rendered and ground-truth colors (with a weighting factor to handle over-absorbed pixels).
- Tone Regularization ($L_{tone}$): Constrains the color channel ratios to prevent incorrect gradients caused by varying background darkness.
- Material Smoothness ($L_{mat-smooth}$): Ensures spatial consistency in the absorption texture.
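The recursive tracing step can be illustrated with a deliberately small CPU sketch. It is not the OptiX/CUDA tracer and it is not differentiable. It assumes a flat slab viewed head-on, a scalar radiance, a constant absorption rate, and constant environment values in front of and behind the slab. All function and parameter names are hypothetical. At each surface hit the ray splits into a reflected and a refracted branch, interior segments are attenuated, and recursion stops at `max_depth`.

```python
import math

def fresnel_normal(n1, n2):
    """Fresnel reflectance at normal incidence between media with IoR n1 and n2."""
    return ((n1 - n2) / (n1 + n2)) ** 2

def trace_inside(toward_back, depth, ior, mu_t, thickness,
                 env_front, env_back, max_depth):
    """Radiance gathered by an interior ray heading to the back (or front) face."""
    if depth >= max_depth:
        return 0.0  # recursion budget exhausted: remaining energy is dropped
    attenuation = math.exp(-mu_t * thickness)  # absorption across the slab
    r = fresnel_normal(ior, 1.0)               # glass-to-air interface
    exit_radiance = env_back if toward_back else env_front  # exterior lookup
    bounced = trace_inside(not toward_back, depth + 1, ior, mu_t, thickness,
                           env_front, env_back, max_depth)
    return attenuation * ((1.0 - r) * exit_radiance + r * bounced)

def trace_camera_ray(ior, mu_t, thickness, env_front, env_back, max_depth=8):
    """Radiance seen by a camera ray hitting the front face head-on."""
    r = fresnel_normal(1.0, ior)
    refracted = trace_inside(True, 1, ior, mu_t, thickness,
                             env_front, env_back, max_depth)
    return r * env_front + (1.0 - r) * refracted

clear = trace_camera_ray(1.5, 0.0, 1.0, env_front=0.0, env_back=1.0, max_depth=50)
tinted = trace_camera_ray(1.5, 0.8, 1.0, env_front=0.0, env_back=1.0, max_depth=50)
```

In a differentiable version every quantity in this recursion (hit points, `ior`, `mu_t`) would carry gradients back to the loss. Here the parameters are plain floats, which keeps the control flow easy to read.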

3. Key Contributions

Novel Framework (DiffTrans): A unified, end-to-end differentiable framework capable of decomposing geometry and materials (IoR and absorption) for transparent objects with diverse topologies and complex internal textures.
Robust Initialization: A strategy using FlexiCubes with dilation and smoothness regularization to recover initial geometry from silhouettes alone, avoiding the need for complex priors.
Efficient Recursive Ray Tracer: The design of a differentiable mesh ray tracer implemented in CUDA/OptiX, which significantly reduces computational costs compared to CPU-based or pure neural rendering approaches, enabling high-quality joint optimization.
Material Modeling: Explicit modeling of internal absorption, a feature missing in most prior works (e.g., NeRRF, Nu-NeRF), allowing for the reconstruction of realistic objects like colored glass or resin.

4. Experimental Results

The method was evaluated on both synthetic datasets (NEMTO, Lyu et al.) and real-world captures (iPhone video).

Geometry Reconstruction:
- Metrics: Outperformed baselines (NeRO, Nu-NeRF, NeRRF) in Chamfer Distance (CD) and F1-score.
- Qualitative: Successfully reconstructed complex topologies (e.g., the "monkey" with internal voids, "horse" with complex surface details) where other methods produced surface roughness or filled voids incorrectly.
- IoR Prediction: The predicted IoR values were extremely close to ground truth (e.g., predicted 1.512 vs. GT 1.512 for the horse).
Relighting (Scene Editing):
- DiffTrans demonstrated superior relighting capabilities (changing environment maps) compared to baselines.
- Metrics: Achieved higher PSNR (23.17 vs. 19.64 for NeRO) and lower LPIPS in novel view synthesis under new lighting.
- Reasoning: By accurately recovering both geometry and materials (including absorption), the system can correctly simulate how light bends and attenuates under new illumination, whereas methods ignoring absorption or refraction fail to produce realistic relighting.
Ablation Studies:
- Confirmed the necessity of Tone Regularization for stable absorption optimization.
- Demonstrated robustness to mask noise and varying IoR settings.
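For reference, the geometry and image metrics named above can be computed as in the sketch below. Conventions vary between papers (squared versus unsquared distances, sum versus mean, the F1 distance threshold), and the summary does not state which ones the authors use, so the choices here and the threshold argument `tau` are assumptions. LPIPS is omitted because it requires a pretrained network.

```python
import numpy as np

def nearest_distances(a, b):
    """For each point in a (N, 3), Euclidean distance to its nearest point in b (M, 3)."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)

def chamfer_distance(a, b):
    """Symmetric Chamfer Distance: mean nearest-neighbour distance in both directions."""
    return float(nearest_distances(a, b).mean() + nearest_distances(b, a).mean())

def f1_score(pred, gt, tau):
    """Harmonic mean of precision and recall at distance threshold tau."""
    precision = float((nearest_distances(pred, gt) < tau).mean())
    recall = float((nearest_distances(gt, pred) < tau).mean())
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)

def psnr(img, ref, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images with values in [0, max_val]."""
    mse = float(np.mean((img - ref) ** 2))
    if mse == 0.0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)

# Toy data: two points and a copy shifted by 0.1 along z.
gt_points = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
pred_points = gt_points + np.array([0.0, 0.0, 0.1])

cd = chamfer_distance(pred_points, gt_points)
f1_loose = f1_score(pred_points, gt_points, tau=0.2)
f1_tight = f1_score(pred_points, gt_points, tau=0.05)
quality = psnr(np.zeros((4, 4)), np.full((4, 4), 0.1))
```

Lower Chamfer Distance and higher F1 and PSNR indicate a closer match. The brute-force nearest-neighbour search is quadratic in the number of points, so a spatial index would be needed for dense meshes.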

5. Significance

DiffTrans represents a significant advancement in inverse rendering for transparent objects. By moving beyond surface-only or ideal-transparency assumptions, it enables the reconstruction of real-world transparent objects that possess complex internal structures and absorption properties. The implementation of a CUDA-based differentiable ray tracer bridges the gap between physical accuracy and computational efficiency, making high-fidelity reconstruction of transparent scenes feasible. This capability opens new avenues for applications in digital twins, augmented reality (AR), and visual effects, where accurate representation of glass, liquids, and gems is critical.