Learning Latent Transmission and Glare Maps for Lens Veiling Glare Removal

This paper proposes VeilGen, an unsupervised generative model that learns latent transmission and glare maps to synthesize realistic veiling glare datasets, and DeVeiler, a restoration network that leverages these maps to effectively remove veiling glare from simplified optical systems.

Xiaolong Qian, Qi Jiang, Lei Sun, Zongxi Yu, Kailun Yang, Peixuan Wu, Jiacheng Zhou, Yao Gao, Yaoguang Ma, Ming-Hsuan Yang, Kaiwei Wang

Published Mon, 09 Ma

Imagine you have a brand-new, ultra-thin camera lens designed for your smartphone or a VR headset. It's small, cheap, and perfect for making devices portable. But there's a catch: because it's so simple, it doesn't take perfect pictures.

This paper tackles two specific problems that ruin photos from these "simplified" lenses:

  1. The Blur (Aberration): The lens isn't perfectly shaped, so the image looks a bit fuzzy or distorted, like looking through a warped piece of glass.
  2. The Haze (Veiling Glare): This is the tricky part. Imagine you're taking a photo on a sunny day, but the sun isn't even in the frame. Yet, your photo looks washed out, gray, and low-contrast, as if someone put a dirty, foggy sheet over the lens. This is Veiling Glare. It's caused by tiny imperfections inside the lens that scatter light everywhere, creating a "veil" over the image.

The Problem: A Double Whammy

Most existing software is good at fixing the blur (like sharpening a blurry photo) or good at fixing haze (like removing fog from a landscape). But when you have both happening at once, they confuse each other.

  • If you try to fix the blur, the software might make the haze worse.
  • If you try to remove the haze, you might accidentally blur the details.
  • The Big Hurdle: To teach a computer to fix this, you usually need thousands of "Before" and "After" photos. But in the real world, you can't easily take a perfect photo and then simultaneously take a "hazy" version of the exact same scene with the exact same lighting. It's like trying to teach someone how to fix a car crash by only showing them the wreckage, never the car before it crashed.

The Solution: A Two-Part Magic Trick

The authors, led by Xiaolong Qian and Kaiwei Wang, built a two-part system: VeilGen, which manufactures the missing training data, and DeVeiler (de-veiling), which performs the actual restoration. Together, they solve the "no data" problem with a clever two-step process.

Step 1: The "Fake It Till You Make It" Generator (VeilGen)

Since they couldn't find enough real-world examples of "perfect photo + hazy photo," they built a Generative AI (called VeilGen) to create them.

  • The Analogy: Imagine you want to teach a chef how to make a perfect cake, but you don't have any flour. Instead, you build a machine that simulates how flour turns into a cake.
  • How it works: VeilGen looks at a hazy photo and tries to guess the "secret recipe" of the haze. It estimates two invisible maps:
    1. The Transmission Map: Where the light is getting blocked (the "fog").
    2. The Glare Map: Where the stray light is bouncing around.
  • Once it understands these maps, it takes a clean photo and artificially adds the haze using those maps. Now, it has a perfect "Before and After" pair! It does this thousands of times, creating a massive training dataset that never existed before.
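The synthesis step above can be sketched with a simple scattering-style formation model. Note this is an illustrative assumption: the sketch uses `veiled = clean * T + G`, where the transmission map `T` attenuates scene light and the glare map `G` adds scattered stray light; the paper's actual learned model may be more elaborate.

```python
import numpy as np

def add_veiling_glare(clean, transmission, glare):
    """Synthesize a veiled image from a clean one.

    Assumes a scattering-style model: veiled = clean * T + G.
    T (transmission) blocks scene light; G (glare) adds stray light.
    This is a hypothetical simplification, not the paper's exact model.
    """
    veiled = clean * transmission + glare
    return np.clip(veiled, 0.0, 1.0)

# Toy example: a mid-gray image, 70% transmission, uniform glare.
clean = np.full((4, 4, 3), 0.5)
T = np.full((4, 4, 1), 0.7)   # transmission map, broadcast over channels
G = np.full((4, 4, 3), 0.2)   # glare map
veiled = add_veiling_glare(clean, T, G)
print(veiled[0, 0, 0])  # ≈ 0.55, i.e. 0.5 * 0.7 + 0.2
```

With spatially varying maps, the same two lines produce the uneven, washed-out veil described above; running it over thousands of clean photos yields the paired dataset.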

Step 2: The "Undo" Button (DeVeiler)

Now that they have a massive library of fake "Before and After" pairs, they train the main repair network, DeVeiler.

  • The Analogy: Think of DeVeiler as a master detective who has studied thousands of crime scenes (the hazy photos) and knows exactly how the criminal (the glare) operates.
  • The Secret Sauce: Instead of just guessing what the clean photo looks like, DeVeiler uses the same "secret maps" (Transmission and Glare) that VeilGen used to create the mess. It essentially says, "Okay, I know exactly how this haze was added, so I will run the process in reverse to remove it."
  • The Reversibility Check: To make sure it's not just hallucinating, the system checks its own work. It takes the "clean" photo it just restored, adds the haze back in using its own maps, and checks whether the result matches the original hazy photo. If it matches, the restoration is physically consistent with what the camera actually saw.
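The "run the process in reverse" idea and the reversibility check can be sketched under the same assumed formation model, `veiled = clean * T + G`. Again, this is a toy inversion, not the paper's trained network; `remove_veiling_glare` and `reversibility_error` are hypothetical helper names.

```python
import numpy as np

def remove_veiling_glare(veiled, transmission, glare, eps=1e-6):
    # Invert the assumed model veiled = clean * T + G:
    # subtract the stray light, then undo the attenuation.
    return (veiled - glare) / np.maximum(transmission, eps)

def reversibility_error(veiled, restored, transmission, glare):
    # Re-apply the haze to the restored image and compare it
    # to the original veiled input (lower is more consistent).
    resynthesized = restored * transmission + glare
    return float(np.abs(resynthesized - veiled).mean())

veiled = np.full((4, 4, 3), 0.55)
T = np.full((4, 4, 1), 0.7)
G = np.full((4, 4, 3), 0.2)
restored = remove_veiling_glare(veiled, T, G)
print(reversibility_error(veiled, restored, T, G))  # close to 0.0
```

In the actual system this consistency term is a training loss on the network's output rather than an exact algebraic inverse, but the logic is the same: a restoration is trusted only if it can reproduce the observed hazy photo.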

Why This Matters

This isn't just about making phone photos look better. It's about enabling tiny, cheap cameras to work in the real world.

  • AR/VR Headsets: These devices need lenses so thin they are almost flat. This method allows them to take clear, high-contrast images without expensive, bulky glass.
  • Medical Endoscopes: Tiny cameras inside the body often suffer from glare. This tech could help doctors see more clearly.
  • Drones and Robots: Small, lightweight cameras can now see better in complex lighting.

In a Nutshell

The authors realized that simplified lenses create a unique "double trouble" of blur and haze that old software can't fix. They solved the lack of training data by building an AI that learns the physics of the glare to fake realistic training data. Then, they built a second AI that uses that physics knowledge to reverse-engineer the glare out of real photos.

It's like teaching a robot to clean a window by first teaching it exactly how the dirt gets there, so it knows exactly how to wipe it off.