Polarization Uncertainty-Guided Diffusion Model for Color Polarization Image Demosaicking

The Big Picture: What is the problem?

Imagine you are trying to take a photo of a shiny car or a wet street. Standard cameras record only how bright the light is and what color it has. Polarization cameras are like "super-eyes": they also record the direction in which the light waves vibrate. This helps them see through glare, identify materials, and even recover 3D shapes more reliably.

However, these cameras have a flaw. To capture this special "direction" info, they place a grid of tiny filters directly over the sensor, arranged like a mosaic puzzle. Instead of capturing a full, clear picture, each pixel records only one small piece of the puzzle (some pixels see 0°, some 45°, some 90°, and some 135°, each in only one color).

The Challenge: To get a full picture, a computer has to guess what the missing pieces look like. This is called Demosaicking.

The Old Way: Previous AI methods were good at guessing the brightness (making the image look bright and clear), but they were terrible at guessing the direction (the polarization). It's like an artist who can paint a perfect landscape but gets the shadows and angles completely wrong. The result looks pretty, but the physics are broken.
The Data Problem: These AI models were trained on small, simulated datasets with little variety. They never saw enough of the real world's diversity, so they hit a "performance ceiling."

The Solution: PUGDiff (The "Two-Brain" System)

The authors created a new system called PUGDiff. Think of it as a team of two experts working together, guided by a smart manager who knows when to trust whom.

1. The "Base Branch" (The Fact-Checker)

Role: This is a standard AI trained from scratch on the specific camera data.
Strength: It is incredibly accurate with the raw numbers. It knows exactly how bright a pixel should be.
Weakness: It gets confused when the scene is complex or the data is missing, leading to blurry or wrong polarization angles.
Analogy: Think of this as a strict accountant. They are great at adding up numbers and keeping the books balanced, but they might not have a good "gut feeling" for the big picture.

2. The "SD Branch" (The Creative Artist)

Role: This branch uses a massive, pre-trained AI called Stable Diffusion (a text-to-image generator in the same family as DALL-E or Midjourney).
Strength: It has "seen" millions of natural images. It has a huge library of "common sense" about how light, textures, and objects usually look. It can fill in missing gaps with high-quality, realistic details.
Weakness: Because it's trained on general photos, it might "hallucinate" or smooth out details too much if left unchecked. It's not a perfect accountant.
Analogy: Think of this as a famous painter. They can imagine a beautiful, realistic scene even if they only see a few clues, but they might take artistic liberties that aren't mathematically precise.

3. The "Uncertainty Manager" (The Smart Switch)

This is the secret sauce of the paper. The system doesn't just average the two outputs; it asks a critical question: "How sure are we about this specific part of the image?"

Low Uncertainty (The "Safe Zone"): If the accountant (Base Branch) is confident the numbers are right, the system says, "Great, let's use the accountant's version." This keeps the image sharp and mathematically accurate.
High Uncertainty (The "Danger Zone"): If the accountant is confused (e.g., a complex reflection or a tricky texture), the system says, "We don't trust the numbers here. Let's ask the painter (SD Branch) to use their imagination to fix the polarization angles."

The Magic: The system uses a mathematical "uncertainty map" to decide, pixel by pixel, which expert to listen to. It's like a conductor leading an orchestra, knowing exactly when to let the violin soloist shine and when to bring in the brass section to fix a weak note.

Why is this a big deal?

Breaking the Data Bottleneck: A network trained from scratch usually needs a large amount of task-specific data, and polarization data is scarce. This method "borrows" knowledge from a giant AI that has already learned a great deal about natural images (Stable Diffusion) and teaches it just a little bit about polarization. It's like hiring a world-class chef and teaching them one specific dish, rather than training a chef from scratch using only one recipe book.
Fixing the "Glare": Better polarization images make glare and reflection removal work better. In the paper's tests, the reconstructed images were used to remove reflections from windows and car windshields, revealing clear text and details that other methods missed.
Visual Quality: The final images aren't just numerically accurate; they also look real to the human eye. The polarization angles (which tell us about surface orientation and materials) are reconstructed with high fidelity.

Summary Analogy

Imagine you are trying to restore an old, torn map.

Old AI: You have a robot that can perfectly trace the straight lines of the roads (Intensity), but it guesses the mountains and rivers (Polarization) wrong, making the map useless for navigation.
PUGDiff: You have a Robot (Base Branch) that traces the roads perfectly. You also have a Cartographer (SD Branch) who has memorized every map in the world.
The Manager: A smart supervisor looks at the torn map. Where the roads are clear, the Robot draws them. Where the map is torn and the roads are missing, the Supervisor asks the Cartographer to use their vast knowledge to guess what the mountains and rivers should look like.

The result? A map that is both accurate and complete, letting you navigate the world (or remove reflections) far more reliably.

1. Problem Statement

Color Polarization Demosaicking (CPDM) is the task of reconstructing full-resolution polarization images (specifically the four polarization directions: $0^\circ, 45^\circ, 90^\circ, 135^\circ$) from raw mosaic images captured by a Division-of-Focal-Plane (DOFP) camera.
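
To make the forward problem concrete, the following minimal sketch simulates the mosaic that CPDM has to invert. The 2x2 polarizer layout and the Bayer arrangement over 2x2 superpixels are assumptions chosen for illustration (real sensors differ, and the text above does not specify a layout), and `simulate_mosaic` is a hypothetical helper, not something taken from the paper.

```python
import numpy as np

# Assumed 2x2 polarizer layout inside each superpixel (degrees).
# Real sensors may differ; check the datasheet of the camera in use.
POL_LAYOUT = np.array([[90, 45],
                       [135, 0]])
# Assumed Bayer layout over 2x2 blocks of superpixels: 0=R, 1=G, 2=B.
BAYER_LAYOUT = np.array([[0, 1],
                         [1, 2]])
ANGLES = (0, 45, 90, 135)  # order of the first axis of `full`


def simulate_mosaic(full):
    """Sample a DoFP-style mosaic from full-resolution data.

    full: array of shape (4, 3, H, W) = (angle, RGB channel, row, col).
    Returns an (H, W) array that keeps one angle and one color per pixel.
    """
    _, _, h, w = full.shape
    rows, cols = np.indices((h, w))
    # Which polarizer angle sits on each pixel, as an index 0..3.
    angle_deg = POL_LAYOUT[rows % 2, cols % 2]
    angle_idx = np.searchsorted(ANGLES, angle_deg)
    # Which color filter covers the 2x2 superpixel containing each pixel.
    color_idx = BAYER_LAYOUT[(rows // 2) % 2, (cols // 2) % 2]
    return full[angle_idx, color_idx, rows, cols]
```

Under these assumptions each pixel keeps 1 of the 12 values (4 angles x 3 colors) present at that location, which is why the remaining 11 have to be estimated.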

The Challenge: Existing network-based methods struggle to accurately reconstruct polarization characteristics, specifically the Degree of Polarization (DOP) and Angle of Polarization (AOP), even when they recover scene intensity ($S_0$) well.
Root Cause: Current methods rely on limited-scale simulated datasets. These datasets lack scene diversity and scale, leading to insufficient data priors. Consequently, neural networks fail to generalize to complex scenarios, resulting in significant errors in polarization property reconstruction.
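
For reference, $S_0$, DOP, and AOP can be computed from the four directional images with the standard linear Stokes relations. The sketch below uses those textbook formulas (DOP here means the degree of linear polarization); the function names and the small `eps` guard are illustrative choices, not details from the paper.

```python
import numpy as np


def stokes_from_intensities(i0, i45, i90, i135):
    """Linear Stokes parameters from the four polarizer-angle intensities."""
    s0 = 0.5 * (i0 + i45 + i90 + i135)  # total intensity
    s1 = i0 - i90                       # horizontal vs. vertical
    s2 = i45 - i135                     # diagonal vs. anti-diagonal
    return s0, s1, s2


def dop_aop(s0, s1, s2, eps=1e-8):
    """Degree (0..1) and angle (radians) of linear polarization."""
    dop = np.sqrt(s1**2 + s2**2) / (s0 + eps)  # eps avoids division by zero
    aop = 0.5 * np.arctan2(s2, s1)             # range (-pi/2, pi/2]
    return np.clip(dop, 0.0, 1.0), aop
```

Because DOP is a ratio and AOP an arctangent of differences, small intensity errors that barely change $S_0$ can change both noticeably, which is consistent with the challenge described above.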

2. Methodology: PUGDiff

The authors propose PUGDiff, a dual-branch network guided by a Polarization Uncertainty Model. The framework integrates a task-specific base branch with a pre-trained Text-to-Image (T2I) diffusion model to leverage external priors.

A. Dual-Branch Architecture

Base Branch ($f_b$):
- Architecture: A CNN-Transformer hybrid U-Net trained from scratch.
- Function: Provides fundamental demosaicking capabilities and ensures high fidelity for intensity reconstruction.
- Output: $x_b$ (Initial reconstruction of the 4 polarization directions).
SD Branch ($f_{sd}$):
- Architecture: Based on Stable Diffusion (SD).
- Adaptation: Uses Low-Rank Adaptation (LoRA) to fine-tune the Variational Autoencoder (VAE) and the Diffusion U-Net. Because text prompts are unnecessary for this task, the text encoders and cross-attention modules are removed to improve efficiency.
- Function: Leverages the diffusion prior learned from large-scale natural images to recover missing pixels and correct polarization errors, particularly in complex regions.
- Output: $x_{sd}$ (Refined reconstruction).
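
The idea behind LoRA can be sketched on a single linear layer: the pre-trained weight stays frozen and only a low-rank update is trained. This is a generic NumPy illustration of the technique, not the authors' code; the class name, the `alpha` scaling, and the initialization are assumptions, and the rank of 4 simply mirrors the setting reported in the ablation study.

```python
import numpy as np


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, weight, rank=4, alpha=4.0, seed=0):
        rng = np.random.default_rng(seed)
        out_dim, in_dim = weight.shape
        self.weight = weight                                  # frozen, pre-trained
        self.A = rng.normal(0.0, 0.01, size=(rank, in_dim))   # trainable
        self.B = np.zeros((out_dim, rank))                    # trainable, zero init
        self.scale = alpha / rank

    def __call__(self, x):
        # x: (batch, in_dim). Frozen path plus low-rank path.
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def num_trainable(self):
        # Only A and B would receive gradients during fine-tuning.
        return self.A.size + self.B.size
```

With `B` initialized to zero, the adapted layer starts out identical to the pre-trained one, and the number of trainable values is `rank * (in_dim + out_dim)` rather than `in_dim * out_dim`. That small parameter count is one reason this kind of adaptation suits a setting with little training data.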

B. Polarization Uncertainty Model

To effectively fuse the outputs of the two branches, the authors explicitly model polarization uncertainty ($\eta_p$).

Theoretical Basis: The intensity uncertainty ($\eta$) of the reconstructed pixels is propagated through the non-linear Stokes parameter calculations to derive the uncertainty of the DOP. The DOP is modeled as following a Rice distribution.
Estimation Network: A dedicated network (sharing the backbone of the base branch) predicts the log-polarization uncertainty ($s = \ln \eta_p$) via supervised learning, minimizing the negative log-likelihood of the Rice distribution.
Role: The uncertainty map acts as a spatial guide. High uncertainty indicates regions where the base branch likely fails to reconstruct polarization properties accurately.
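
A Rice negative log-likelihood of the kind described above could look like the following sketch. It assumes that the reconstructed DOP `d_obs` is Rice-distributed around the ground-truth DOP `d_true` with scale $\eta_p = e^{s}$; that parameterization, the function names, and the `eps` guard are assumptions for illustration, since the summary does not give the exact loss.

```python
import numpy as np


def log_i0(z):
    """Stable log of the modified Bessel function I0 for z >= 0."""
    z = np.asarray(z, dtype=np.float64)
    # Direct evaluation where I0 cannot overflow.
    small = np.log(np.i0(np.minimum(z, 50.0)))
    # Asymptotic expansion I0(z) ~ e^z / sqrt(2 pi z) * (1 + 1/(8z) + 9/(128 z^2)).
    zl = np.maximum(z, 50.0)
    large = (zl - 0.5 * np.log(2.0 * np.pi * zl)
             + np.log1p(1.0 / (8.0 * zl) + 9.0 / (128.0 * zl**2)))
    return np.where(z < 50.0, small, large)


def rice_nll(d_obs, d_true, s, eps=1e-12):
    """Per-pixel negative log-likelihood of d_obs under Rice(nu=d_true, sigma=exp(s))."""
    d_obs = np.maximum(d_obs, eps)   # log(d_obs) needs a positive argument
    sigma2 = np.exp(2.0 * s)         # sigma^2, with sigma = exp(s)
    z = d_obs * d_true / sigma2
    return (2.0 * s - np.log(d_obs)
            + (d_obs**2 + d_true**2) / (2.0 * sigma2)
            - log_i0(z))
```

Predicting $s$ rather than $\eta_p$ keeps the scale positive without a constraint, and averaging `rice_nll` over pixels gives a training objective that is smallest when the predicted scale matches the actual spread of the DOP error.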

C. Uncertainty-Guided Fusion

The final output ($x_{final}$) is generated by adaptively weighting the base branch and the SD branch based on the predicted uncertainty map.
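
One simple way to realize such a weighting is a per-pixel convex blend. The sketch below is only an illustration under stated assumptions: the min-max normalization and the linear blend are hypothetical choices, and the actual fusion in PUGDiff may be a learned module rather than this closed form.

```python
import numpy as np


def normalize_uncertainty(s, eps=1e-8):
    """Min-max normalize a log-uncertainty map to [0, 1] (an assumed choice)."""
    return (s - s.min()) / (s.max() - s.min() + eps)


def fuse(x_b, x_sd, s):
    """Blend the two branch outputs pixel by pixel.

    x_b, x_sd: (4, H, W) reconstructions of the four polarization directions.
    s:         (H, W) predicted log-polarization uncertainty.
    """
    s_bar = normalize_uncertainty(s)[None]  # add an axis to broadcast over angles
    # Low uncertainty keeps the base branch; high uncertainty moves to the SD branch.
    return (1.0 - s_bar) * x_b + s_bar * x_sd
```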

Low Uncertainty Regions: The Base Branch is favored to maintain high fidelity and avoid over-smoothing.
High Uncertainty Regions: The SD Branch is favored to correct polarization errors and enhance visual faithfulness.
Loss Function: An Uncertainty-Guided Loss is used during training. It incorporates the normalized uncertainty ($\bar{s}$) as a gating mechanism to weight the Mean Squared Error (MSE) loss between the final output and the respective branch outputs. This allows the network to learn how to balance the contributions of the two branches dynamically.
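
A gated MSE of this kind might be written as follows. The exact form of the paper's loss is not given in this summary, so the two-term structure, the use of $(1 - \bar{s})$ and $\bar{s}$ as gates, and the function name are assumptions made for illustration.

```python
import numpy as np


def uncertainty_guided_loss(x_final, x_b, x_sd, s_bar):
    """Gated MSE between the fused output and each branch output.

    x_final, x_b, x_sd: (4, H, W) arrays.
    s_bar:              (H, W) normalized uncertainty in [0, 1].
    """
    gate = s_bar[None]  # add an axis to broadcast over the four directions
    # Confident pixels pull the output toward the base branch ...
    loss_base = np.mean((1.0 - gate) * (x_final - x_b) ** 2)
    # ... and uncertain pixels pull it toward the SD branch.
    loss_sd = np.mean(gate * (x_final - x_sd) ** 2)
    return loss_base + loss_sd
```

Under this assumed form, setting the derivative with respect to `x_final` to zero shows that the per-pixel minimizer is $(1 - \bar{s})\,x_b + \bar{s}\,x_{sd}$, so the gate directly controls which branch dominates at each location.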

3. Key Contributions

Integration of Diffusion Priors: According to the authors, this is the first work to introduce a Text-to-Image diffusion model (via LoRA) into the CPDM task. Leveraging priors from large-scale natural image distributions eases the performance bottleneck caused by the scarcity of polarization training data.
Polarization Uncertainty Modeling: The authors propose a novel method to explicitly model uncertainty based on the statistical properties of polarization (DOP) rather than just intensity. This uncertainty is transformed into a guidance signal for network fusion.
Adaptive Dual-Branch Fusion: A framework that adaptively selects the dominant branch (Base vs. SD) based on local polarization uncertainty, ensuring high fidelity in simple regions and high perceptual quality in complex regions.
State-of-the-Art Performance: The method achieves superior results in both quantitative metrics and qualitative visual perception compared to existing methods.

4. Experimental Results

The method was evaluated on simulated datasets (Monno, Qiu, PIDSR, DCPM) and real-world captured images.

Quantitative Performance:
- PUGDiff achieved state-of-the-art performance across all metrics (PSNR, SSIM, MAE) on multiple datasets.
- Notable improvements were seen in DOP and AOP metrics, which are the most challenging aspects of CPDM. For example, on the PIDSR dataset, PUGDiff achieved a DOP PSNR of 40.6696 dB, outperforming the next best method (also named PIDSR) by a significant margin.
Qualitative Performance:
- Visual comparisons show that PUGDiff produces sharper edges and fewer artifacts in AOP and DOP maps compared to competitors.
- In real-world scenarios, the method effectively handles noise and preserves texture details where other methods fail.
Ablation Studies:
- Uncertainty Type: Modeling uncertainty specifically from the polarization perspective (DOP) yielded better results than using intensity-based uncertainty.
- SD Configuration: Using LoRA on both the VAE and U-Net with a rank of 4 provided the best balance between performance and stability. Full fine-tuning failed due to data scarcity.
- Fusion: The uncertainty-guided fusion proved critical; removing it or using fixed weights degraded performance.
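
For readers who want to reproduce metrics like the PSNR and MAE figures reported above, a minimal sketch follows. It assumes images scaled to [0, 1] and AOP in radians; the paper's exact metric definitions are not stated in this summary, so the wrap-around handling and the function names are assumptions.

```python
import numpy as np


def psnr(pred, target, peak=1.0):
    """Peak signal-to-noise ratio in dB for data with maximum value `peak`."""
    mse = np.mean((pred - target) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak**2 / mse)


def aop_mae_deg(aop_pred, aop_true):
    """Mean absolute AOP error in degrees.

    AOP is periodic with period pi, so angles just below +90 degrees and
    just above -90 degrees are close to each other, not 180 degrees apart.
    """
    diff = np.abs(aop_pred - aop_true) % np.pi
    diff = np.minimum(diff, np.pi - diff)  # take the shorter way around
    return float(np.degrees(np.mean(diff)))
```

Ignoring the periodicity would heavily penalize predictions near the wrap-around point even when they are nearly correct, so any AOP comparison should state how the angular difference is computed.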

5. Significance

This work addresses a critical limitation in polarization imaging: the lack of large-scale, diverse training data. By transferring the powerful generative priors of diffusion models to the specific domain of polarization demosaicking, PUGDiff overcomes the generalization limits of traditional supervised learning.

Practical Impact: The ability to accurately reconstruct DOP and AOP enables better performance in downstream applications such as reflection removal, material classification, and 3D reconstruction.
Methodological Insight: The paper demonstrates that explicitly modeling task-specific uncertainty (in this case, polarization error) is a robust strategy for guiding the fusion of specialized networks with general-purpose generative models.