UniRain: Unified Image Deraining with RAG-based Dataset Distillation and Multi-objective Reweighted Optimization

Imagine you are trying to clean a window that has been hit by a storm. Sometimes it's just a few long streaks of rain (like on a sunny day). Sometimes it's a messy splash of droplets (like on a car windshield). Sometimes it's dark and foggy (nighttime), and sometimes the rain is mixed with snow or haze.

For years, computer scientists built "cleaning robots" that were experts at only one of these specific problems. If you had a robot trained only on daytime streaks, it would fail miserably if you showed it a night scene with raindrops. You'd need a different robot for every single weather scenario, which is slow, expensive, and clumsy.

This paper introduces UniRain, a single "Super Cleaning Robot" that can handle all of these rain conditions at once. Here is how it works, explained through simple analogies:

1. The Problem: The "Bad Data" Buffet

Imagine you want to teach a chef how to cook a perfect steak.

The Old Way: You give the chef a giant buffet containing more than 2,000,000 plates of food. Some are perfect steaks, but many are burnt, raw, or covered in dirt. Because the buffet is so huge and so messy, the chef gets mixed signals. The bad plates teach the wrong lessons, and the chef never reliably learns what a perfect steak should look like.
The UniRain Solution: Instead of feeding the chef the whole messy buffet, UniRain uses a Smart Filter (RAG-based Distillation).
- Think of this filter as a team of expert food critics (AI models) who look at every single plate in the buffet.
- They ask: "Is this a real, high-quality rainy scene? Or is it a fake, blurry, or low-quality mess?"
- They throw away the bad plates and keep only the best, most realistic samples.
- Result: The chef (the AI model) now learns from a small, high-quality menu instead of a massive, confusing pile of junk. This makes the chef much smarter and more adaptable.

2. The Training: The "Fair Coach"

Now that the chef has good ingredients, they need to learn how to cook them. But here's the catch: some dishes are easy to learn (like frying an egg), while others are very hard (like making a soufflé).

The Problem: If you use the same training schedule for everything, the chef keeps polishing the easy dishes long after mastering them, while the hard ones stay undercooked. They become great at eggs but terrible at soufflés.
The UniRain Solution: They use a Multi-Objective Reweighted Optimization strategy.
- Imagine a coach who watches the chef's progress.
- If the chef is getting really good at "Nighttime Streaks" (an easier task) too fast, the coach says, "Okay, you're doing great, let's ease off that and focus more energy on 'Daytime Raindrops' (a harder task)."
- The coach keeps adjusting where the attention goes, so that no dish is over-practiced and none is neglected. This keeps the learning balanced and strong across all scenarios.

3. The Brain: The "Specialized Team" (MoE)

Finally, the actual cleaning robot needs a brain architecture that can handle different types of mess.

The Old Way: Using one giant brain to try to do everything at once. It's like asking one person to be a painter, a sculptor, and an architect simultaneously. They might get overwhelmed.
The UniRain Solution: They built an Asymmetric Mixture-of-Experts (MoE) system.
- The Encoder (The "Scanner"): This part uses a Soft-MoE. Imagine a team of detectives who all look at the crime scene (the rainy image) and share their thoughts gently. They all contribute a little bit to understand the general vibe of the rain.
- The Decoder (The "Fixer"): This part uses a Hard-MoE. Imagine a team of specialized repairmen. When they see a specific type of damage (like a big raindrop), they don't ask everyone to help. Instead, they instantly pick the top 1 or 2 experts who are best at fixing that specific problem and let them do the heavy lifting.
- Result: The robot is flexible enough to understand the whole scene but efficient enough to call in the exact specialist needed to fix the specific problem.

Why This Matters

Before UniRain, if you wanted to clean a rainy video for a self-driving car, you might need to switch between different software models depending on whether it was day or night, or if it was raining or snowing.

UniRain is the "Swiss Army Knife" of image cleaning.

- It filters the junk out of large public datasets and learns only from the best examples.
- It balances its training so it doesn't ignore difficult weather conditions.
- It uses a smart team of specialists to fix the image efficiently.

The result? A single model that can take a rainy photo (day or night, streaks or drops) and make it look clean, outperforming the previous "specialized" robots on the paper's benchmarks. An extended version even handles snow and haze. It's like having one master cleaner who can handle any weather.

1. Problem Statement

While significant progress has been made in single-image deraining, existing methods suffer from two major limitations:

Lack of Generalization: Most models are designed for specific rain degradation types (e.g., only rain streaks, only raindrops, or only nighttime scenes). They fail to generalize when applied to diverse, complex real-world scenarios containing mixed degradation types.
Training Imbalance and Data Quality: Simply merging all available synthetic and real-world rain datasets (>2 million pairs) leads to suboptimal performance. Public datasets vary significantly in quality (resolution, realism, background), causing inaccurate supervisory signals. Furthermore, different rain types exhibit distinct convergence rates during training; a single optimization objective causes the model to overfit "easier" degradations (e.g., nighttime streaks) while under-optimizing complex ones (e.g., daytime raindrops), leading to uneven restoration quality.

2. Methodology

The authors propose UniRain, a unified framework comprising three core components:

A. RAG-based Dataset Distillation Pipeline

To address data quality issues, the authors construct a pipeline to distill high-quality training samples from massive public datasets.

Retrieval Stage: A database is built using real-world rainy images. For each query image, the system retrieves relevant real-world references using a hierarchical similarity matching process involving:
1. Semantic Similarity: Using CLIP text encoders on image captions.
2. Visual Feature Similarity: Using CLIP visual encoders.
3. Structural Similarity: Using SSIM for fine-grained consistency.
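The three-stage matching above can be sketched as a coarse-to-fine filter. This is a minimal illustration, not the authors' code. It makes several assumptions:
- The CLIP caption and image embeddings are already computed and stored as plain vectors.
- All images are flattened grayscale arrays of the same size.
- `global_ssim` is a simplified single-window SSIM rather than the usual windowed version.
- The `Entry` record, the cut-offs `k1`, `k2`, `k3`, and all function names are hypothetical.

```python
# Hypothetical sketch of coarse-to-fine retrieval; not the authors' implementation.
import math
from dataclasses import dataclass


@dataclass
class Entry:
    """A database record. Embeddings stand in for precomputed CLIP outputs (assumed)."""
    name: str
    caption_emb: list[float]  # CLIP text embedding of the image caption
    visual_emb: list[float]   # CLIP image embedding
    pixels: list[float]       # flattened grayscale image in [0, 255], same size for all entries


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def global_ssim(x: list[float], y: list[float], data_range: float = 255.0) -> float:
    """Single-window SSIM over the whole image (simplified; standard SSIM averages local windows)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))


def hierarchical_retrieve(query: Entry, db: list[Entry], k1: int = 50, k2: int = 10, k3: int = 3) -> list[str]:
    # Stage 1: semantic similarity between caption embeddings.
    stage1 = sorted(db, key=lambda e: cosine(query.caption_emb, e.caption_emb), reverse=True)[:k1]
    # Stage 2: visual-feature similarity on the survivors.
    stage2 = sorted(stage1, key=lambda e: cosine(query.visual_emb, e.visual_emb), reverse=True)[:k2]
    # Stage 3: fine-grained structural consistency via SSIM.
    stage3 = sorted(stage2, key=lambda e: global_ssim(query.pixels, e.pixels), reverse=True)[:k3]
    return [e.name for e in stage3]


query = Entry("query", [1.0, 0.0], [1.0, 0.0], [10.0, 20.0, 30.0, 40.0])
db = [
    Entry("a", [1.0, 0.0], [1.0, 0.0], [10.0, 20.0, 30.0, 40.0]),  # same structure as the query
    Entry("b", [0.9, 0.1], [0.9, 0.1], [40.0, 30.0, 20.0, 10.0]),  # similar content, inverted structure
    Entry("c", [0.0, 1.0], [0.0, 1.0], [10.0, 20.0, 30.0, 40.0]),  # unrelated caption, dropped in stage 1
]
print(hierarchical_retrieve(query, db, k1=2, k2=2, k3=1))  # ['a']
```

Entry "c" has identical pixels to the query but is dropped at the semantic stage. Entry "b" passes the embedding stages but loses at the SSIM stage. The stages run from cheap embedding comparisons to a pixel-level check, a common layout for such cascades. Whether UniRain orders them for that reason is an assumption.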
Generation Stage: The retrieved references and the query image are fed into Vision-Language Models (VLMs), specifically an ensemble of InternVL2.5, LLaVA-NeXT, and MobileVLM. The VLMs act as quality assessors that judge whether a sample is "True" (reliable) or "Fake" (low quality/noisy).
Outcome: This process filters millions of raw images down to a reliable "distilled" dataset (approx. 2.6% of original size) that closely mimics real-world distributions, forming a "data-quality pyramid."
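The generation stage can be read as an ensemble vote. The sketch below assumes a simple majority vote over the three VLM verdicts, because the summary does not say how the ensemble's answers are combined. The `Judge` callables are hypothetical stand-ins for prompting InternVL2.5, LLaVA-NeXT, and MobileVLM with a sample and its retrieved references.

```python
from typing import Callable, Sequence

# A judge sees a candidate sample plus its retrieved real-world references and returns
# True ("True"/reliable) or False ("Fake"/low quality). Hypothetical stand-in for a VLM prompt.
Judge = Callable[[str, Sequence[str]], bool]


def ensemble_verdict(sample: str, references: Sequence[str], judges: Sequence[Judge]) -> bool:
    """Keep a sample only if a strict majority of judges call it reliable (assumed rule)."""
    votes = sum(judge(sample, references) for judge in judges)
    return 2 * votes > len(judges)


def distill(samples: Sequence[str], retrieve: Callable[[str], Sequence[str]],
            judges: Sequence[Judge]) -> list[str]:
    """Filter a raw pool down to the samples the ensemble accepts."""
    return [s for s in samples if ensemble_verdict(s, retrieve(s), judges)]


# Toy usage: string IDs stand in for image pairs, and the lambdas stand in for VLM calls.
pool = ["real_rain_01", "synthetic_blur_02", "real_rain_03"]
toy_judges = [
    lambda s, refs: s.startswith("real"),
    lambda s, refs: "blur" not in s,
    lambda s, refs: True,
]
kept = distill(pool, retrieve=lambda s: ["ref_a", "ref_b"], judges=toy_judges)
print(kept)  # ['real_rain_01', 'real_rain_03']
```

For scale: at the reported retention rate of about 2.6%, a starting pool of 2,000,000 pairs would shrink to roughly 52,000.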

B. Asymmetric Mixture-of-Experts (MoE) Architecture

To handle diverse degradation patterns efficiently, UniRain employs an asymmetric encoder-decoder structure:

Soft-MoE Encoder: Uses continuous routing weights (soft routing) to adaptively combine multiple experts. This allows the encoder to collaboratively preserve diverse degradation cues from different rain types.
Hard-MoE Decoder: Uses Top-$k$ routing (hard routing) to selectively activate the most relevant experts. This focuses computational resources on reconstructing fine textures and structural details, complementing the encoder's broad feature extraction.
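The difference between the two routing styles can be sketched in a few lines. This is a simplified illustration under stated assumptions:
- Experts are plain functions on a feature vector.
- Router logits are passed in directly; in a real network a learned gating layer would produce them from the input.
- The weighted-sum form of soft routing is a dense-gating simplification, not a reproduction of any specific published Soft-MoE layer.
- `soft_moe` and `hard_moe` are hypothetical names.

```python
import math
from typing import Callable

Vector = list[float]
Expert = Callable[[Vector], Vector]


def softmax(logits: Vector) -> Vector:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combine(outputs: list[Vector], weights: Vector) -> Vector:
    """Element-wise weighted sum of expert outputs."""
    return [sum(w * out[i] for w, out in zip(weights, outputs)) for i in range(len(outputs[0]))]


def soft_moe(x: Vector, experts: list[Expert], router_logits: Vector) -> Vector:
    """Soft routing (simplified): every expert runs and contributes with a continuous weight."""
    weights = softmax(router_logits)
    return combine([expert(x) for expert in experts], weights)


def hard_moe(x: Vector, experts: list[Expert], router_logits: Vector, k: int = 2) -> Vector:
    """Hard Top-k routing: only the k highest-scoring experts run; their weights are renormalised."""
    top = sorted(range(len(experts)), key=lambda i: router_logits[i], reverse=True)[:k]
    weights = softmax([router_logits[i] for i in top])
    return combine([experts[i](x) for i in top], weights)


# Toy experts that scale their input by 1, 2 and 3.
experts = [lambda v, c=c: [c * a for a in v] for c in (1.0, 2.0, 3.0)]
print(soft_moe([1.0], experts, [0.0, 0.0, 0.0]))        # equal weights -> about [2.0]
print(hard_moe([1.0], experts, [0.0, 5.0, 1.0], k=1))   # only the second expert runs -> [2.0]
```

In the soft version every expert is evaluated for every input, which keeps all degradation cues in play. The hard version evaluates only `k` experts, and that is where its compute saving comes from.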

C. Multi-objective Reweighted Optimization

To solve the training imbalance where different rain types converge at different rates, the authors introduce a dynamic reweighting strategy:

Convergence Slope Estimation: The system estimates the convergence rate of each rain type (Daytime/Nighttime Streaks/Drops) using linear regression on the loss curve within a sliding window.
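Estimating a convergence slope amounts to fitting a least-squares line to the most recent losses. The sketch below assumes one tracker per rain type. The class name, the window length, and the type keys are hypothetical, since the summary gives no concrete values.

```python
from collections import deque


class SlopeTracker:
    """Least-squares slope of the last `window` loss values for one rain type (hypothetical helper)."""

    def __init__(self, window: int = 50):
        self.losses: deque[float] = deque(maxlen=window)  # old values fall out automatically

    def update(self, loss: float) -> None:
        self.losses.append(loss)

    def slope(self) -> float:
        n = len(self.losses)
        if n < 2:
            return 0.0  # not enough points to fit a line yet
        x_mean = (n - 1) / 2  # x = 0, 1, ..., n-1 (step index within the window)
        y_mean = sum(self.losses) / n
        num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(self.losses))
        den = sum((x - x_mean) ** 2 for x in range(n))
        return num / den


# One tracker per degradation type (keys are illustrative).
trackers = {t: SlopeTracker(window=3) for t in ("day_streak", "night_streak", "day_drop", "night_drop")}
for loss in (10.0, 8.0, 6.0, 4.0):
    trackers["day_streak"].update(loss)
print(trackers["day_streak"].slope())  # window keeps 8, 6, 4 -> -2.0
```

The sign of the slope is what matters:
- A clearly negative slope means the loss for that type is still falling.
- A slope near zero means it has plateaued.
- A positive slope means it is diverging.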
Adaptive Reweighting: Three metrics are calculated to adjust loss weights dynamically:
1. Type Balance Score (TBS): Assigns higher weights to slower-converging types to synchronize convergence.
2. Type Stability Score (TSS): Evaluates historical stability to penalize diverging types.
3. Adaptivity Factor (AF): Controls the transition between balancing convergence and maintaining stability based on the global divergence state.
Result: This ensures the model learns all rain types simultaneously without bias toward easier tasks.
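The summary names the three scores but not their formulas. The following is therefore only a hypothetical illustration of how such a scheme could fit together, not UniRain's actual equations. It assumes:
- TBS is a softmax over the current slopes, so a less negative slope (slower convergence) earns more weight.
- TSS down-weights types whose slope history was often positive (diverging).
- AF is the fraction of types currently diverging, and it blends TBS and TSS.

The temperature `tau` and all function names are invented for the example.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reweight(slopes: dict[str, float], history: dict[str, list[float]], tau: float = 0.1) -> dict[str, float]:
    """Hypothetical TBS/TSS/AF blend; returns per-type loss weights that average to 1."""
    types = list(slopes)
    # TBS (assumed form): slower convergence (larger slope) -> larger share of the weight.
    tbs = softmax([slopes[t] / tau for t in types])
    # TSS (assumed form): types that often diverged (positive slopes) in the past are down-weighted.
    diverge_rate = [sum(s > 0 for s in history[t]) / max(len(history[t]), 1) for t in types]
    tss = softmax([-r / tau for r in diverge_rate])
    # AF (assumed form): fraction of types diverging right now; more divergence -> lean on TSS.
    af = sum(slopes[t] > 0 for t in types) / len(types)
    blended = [(1 - af) * b + af * s for b, s in zip(tbs, tss)]
    return {t: w * len(types) for t, w in zip(types, blended)}


# Toy values for the four rain types; the weighted total is what the optimizer would minimise.
slopes = {"day_streak": -0.05, "night_streak": -0.30, "day_drop": -0.01, "night_drop": -0.10}
history = {t: [s] for t, s in slopes.items()}
weights = reweight(slopes, history)
losses = {"day_streak": 0.04, "night_streak": 0.02, "day_drop": 0.09, "night_drop": 0.05}
total_loss = sum(weights[t] * losses[t] for t in losses)
```

In this toy setup every slope is negative, so AF is zero and the weights come purely from TBS. As a result, `day_drop` (the slowest-converging type here) receives the largest weight.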

3. Key Contributions

Unified Framework: Proposed UniRain, which the authors present as the first framework capable of simultaneously handling rain streaks and raindrops under both daytime and nighttime conditions within a single model.
RAG-based Distillation: Developed an intelligent pipeline using Retrieval Augmented Generation (RAG) and VLMs to filter and distill high-quality, reliable training data from massive, heterogeneous public datasets.
Optimization Strategy: Introduced a Multi-objective Reweighted Optimization strategy that dynamically balances the learning of different rain degradation types, preventing task imbalance.
Asymmetric MoE: Designed a novel architecture combining Soft-MoE (for feature preservation) and Hard-MoE (for detail reconstruction) to balance expressiveness and efficiency.

4. Experimental Results

The method was evaluated on the proposed RainRAG benchmark and multiple public real-world datasets (RealRain-1k, RainDS-real, WeatherBench).

Quantitative Performance:
- On the RainRAG dataset, UniRain achieved an average PSNR of 28.93 dB, outperforming state-of-the-art (SOTA) models like Restormer (27.89 dB) and URIR (27.91 dB).
- On real-world benchmarks, it achieved an average PSNR of 29.42 dB, surpassing the previous best (URIR) by 1.73 dB.
- It showed significant improvements in specific subsets, e.g., +1.41 dB PSNR on Daytime Raindrops (DRD) compared to NeRD-Rain.
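To put the dB figures in perspective: PSNR is 10·log10(MAX²/MSE). A PSNR gain of Δ dB therefore corresponds to the mean squared error shrinking by a factor of 10^(-Δ/10). The helper names below are illustrative. The conversion holds exactly for a single image; for averages over a test set it is only a rule of thumb, since PSNR is usually averaged per image.

```python
import math


def psnr(mse: float, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB (8-bit images by default)."""
    return 10 * math.log10(max_val ** 2 / mse)


def mse_ratio_for_gain(delta_db: float) -> float:
    """Factor by which MSE shrinks when PSNR rises by delta_db."""
    return 10 ** (-delta_db / 10)


# Gains quoted above, converted to MSE ratios.
for label, gain in [("vs. Restormer on RainRAG", 28.93 - 27.89),
                    ("vs. URIR on real-world sets", 1.73),
                    ("vs. NeRD-Rain on DRD", 1.41)]:
    print(f"{label}: +{gain:.2f} dB -> MSE x {mse_ratio_for_gain(gain):.3f}")
```

Read per image, the three gains amount to the following reductions in squared error:
- about 21% (+1.04 dB vs. Restormer on RainRAG);
- about 33% (+1.73 dB vs. URIR on the real-world sets);
- about 28% (+1.41 dB vs. NeRD-Rain on DRD).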
Qualitative Performance: Visual comparisons demonstrate superior removal of complex rain streaks and raindrops, with better preservation of background details and fewer artifacts compared to competitors.
Generalization: The model demonstrated strong generalization across diverse scenarios, including autonomous driving, UAV, and maritime scenes.
Efficiency: Despite its strong performance, UniRain keeps complexity competitive at 126.54 GFLOPs and 24.39M parameters. This is lower than several SOTA models (e.g., DRSformer at 220 GFLOPs).
Extension: The framework was successfully extended to "all-in-one" weather restoration (handling rain, snow, and haze), outperforming specialized weather restoration models.

5. Significance

This paper addresses a critical bottleneck in low-level vision: the gap between specialized models and the need for a universal, robust solution for real-world applications.

Data-Centric Innovation: By leveraging RAG and VLMs for dataset distillation, it offers a new paradigm for handling the "noise" in large-scale public datasets. Its results suggest that data quality matters more than sheer volume.
Unified Solution: It eliminates the need for switching between multiple models for different weather conditions, significantly improving deployment efficiency for intelligent systems (e.g., self-driving cars, surveillance).
Training Stability: The multi-objective reweighted optimization offers a general strategy for training models on heterogeneous tasks with conflicting convergence behaviors. It is potentially applicable beyond image deraining.
