SLER-IR: Spherical Layer-wise Expert Routing for All-in-One Image Restoration

Imagine you have a broken camera. Sometimes the lens is dirty (blur), sometimes it's raining outside (rain streaks), sometimes it's too dark (low-light), and sometimes the image is grainy (noise).

In the past, fixing these problems was like having a toolbox where you needed a different, specialized tool for every single problem. If you had a blurry photo, you grabbed the "blur fixer." If it was rainy, you grabbed the "rain remover." But what if your photo was blurry and rainy? You'd have to use two tools, or worse, try to guess which one to use. This is slow, inefficient, and often leads to messy results.

Recently, researchers tried to build a "Swiss Army Knife" for photo repair: one single AI model that could handle any problem. But these "all-in-one" models had a flaw: they were like a general practitioner who knows a little bit about everything but isn't great at anything specific. They often got confused, mixing up the instructions for fixing rain with the instructions for fixing blur, resulting in a photo that still looked a bit muddy.

Enter SLER-IR: The "Smart Team of Specialists" with a Magic Compass.

The authors of this paper propose a new system called SLER-IR. Instead of one generalist, they built a team of specialists who work together dynamically. Here is how it works, using simple analogies:

1. The Layer-by-Layer Assembly Line (Spherical Layer-wise Expert Routing)

Imagine a photo restoration process as an assembly line with 8 stations (layers).

Old Way: Every station had the same worker trying to do everything.
SLER-IR Way: At each of the 8 stations, there are 3 different specialists waiting.
- Station 1 might need a "Texture Expert."
- Station 5 might need a "Color Expert."
- Station 8 might need a "Detail Expert."

The magic is that the system doesn't just pick one specialist for the whole job. It looks at the photo at every single station and asks: "Who is best for this specific step?"

If the photo is rainy, Station 1 might pick the "Rain Specialist," but Station 5 might pick the "Blur Specialist" because the rain streaks are also blurring the background.
This creates thousands of unique paths through the factory. The system builds a custom repair plan for every single photo on the fly.

2. The Magic Compass (Spherical Uniform Degradation Embedding)

How does the system know which specialist to pick? It needs a way to "read" the damage.

The Problem: Imagine trying to navigate a city using a map where the distances are distorted. If "Rain" and "Blur" look too close together on the map, the driver (the AI) gets confused and picks the wrong route. This is what happens with standard AI math (linear spaces).
The Solution: The authors put the damage types onto a globe (a hypersphere).
- Think of the globe like a basketball. They place "Rain," "Blur," "Darkness," and "Noise" as pins stuck into the ball.
- They use a special math trick to ensure these pins are evenly spaced all over the ball, so no two types are accidentally too close.
- When a new photo comes in, the system draws a line from the center of the ball to the photo's "damage pin." It then simply looks for the nearest specialist pin on the surface. Because the pins are evenly spaced, the nearest one is usually unambiguous, which makes the AI much less likely to pick the wrong specialist.

3. The Big Picture vs. The Close-Up (Global-Local Granularity Fusion)

Sometimes, a photo has a problem in just one corner (like a raindrop on the left side) but is fine on the right.

The Problem: If the AI only looks at the "whole picture" (Global), it might miss the raindrop. If it only looks at tiny "patches" (Local), it might lose the context of the whole scene. Also, during training, the AI only sees small squares, but in real life, it sees the whole image. This causes a "mismatch."
The Solution: The system uses a two-sense approach.
- Sense 1 (The Tourist): Looks at the whole image to understand the general vibe (e.g., "It's a dark forest").
- Sense 2 (The Detective): Zooms in to find specific clues (e.g., "There is a rain streak here").
- The system combines these two senses to create a guide map. It tells the specialists exactly where to work and what to fix, ensuring that a rain streak in the corner gets fixed without messing up the clear sky in the center.

The Result

By combining these three ideas, SLER-IR acts like a highly organized, adaptable team of experts who communicate perfectly.

They don't get confused by mixed problems (like rain + blur).
They know exactly who to call at every step of the process.
They look at both the big picture and the tiny details.

In short: While other methods try to force one brain to do everything, SLER-IR builds a smart, flexible team that changes its strategy based on the specific damage in each photo. The experiments in the paper show that this new "team" produces higher-quality results than the previous "generalists" it was compared against, without activating every specialist for every photo.

1. Problem Statement

Image restoration aims to reconstruct high-quality images from degraded inputs (e.g., noise, rain, haze, blur). While traditional single-task models perform well on specific degradations, they lack flexibility for real-world scenarios involving complex, mixed, or unseen degradations.

All-in-One (AiO) frameworks attempt to unify these tasks into a single model but face three primary limitations:

Feature Interference: Methods using feature modulation (e.g., prompts) often suffer from conflicting objectives within a shared backbone, hindering fine-grained modeling for specific degradations.
Suboptimal Expert Specialization: Existing Mixture-of-Experts (MoE) approaches often restrict experts to local modules or use linear embedding spaces for routing. Linear spaces introduce geometry bias (uneven class distances), leading to unstable routing decisions and failure to handle composite degradations effectively.
Granularity Gap: Most methods struggle with spatially non-uniform degradations (e.g., localized rain streaks) and the mismatch between patch-based training and full-resolution inference.

2. Methodology: SLER-IR

The authors propose SLER-IR, a unified framework featuring Spherical Layer-wise Expert Routing. The architecture consists of four core components:

A. Spherical Layer-wise Expert Routing (SLER)

Architecture: The standard encoder-decoder backbone is transformed so that each layer contains multiple parameter-independent expert nodes (e.g., 3 candidates per layer).
Compositional Inference: Instead of activating all experts, the network dynamically selects one expert at each layer based on the input degradation, and the sequence of per-layer choices forms an inference path. With $K$ layers and $C$ experts per layer, the model can compose up to $C^K$ unique inference paths (e.g., $3^8 = 6{,}561$ paths).
Two-Stage Training:
1. Stage 1 (Probabilistic Routing): Uses soft selection to optimize the router and learn stable degradation representations.
2. Stage 2 (Deterministic Routing): Freezes the router and uses hard selection (argmax) to refine expert specialization and reconstruction quality.
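
The difference between soft and hard selection, and the resulting path count, can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the experts are hypothetical scalar affine maps standing in for real restoration blocks, the per-layer scores are random rather than produced by a trained router, and names such as `make_expert` and `forward` are invented for this sketch.

```python
import math
import random

K, C = 8, 3  # layers and experts per layer, matching the example above

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def make_expert(scale, shift):
    # Hypothetical stand-in for a restoration block: a scalar affine map.
    return lambda x: scale * x + shift

random.seed(0)
experts = [
    [make_expert(random.uniform(0.5, 1.5), random.uniform(-0.1, 0.1)) for _ in range(C)]
    for _ in range(K)
]

def forward(x, layer_scores, hard):
    """Run x through K layers, routing softly or with a hard argmax."""
    path = []
    for layer, scores in zip(experts, layer_scores):
        if hard:
            # Stage-2 style: exactly one expert runs at this layer.
            c = max(range(C), key=lambda i: scores[i])
            x = layer[c](x)
            path.append(c)
        else:
            # Stage-1 style: every expert runs, outputs are softmax-weighted.
            weights = softmax(scores)
            x = sum(w * expert(x) for w, expert in zip(weights, layer))
            path.append(weights)
    return x, path

# Assumed router output: one score per expert per layer (random here).
layer_scores = [[random.gauss(0.0, 1.0) for _ in range(C)] for _ in range(K)]

y_soft, soft_path = forward(1.0, layer_scores, hard=False)
y_hard, hard_path = forward(1.0, layer_scores, hard=True)

num_paths = C ** K
print(num_paths)  # 6561
```

In hard mode `hard_path` is a list of `K` expert indices, which is one of the $C^K$ possible paths; in soft mode every expert contributes, which is what lets gradients reach the router during the first stage.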

B. Router Design & Similarity-based Gating

Input: A "Degraded CLIP" encoder extracts degradation cues, producing a routing vector.
Spherical Uniform Embedding: The routing vector is L2-normalized to map onto a unit hypersphere.
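Gating Mechanism: Expert selection is based on cosine similarity between the routing vector and learned expert centers on the hypersphere. Because all vectors have unit length, the comparison depends only on the angle between them, not on their magnitude. This replaces Euclidean distance (used in linear spaces) to reduce geometry bias.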
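
A minimal sketch of this gating step, assuming the routing vector and the expert centers are plain lists of floats. The function names `l2_normalize` and `cosine_gate` and the toy centers are illustrative only; in the paper the centers are learned and the routing vector comes from the degradation encoder.

```python
import math

def l2_normalize(v):
    """Project a vector onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in v]

def cosine_gate(routing_vector, expert_centers):
    """Return the index of the most similar center and all cosine similarities."""
    z = l2_normalize(routing_vector)
    sims = [
        sum(a * b for a, b in zip(z, l2_normalize(center)))
        for center in expert_centers
    ]
    best = max(range(len(sims)), key=lambda i: sims[i])
    return best, sims

# Toy centers (assumed): three mutually orthogonal directions.
centers = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# The same direction at two different magnitudes routes to the same expert.
idx_small, sims_small = cosine_gate([0.1, 0.9, 0.2], centers)
idx_large, sims_large = cosine_gate([10.0, 90.0, 20.0], centers)
```

Both calls select expert index 1 with identical similarity scores, which shows the property the gating relies on: after normalization, only the direction of the routing vector matters.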

C. Hyperspherical Degradation Representation Learning

Motivation: Linear embedding spaces often suffer from "class-distance bias," where unrelated degradations are artificially pushed apart or merged, confusing the router.
Solution: The authors map degradation representations onto a hypersphere and optimize them using a Triplet-Constrained Contrastive Loss.
- This ensures expert centers are mutually dissimilar (uniformly distributed on the sphere).
- It sharpens angular decision boundaries, ensuring routing decisions reflect true degradation affinity rather than embedding artifacts.
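
The two effects listed above can be illustrated with a generic angular triplet term and a simple spread measure over the centers. This is not the authors' Triplet-Constrained Contrastive Loss, whose exact form is not given in this summary; the margin value, the function names, and the toy vectors are all assumptions made for the sketch.

```python
import math

def cos_sim(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def angular_triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge term: zero once cos(anchor, positive) exceeds
    cos(anchor, negative) by at least `margin` (assumed value)."""
    return max(0.0, cos_sim(anchor, negative) - cos_sim(anchor, positive) + margin)

def center_spread_penalty(centers):
    """Mean pairwise cosine similarity between distinct centers.
    Lower values mean the centers are more spread out on the sphere."""
    n = len(centers)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(cos_sim(centers[i], centers[j]) for i, j in pairs) / len(pairs)

# Anchor and positive share a degradation type; negative is a different one.
anchor, same_type, other_type = [1.0, 0.1], [0.9, 0.2], [0.0, 1.0]
loss_good = angular_triplet_loss(anchor, same_type, other_type)  # already separated
loss_bad = angular_triplet_loss(anchor, other_type, same_type)   # roles swapped

spread_orthogonal = center_spread_penalty(
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
)
spread_clustered = center_spread_penalty(
    [[1.0, 0.0, 0.0], [1.0, 0.1, 0.0], [1.0, 0.0, 0.1]]
)
```

Here `loss_good` is zero because the triplet is already angularly separated, while `loss_bad` is positive. The spread penalty is lower for the orthogonal centers than for the clustered ones, so minimizing a term of this kind pushes the centers apart.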

D. Global–Local Granularity Fusion (GLGF)

Goal: To address spatially non-uniform degradations and the gap between training crops and full-resolution inference.
Components:
1. GLMC (Global–Local Map Construction): Generates a Content Semantic Patch (CSP) map (calibrated by global CLS tokens) and a Degradation Severity Patch (DSP) map (derived from local crop-wise degradation tokens).
2. CGDF (Content-Guided Degradation Fusion): Fuses CSP and DSP via cross-attention to create a "restoration prior map."
Mechanism: This prior map is injected into the restoration backbone, allowing the model to adaptively aggregate features based on local degradation severity while maintaining global scene consistency.
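
As a rough illustration of the fusion step, the sketch below applies single-head scaled dot-product cross-attention between two small sets of patch tokens. Several things here are assumptions rather than details from the paper: the degradation (DSP) tokens act as queries and the content (CSP) tokens act as keys and values, there are no learned projections, and the token values and the name `cross_attention` are made up for the example.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention without learned projections."""
    d = len(keys[0])
    dim_out = len(values[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        outputs.append(
            [sum(w * v[t] for w, v in zip(weights, values)) for t in range(dim_out)]
        )
    return outputs

# Four patches with 2-dimensional tokens (hypothetical values).
dsp = [[0.9, 0.1], [0.1, 0.0], [0.0, 0.1], [0.8, 0.2]]  # degradation severity tokens
csp = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [1.0, 1.0]]  # content semantic tokens

# One fused token per patch: a stand-in for the "restoration prior map".
prior_map = cross_attention(dsp, csp, csp)
```

Each row of `prior_map` is a weighted average of content tokens, with weights driven by that patch's degradation token, so heavily degraded patches and nearly clean patches receive different guidance.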

3. Key Contributions

SLER Framework: A novel all-in-one architecture that enables compositional inference through layer-wise dynamic expert activation, allowing the model to adapt to complex, composite degradations without increasing inference overhead.
Spherical Uniform Degradation Embedding: A novel routing strategy that maps degradation representations onto a hypersphere using contrastive learning. This mitigates geometry bias in linear spaces, significantly improving the separability and robustness of routing decisions.
Global–Local Granularity Fusion (GLGF): A module that bridges the granularity gap between patch-based training and full-image inference, providing robust, spatially-aware guidance for non-uniform degradations.

4. Experimental Results

The method was evaluated on a three-task benchmark (Dehazing, Deraining, Denoising) and a five-task benchmark that adds Deblurring and Low-light enhancement.

Quantitative Performance:
- 3-Task Setting: SLER-IR achieved an average PSNR of 33.14 dB and SSIM of 0.922, outperforming the previous state-of-the-art (MoCE-IR) by 0.41 dB in average PSNR.
- 5-Task Setting: It achieved an overall average PSNR/SSIM of 31.73 dB / 0.928, surpassing the second-best method by 1.15 dB in PSNR.
- Notable gains were observed in Dehazing (+2.59 dB over MoCE-IR on SOTS) and Deblurring (+1.22 dB on GoPro).
Qualitative Performance: Visual comparisons show SLER-IR produces cleaner results with better color reconstruction, fewer artifacts, and sharper edges compared to competitors, particularly in low-light and rainy conditions.
Ablation Studies:
- Removing the hyperspherical contrastive loss significantly reduced performance, supporting its role in routing stability.
- Removing GLGF also reduced performance, supporting the need for global-local fusion.
- Three experts per layer gave the best balance between specialization and computational cost among the configurations tested.

5. Significance

SLER-IR represents a significant advancement in unified image restoration by addressing the fundamental limitations of existing MoE and prompt-based methods.

Theoretical Insight: The results suggest that hyperspherical geometry is better suited than linear spaces for degradation representation, mitigating the "geometry bias" problem that affects similarity-based routing.
Practical Impact: By enabling exponential path diversity through layer-wise routing, the model achieves high specialization for complex, real-world degradations without the computational penalty of activating all parameters.
Robustness: The integration of global context and local degradation cues makes the model highly effective in scenarios where degradations are spatially non-uniform, a common challenge in real-world photography.

The code and models are set to be publicly released, facilitating further research in all-in-one restoration.