LucidNFT: LR-Anchored Multi-Reward Preference Optimization for Generative Real-World Super-Resolution

Imagine you have a very old, blurry, and scratched-up family photo. You want to restore it to look crisp and clear again. This is what Real-World Image Super-Resolution (Real-ISR) tries to do: take a low-quality, degraded image and turn it into a high-definition masterpiece.

In the past, computers were like cautious accountants: they tried to guess the missing pixels based on strict math. The result was often safe but looked a bit "plastic" or blurry.

Recently, we started using Generative AI (like the technology behind DALL-E or Midjourney). These are like creative artists. They don't just guess; they imagine what the missing details should look like. They can add realistic hair strands, fabric textures, and skin pores.

But here's the problem:
Because these AI artists are so creative, they sometimes get carried away. They might add a beautiful flower to a photo where there was only a bush, or change the shape of a person's nose so that it looks more "perfect" but is actually wrong. The paper calls this "hallucination." The image looks sharp and amazing, but it's no longer faithful to the original photo.

The big challenge is: How do you teach the AI to be creative without lying? And how do you check if it's lying if you don't have the original, perfect photo to compare it to?

Enter: LucidNFT

The authors of this paper built a new system called LucidNFT to solve this. Think of it as a strict but fair art critic who helps the AI artist improve. Here is how it works, broken down into three simple parts:

1. The "Truth Detector" (LucidConsistency)

Usually, to know if a restored photo is good, you need the original perfect photo to compare it against. But in the real world, you rarely have that.

The Analogy: Imagine you are trying to recognize a friend in a crowd, but they are wearing a heavy disguise (fog, blur, scratches). A normal camera might get confused.
The Solution: The authors created a special "Truth Detector" (called LucidConsistency). It's like a detective who ignores the disguise (the blur and scratches) and looks straight at the person's face (the semantic meaning). It checks: "Does this new, sharp image still look like the blurry original underneath?" If the AI adds a fake nose, the detector says, "No, that doesn't match the original face!"

2. The "Fair Scorecard" (Decoupled Advantage Normalization)

The AI generates many different versions of the photo (let's say 12 different guesses). Some look very sharp but fake; others look a bit blurry but true to the original.

The Problem: In the past, when computers tried to grade these 12 guesses, they would mash all the scores together into one big number. It was like grading a student on "Math" and "Art" by just adding the scores together. If the Math score was huge, it would drown out the Art score. The computer would only care about making things sharp and ignore whether they were truthful.
The Solution: The authors invented a Fair Scorecard. Instead of mixing the grades, they grade "Sharpness" and "Truthfulness" separately first, then combine them carefully. This ensures the AI doesn't get rewarded for being a liar just because it's good at making things look sharp. It forces the AI to find the perfect balance between "looking cool" and "being honest."

3. The "Training Gym" (LucidLR)

To teach the AI to be good at this, you need a lot of practice.

The Problem: Most AI training sets are like a gym with only one type of exercise machine. They are too perfect or too simple. The AI gets good at fixing those specific types of blurry photos but fails when faced with real-world messiness (like a photo taken in the rain or with a shaky hand).
The Solution: The authors built a massive new library of 20,000 real-world, messy photos (called LucidLR). It's like sending the AI to a gym with every possible type of equipment: rain, motion blur, compression artifacts, and low light. By training on this diverse "gym," the AI learns to handle any kind of real-world mess.

The Result

When they put all these pieces together, the AI (LucidNFT) becomes a master restorer.

It creates images that look incredibly realistic and detailed (great for Instagram or museums).
Crucially, it is much less likely to invent fake details. If the original photo showed a broken window, the restored photo shows a sharper broken window. It doesn't magically "heal" the glass.

Summary

LucidNFT is a new way to teach AI to fix old, blurry photos. It uses a Truth Detector to make sure the AI doesn't lie, a Fair Scorecard to balance creativity with honesty, and a Massive Training Gym to prepare it for real-world messiness. The result is photos that are not just sharp, but also faithful to the original memory.

1. Problem Statement

Generative Real-World Image Super-Resolution (Real-ISR) aims to reconstruct High-Resolution (HR) images from degraded Low-Resolution (LR) inputs without knowing the specific degradation process. While recent flow-matching and diffusion-based models have significantly improved perceptual quality by synthesizing high-frequency details, they suffer from a critical reliability bottleneck: Semantic and Structural Hallucination.

The Core Conflict: Generative models often produce sharp, realistic-looking images that are unfaithful to the original LR evidence (e.g., changing facial features or object structures).
The Optimization Challenge: In the absence of HR ground truth, optimizing for "faithfulness" is difficult. Furthermore, standard Reinforcement Learning (RL) approaches for Real-ISR face three specific hurdles:
1. Lack of Robust Faithfulness Signals: Existing no-reference metrics focus on perceptual quality but fail to measure consistency with the LR input, often rewarding over-sharpening or hallucinations.
2. Advantage Collapse in Multi-Reward RL: Real-ISR requires balancing multiple objectives (perceptual quality vs. structural faithfulness). Standard RL pipelines aggregate multiple rewards into a single scalar before normalization. This process compresses the contrast between different objectives, causing "advantage collapse" where the model fails to distinguish between candidates that trade off quality and faithfulness differently. This weakens the guidance signal in DiffusionNFT-style fine-tuning.
3. Data Limitations: Existing datasets are often small, paired, or rely on synthetic degradations that do not capture the complexity of real-world artifacts, limiting the diversity of RL rollouts.

2. Methodology: LucidNFT

The authors propose LucidNFT, a multi-reward RL framework designed for flow-matching Real-ISR models. It consists of three tightly coupled components:

A. LucidConsistency: Degradation-Robust Faithfulness Evaluator

To measure faithfulness without HR ground truth, the authors introduce LucidConsistency, a semantic evaluator that aligns LR and SR representations in a shared embedding space.

Architecture: It utilizes a frozen multimodal embedding backbone (Qwen3-VL-Embedding) to extract features from both the LR input and the generated SR output.
Degradation Alignment: A lightweight, trainable projection head is learned to map these embeddings into a "degradation-aligned" space. This head is trained using a symmetric InfoNCE loss on paired LR-HR data, ensuring that the semantic similarity score remains high even when the LR image is severely degraded.
Function: It provides a scalar reward score $C(x_{lr}, x_{sr})$ representing the semantic consistency between the input and output, serving as a crucial component for the RL reward signal.
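The training signal for the projection head can be illustrated with a minimal NumPy sketch of a symmetric InfoNCE loss. This is not the authors' implementation: the random arrays stand in for projected Qwen3-VL-Embedding features, the temperature of 0.07 is an assumed value, and using cosine similarity as the score $C$ is an assumption, since this summary only says the score measures semantic similarity in the shared space.

```python
import numpy as np

def l2_normalize(x):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def symmetric_info_nce(z_lr, z_hr, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired embeddings.

    Row i of z_lr and row i of z_hr are assumed to come from the same scene;
    every other row in the batch acts as a negative. The temperature is an
    assumed hyperparameter, not a value taken from the paper.
    """
    z_lr, z_hr = l2_normalize(z_lr), l2_normalize(z_hr)
    logits = z_lr @ z_hr.T / temperature           # (B, B), diagonal = matched pairs
    lr_to_hr = -np.diag(log_softmax(logits, axis=1)).mean()
    hr_to_lr = -np.diag(log_softmax(logits, axis=0)).mean()
    return 0.5 * (lr_to_hr + hr_to_lr)

def consistency_score(z_lr, z_sr):
    """Assumed form of C(x_lr, x_sr): row-wise cosine similarity."""
    return np.sum(l2_normalize(z_lr) * l2_normalize(z_sr), axis=1)

# Toy stand-ins for projected embeddings (hypothetical data, not real features).
rng = np.random.default_rng(0)
z_hr = rng.normal(size=(8, 16))
z_lr = z_hr + 0.05 * rng.normal(size=(8, 16))      # well-aligned LR embeddings

loss_matched = symmetric_info_nce(z_lr, z_hr)
loss_mismatched = symmetric_info_nce(z_lr[::-1], z_hr)  # pairs deliberately scrambled
scores = consistency_score(z_lr, z_hr)
```

In this toy setup the loss is low when each LR embedding sits next to its own HR embedding and high when the pairing is scrambled, which is the pressure that would push a projection head toward a degradation-aligned space.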

B. Decoupled Multi-Reward Advantage Normalization

To solve the "advantage collapse" problem, LucidNFT introduces a novel normalization strategy for multi-objective RL.

The Issue with Baselines: Standard methods calculate a weighted sum of rewards, $s = \sum_k \lambda_k r_k$, and then normalize within the group. This allows dominant objectives (e.g., perceptual quality) to suppress others (e.g., faithfulness), collapsing the advantage distribution.
The LucidNFT Solution: The framework performs decoupled normalization:
1. Per-Objective Normalization: For each LR-conditioned rollout group, rewards for each specific objective are normalized independently (centering and scaling) before any fusion occurs.
2. Fusion: The normalized objectives are then fused via weighted summation.
3. Batch Stabilization: A final batch-level normalization is applied.
Result: This preserves the "objective-wise contrasts," ensuring that a candidate with high faithfulness but slightly lower perceptual quality is not drowned out by a candidate with high perceptual quality but low faithfulness. This maintains a strong dynamic range for the reward weights in the DiffusionNFT objective.
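The difference between the two orderings can be seen in a small NumPy sketch. It is a sketch under stated assumptions rather than the paper's exact formulation: the z-score form of "centering and scaling", the epsilon, the equal weights, and the toy reward values (a quality score on a 0-100 scale and a faithfulness score on a 0-1 scale) are all illustrative choices.

```python
import numpy as np

EPS = 1e-8  # assumed small constant to avoid division by zero

def zscore(x, axis=None):
    """Center and scale along `axis` (over all entries when axis is None)."""
    mean = x.mean(axis=axis, keepdims=True)
    std = x.std(axis=axis, keepdims=True)
    return (x - mean) / (std + EPS)

def scalar_advantages(rewards, weights):
    """Baseline: fuse rewards into one scalar, then normalize per rollout group.

    rewards: array of shape (groups, rollouts, objectives)
    weights: array of shape (objectives,)
    """
    fused = rewards @ weights                  # (groups, rollouts)
    return zscore(fused, axis=1)

def decoupled_advantages(rewards, weights):
    """Decoupled variant: normalize each objective per group, fuse, then
    apply a final batch-level normalization."""
    per_objective = zscore(rewards, axis=1)    # step 1: per-objective, per-group
    fused = per_objective @ weights            # step 2: weighted fusion
    return zscore(fused)                       # step 3: batch-level stabilization

# One rollout group with four candidates (hypothetical reward values).
quality = np.array([80.0, 60.0, 70.0, 50.0])       # large-scale objective
faithfulness = np.array([0.70, 0.90, 0.60, 0.80])  # small-scale objective
rewards = np.stack([quality, faithfulness], axis=-1)[None]  # (1, 4, 2)
weights = np.array([0.5, 0.5])

scalar_adv = scalar_advantages(rewards, weights)
decoupled_adv = decoupled_advantages(rewards, weights)
```

With these toy numbers, the scalar baseline ranks candidates purely by the large-scale quality score, so the most faithful candidate (index 1) falls below the least faithful one (index 2). After decoupled normalization the order of those two candidates flips, because faithfulness contributes on an equal footing before fusion.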

C. LucidLR: Large-Scale Real-World Dataset

To support robust RL fine-tuning, the authors constructed LucidLR, a dataset of 20,000 real-world degraded images.

Source: Curated from Wikimedia Commons (categories like "low quality" and "blurred images").
Filtering: Uses an NSFW classifier and manual review to ensure safety and quality.
Purpose: Unlike small, paired benchmark datasets, LucidLR provides the scale and diversity of real-world degradations (motion blur, compression, noise) necessary to generate informative stochastic rollouts for preference learning.

D. Training Framework

The system fine-tunes a pre-trained flow-matching model (e.g., LucidFlux) using DiffusionNFT (a forward-consistent RL method). The total reward is a combination of:

Perceptual Reward: UniPercept IQA (for visual quality).
Faithfulness Reward: LucidConsistency (for LR-anchored structure).
The decoupled normalization strategy is applied to these rewards before they modulate the velocity field updates.

3. Key Contributions

LucidConsistency: A novel, degradation-robust metric that quantifies LR-anchored faithfulness without HR ground truth, enabling faithfulness to be optimized directly.
Decoupled Advantage Normalization: A mathematical formulation that prevents advantage collapse in multi-reward RL by preserving objective-specific contrasts within rollout groups, crucial for balancing perceptual quality and structural faithfulness.
LucidLR Dataset: A large-scale, unpaired dataset of real-world degraded images specifically designed to fuel RL-based alignment, addressing the data scarcity in current Real-ISR research.
LucidNFT Framework: A unified system that integrates these components to achieve stable optimization and superior trade-offs in generative Real-ISR.

4. Experimental Results

The authors evaluated LucidNFT on strong flow-based baselines (LucidFlux and DiT4SR) across three benchmarks: RealLQ250, DRealSR, and RealSR.

Quantitative Performance:
- Perceptual Quality: LucidNFT consistently outperformed state-of-the-art methods (including StableSR, SinSR, DiffBIRv2, and SUPIR) on standard No-Reference IQA metrics (CLIP-IQA+, Q-Align, MUSIQ, UniPercept). For example, on RealLQ250, UniPercept improved from 70.93 (baseline) to 73.48.
- Faithfulness: The method achieved higher LucidConsistency scores compared to the baseline, indicating better structural preservation.
- Robustness: The model showed stable optimization dynamics, with reward curves for both perceptual and consistency metrics rising steadily during training, avoiding reward overfitting.
Ablation Studies:
- Removing the decoupled normalization (using scalar aggregation instead) resulted in lower perceptual gains and reduced faithfulness, confirming the necessity of the proposed normalization strategy.
- Training without the LucidLR dataset resulted in lower performance, highlighting the importance of diverse real-world degradation data.
Qualitative Results: Visual comparisons showed that LucidNFT produced images with richer texture details while maintaining structural integrity, significantly reducing hallucinated artifacts compared to the baseline.

5. Significance

This paper addresses a fundamental limitation in generative Real-ISR: the trade-off between "looking good" and "being true to the input."

Paradigm Shift: It moves the field from relying solely on perceptual metrics to explicitly optimizing for LR-anchored faithfulness using RL.
Algorithmic Innovation: The decoupled advantage normalization offers a potentially generalizable approach to multi-objective preference optimization in generative models, targeting the "advantage collapse" that arises when multiple rewards are summed into a single scalar before normalization.
Practical Impact: By providing a robust faithfulness signal and a large-scale dataset, LucidNFT paves the way for deploying generative super-resolution models in real-world applications where hallucination is unacceptable (e.g., medical imaging, forensic analysis, and archival restoration).