Fourier-RWKV: A Multi-State Perception Network for Efficient Image Dehazing

Fourier-RWKV is a novel image dehazing framework that achieves state-of-the-art performance with linear computational complexity. It integrates spatial, frequency-domain, and semantic-relation perception mechanisms to effectively model non-uniform haze while remaining fast enough for real-time deployment.

Lirong Zheng, Yanshan Li, Rui Yu, Kaihao Zhang

Published 2026-02-17

Imagine you are trying to take a beautiful photo of a city skyline, but a thick, uneven fog has rolled in. Some parts of the fog are dense and gray, while other parts are thin and wispy. Your goal is to "dehaze" the image—to digitally remove the fog and reveal the crisp, clear city underneath.

This is the challenge of Image Dehazing. For a long time, computers struggled with this because fog isn't just a uniform blanket; it's messy, uneven, and changes from spot to spot.

The paper introduces a new AI model called Fourier-RWKV. Think of it as a "super-smart photo editor" that doesn't just guess how to clean the image but understands the physics of fog and the structure of the picture simultaneously.

Here is how it works, explained through simple analogies:

1. The Problem with Old Methods

  • The "Blind Painter" (CNNs): Early AI models were like painters who only looked at a tiny dot on the canvas at a time. They could fix a small smudge, but they couldn't see the whole picture to understand how the fog connected across the entire image.
  • The "Overworked Librarian" (Transformers): Newer models (Transformers) are like librarians who read every single book in the library to find a connection. They are great at seeing the big picture, but if the library is huge (a high-resolution photo), they get overwhelmed and take forever to finish. They are too slow for real-time use.

2. The Solution: A "Multi-State" Detective

The authors created Fourier-RWKV, which acts like a detective with three different "super-senses" working together. Instead of just looking at the photo, it looks at it in three different ways at once.

Sense 1: The "Shape-Shifter" (Spatial-Form Perception)

  • The Analogy: Imagine trying to clean a window with a rag. If the dirt is in a straight line, you wipe straight. If the dirt is in a weird, jagged shape, you have to twist your wrist and move the rag in a specific way to hit every spot.
  • How it works: Old AI models used a "rigid" wipe (a fixed pattern). This new model uses DQ-Shift, a "shape-shifting" tool. It looks at the fog and instantly changes its shape to fit the uneven patches of haze, ensuring it cleans every nook and cranny without missing anything.
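The idea behind a shape-adapting shift can be sketched in a few lines of numpy. This is a toy illustration only: in the paper the per-pixel offsets are predicted by the network, while here they are handed in as an argument, and `adaptive_shift` is a hypothetical name, not the actual DQ-Shift implementation.

```python
import numpy as np

def adaptive_shift(img, offsets):
    """Mix each pixel with a neighbour chosen per pixel.

    img:     (H, W) grayscale image
    offsets: (H, W, 2) integer per-pixel (dy, dx) shifts -- a stand-in
             for the offsets a real model would learn from the haze.
    """
    H, W = img.shape
    ys, xs = np.mgrid[0:H, 0:W]
    ny = np.clip(ys + offsets[..., 0], 0, H - 1)  # clamp at the border
    nx = np.clip(xs + offsets[..., 1], 0, W - 1)
    neighbour = img[ny, nx]
    return 0.5 * (img + neighbour)  # simple 50/50 mix with the neighbour

# A "rigid" shift (every pixel looks one step left) is the special case
# the older models were stuck with; adaptive offsets can vary per pixel.
img = np.arange(16.0).reshape(4, 4)
rigid = np.zeros((4, 4, 2), dtype=int)
rigid[..., 1] = -1
out = adaptive_shift(img, rigid)
```

The only difference between the "rigid wipe" and the "shape-shifter" is where `offsets` comes from: a constant pattern versus a value computed from the image itself.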

Sense 2: The "Music Conductor" (Frequency-Domain Perception)

  • The Analogy: Imagine a song. The fog is like a low, rumbling bass note that drowns out the melody. The actual image details (buildings, trees) are the high-pitched instruments.
  • How it works: Most AI looks at the photo as a grid of pixels (like looking at a painting). This model uses Fourier Mix to listen to the "music" of the image. It separates the "bass" (the fog) from the "melody" (the clear image). Because fog mostly lives in the low-frequency "bass" notes, the model can easily identify and remove it while keeping the high-frequency details sharp. This allows it to see the "whole song" (global context) instantly without getting tired.
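The "bass versus melody" separation can be demonstrated with a standard Fourier transform. Below is a hand-crafted low-frequency filter as a sketch of the concept; the paper's Fourier Mix learns its filtering rather than using a fixed mask, and `suppress_low_freq` is an illustrative name, not the paper's operator.

```python
import numpy as np

def suppress_low_freq(img, radius=4, strength=0.5):
    """Dampen low-frequency content (where haze energy concentrates)
    while leaving high-frequency detail untouched.

    A toy hand-crafted filter; a learned model would choose the
    per-frequency weights itself.
    """
    F = np.fft.fftshift(np.fft.fft2(img))      # spectrum, DC in the centre
    H, W = img.shape
    ys, xs = np.mgrid[0:H, 0:W]
    dist = np.hypot(ys - H / 2, xs - W / 2)    # distance from the DC term
    mask = np.where(dist < radius, strength, 1.0)  # shrink only the "bass"
    return np.real(np.fft.ifft2(np.fft.ifftshift(F * mask)))

# A perfectly flat "haze veil" is pure low frequency, so the filter
# attenuates it everywhere at once -- no sliding window needed.
haze = np.full((32, 32), 10.0)
out = suppress_low_freq(haze)
```

Note that one FFT touches every pixel, which is what gives this kind of operator its instant global view of the image.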

Sense 3: The "Translator" (Semantic-Relation Perception)

  • The Analogy: Imagine a construction crew. The "Encoder" team is digging the foundation, and the "Decoder" team is building the roof. If they don't talk to each other, the roof might not fit the foundation, leading to a wobbly house.
  • How it works: In many AI models, the "digging" team and the "building" team get out of sync, causing blurry spots or weird artifacts. This model uses a Semantic Bridge Module (SBM). It acts as a translator, constantly checking in with both teams to make sure they are speaking the same language. It ensures that the details the model is trying to restore match perfectly with the original structure of the image.
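A minimal sketch of the "translator" idea is a gated skip connection: blend the encoder's feature with the decoder's feature, weighted by how well they agree. This is a hypothetical simplification for intuition, not the paper's SBM, and `semantic_bridge` is an invented name.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def semantic_bridge(enc_feat, dec_feat):
    """Blend encoder and decoder features with an agreement gate.

    Where the two feature maps "speak the same language" (same sign,
    large product), the gate leans toward the encoder's structural
    detail; where they disagree, it leans toward the decoder.
    """
    agreement = sigmoid(enc_feat * dec_feat)   # per-element agreement score
    return agreement * enc_feat + (1.0 - agreement) * dec_feat

# When the encoder is silent and the decoder is active, the gate sits
# at its neutral midpoint and simply averages the two.
enc = np.zeros((2, 2))
dec = np.ones((2, 2))
out = semantic_bridge(enc, dec)
```

The point of the sketch: the bridge is not a plain copy of encoder features into the decoder (an ordinary skip connection) but a data-dependent negotiation between the two.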

3. Why is this a Big Deal?

  • Speed vs. Quality: Usually, you have to choose between a model that is fast but blurry, or one that is slow but perfect. Fourier-RWKV is like a Formula 1 car that drives on dirt roads. It is incredibly fast (linear complexity, meaning its cost grows in step with the number of pixels rather than exploding with the square of that number) but still produces museum-quality results.
  • Real-World Ready: It works amazingly well on real-world photos where the fog is uneven and messy, not just on perfect computer-generated test images.
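The gap between linear and quadratic cost is worth seeing in numbers. The back-of-envelope sketch below counts abstract operations, not the actual model's FLOPs; `pairwise_cost` and `linear_cost` are illustrative stand-ins for attention-style versus RWKV-style mixing.

```python
# Per-image cost of pairwise (attention-style) mixing vs a single-pass
# (RWKV-style) mixer, treating every pixel as one token.

def pairwise_cost(n_tokens):
    """O(n^2): every token compares itself against every other token."""
    return n_tokens * n_tokens

def linear_cost(n_tokens):
    """O(n): each token is processed once per pass."""
    return n_tokens

n = 1024 * 1024                      # a 1024x1024 photo: ~1M tokens
ratio = pairwise_cost(n) // linear_cost(n)
# At this resolution the quadratic mixer does about a million times
# more work per image than the linear one -- and the gap widens as
# the photo grows.
```

Both models slow down on bigger photos; the difference is that the linear one slows down proportionally, while the quadratic one becomes impractical.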

Summary

Fourier-RWKV is a new way for computers to clear up foggy images. Instead of just squinting at pixels, it:

  1. Adapts its shape to clean uneven fog (Shape-Shifter).
  2. Listens to the frequency of the image to separate fog from details (Music Conductor).
  3. Translates instructions between different parts of the AI to keep the image consistent (Translator).

The result is a tool that is fast enough to run on a phone but smart enough to restore a photo that looks like it was taken on a crystal-clear day, even if it was originally taken in a thick storm.
