Adaptive Dynamic Dehazing via Instruction-Driven and Task-Feedback Closed-Loop Optimization for Diverse Downstream Task Adaptation

This paper proposes a novel adaptive dynamic dehazing framework that utilizes a closed-loop optimization mechanism combining task performance feedback and text-based instruction guidance to enable real-time, training-free adaptation of dehazing outputs for diverse downstream vision tasks.

Yafei Zhang, Shuaitian Song, Huafeng Li, Shujuan Wang, Yu Liu

Published 2026-03-09

Imagine you are a photographer trying to take a picture on a very foggy day. In the past, photo editors (or "dehazing" software) had one goal: make the picture look as clear and pretty as possible for a human eye. They would wipe away the fog until the image looked sharp.

The Problem:
Here's the catch: a photo that looks beautiful to a human isn't always the best photo for a robot.

  • If a self-driving car needs to see a pedestrian, it doesn't care about the "pretty colors"; it needs the edges of the person to be super sharp so it doesn't hit them.
  • If a security camera is trying to count people, it needs the background to be distinct, not just "clear."
  • If a robot is trying to guess how far away a tree is, it needs specific depth details that a pretty photo might actually blur out.

Old software was like a rigid chef who cooked the same perfect meal for everyone. If you asked for a spicy dish, they couldn't change it without cooking a whole new meal from scratch (retraining the model).

The Solution: The "Smart, Adaptable Chef"
This paper introduces a new system called ADeT-Net. Think of it as a smart, adaptable chef who can change their cooking style instantly based on who is eating and what they want, without needing to go back to culinary school.

Here is how it works, using two main "superpowers":

1. The "Feedback Loop" (The Taste-Tester)

Imagine the chef cooks a dish (removes the fog) and serves it to a "taste-tester" (the downstream task, like the self-driving car).

  • If the car says, "I can't see the stop sign clearly," the chef doesn't just ignore it.
  • The chef immediately tweaks the recipe while cooking. Maybe they add more contrast to the edges or sharpen the colors of the sign.
  • This happens in real-time. The system learns from the robot's reaction and adjusts the image instantly to make the robot happy.
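To make the "taste-tester" idea concrete, here is a minimal sketch of task-feedback adaptation. Everything in it is invented for illustration (the toy `dehaze` knobs, the edge-loving `task_score`, the greedy search) and is not the paper's actual ADeT-Net architecture; it only shows the shape of the loop: adjust the dehazing output until the downstream task's score goes up.

```python
import numpy as np

# Hypothetical sketch -- toy stand-ins for the real dehazing network
# and downstream task, just to illustrate the feedback loop.

def dehaze(image, contrast=1.0, sharpness=0.0):
    """Toy dehazer: boost contrast around the mean, then add a crude
    Laplacian-style sharpening term along the row axis."""
    out = (image - image.mean()) * contrast + image.mean()
    lap = np.zeros_like(out)
    lap[1:-1] = 2 * out[1:-1] - out[:-2] - out[2:]
    return np.clip(out + sharpness * lap, 0.0, 1.0)

def task_score(image):
    """Stand-in for downstream feedback: a 'detector' that rewards
    strong edges (mean gradient magnitude along rows)."""
    return float(np.abs(np.diff(image, axis=0)).mean())

def closed_loop_adapt(image, steps=20, step_size=0.05):
    """Greedy coordinate search: nudge each knob in the direction
    that raises the downstream task's score."""
    params = {"contrast": 1.0, "sharpness": 0.0}
    for _ in range(steps):
        for key in params:
            base = task_score(dehaze(image, **params))
            trial = dict(params)
            trial[key] += step_size
            if task_score(dehaze(image, **trial)) > base:
                params = trial
    return params

# A low-contrast "foggy" patch: a gentle 8x8 brightness ramp.
hazy = np.linspace(0.4, 0.6, 64).reshape(8, 8)
tuned = closed_loop_adapt(hazy)
before = task_score(dehaze(hazy))
after = task_score(dehaze(hazy, **tuned))
```

Note the design choice: the loop never retrains `dehaze` itself; it only tunes its output knobs against live feedback, which is the training-free, real-time flavor the paper claims.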

2. The "Instruction Manual" (The Customer's Order)

Sometimes, the robot doesn't know exactly what it needs, or a human operator wants to give a specific command.

  • You can type a note to the chef: "Hey, I need this image to be great for finding lost dogs," or "Make it perfect for measuring distances."
  • The system reads this text (using a language understanding tool called BERT) and adjusts the image to match that specific request. It's like the chef reading a special order slip and knowing exactly how to plate the food.
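As a rough sketch of the "order slip" idea: the paper uses BERT to embed the instruction, but to keep this example self-contained a toy keyword embedder stands in for it. All names here (`encode_instruction`, `instruction_to_params`, the keyword vocabulary, the specific knob mappings) are made up for illustration.

```python
# Hypothetical sketch: a bag-of-keywords embedding stands in for a
# real BERT sentence embedding; the mapping to knobs is invented.

KEYWORDS = {"edge": 0, "depth": 1, "color": 2}  # toy instruction vocabulary

def encode_instruction(text):
    """Stand-in for BERT: count keyword hits into a 3-d vector."""
    vec = [0.0, 0.0, 0.0]
    for word in text.lower().split():
        for key, idx in KEYWORDS.items():
            if key in word:
                vec[idx] += 1.0
    return vec

def instruction_to_params(vec):
    """Map the instruction embedding to dehazing knobs: edge-focused
    requests raise sharpness, depth-focused ones raise contrast."""
    return {
        "sharpness": 0.5 * vec[0],
        "contrast": 1.0 + 0.5 * vec[1],
        "saturation": 1.0 + 0.5 * vec[2],
    }

params = instruction_to_params(encode_instruction("sharpen edges for detection"))
```

In the real system, a learned network plays the role of `instruction_to_params`, translating the language embedding into modulation signals for the dehazing features rather than three scalar knobs.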

How They Work Together (The Closed Loop)

The magic happens because these two powers talk to each other in a closed loop:

  1. The Chef makes an initial clear image.
  2. The Taste-Tester (the robot) tries to do its job (detect a car, segment a road).
  3. The Customer (the text instruction) says what they want.
  4. The system combines the Taste-Tester's feedback ("This edge is too blurry for me") with the Customer's order ("Focus on depth") to tweak the image one last time before it's final.
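The four steps above can be sketched as a single fusion update. This is a hypothetical, deliberately tiny version (the `fuse` function, the knob names, and the weighting are all assumptions, not the paper's formulation): start from what the instruction asks for, then shift each knob by the task's feedback signal.

```python
# Hypothetical fusion step: blend the instruction's requested settings
# with a correction driven by downstream task feedback.

def fuse(instruction_params, feedback_signal, weight=0.5):
    """One closed-loop update: instruction sets the target, feedback
    nudges each knob it has an opinion about."""
    return {
        k: instruction_params[k] + weight * feedback_signal.get(k, 0.0)
        for k in instruction_params
    }

instruction = {"contrast": 1.2, "sharpness": 0.4}  # "focus on depth"
feedback = {"sharpness": 0.6}                      # "this edge is too blurry"
final = fuse(instruction, feedback)
```

The point of the closed loop is that neither signal wins outright: the instruction supplies the intent, the feedback supplies the correction, and the final image reflects both.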

Why This is a Big Deal

  • No Re-training: Old methods required training a new, separate model for every single task (one for cars, one for security, one for drones). This new method is a "Swiss Army Knife": you use the same tool for everything, and it adapts on the fly.
  • Real-Time: It doesn't pause to retrain or swap in a new model; it adjusts the image on the fly while the image is being processed.
  • Better Results: In their experiments, this method helped downstream models see better, detect objects more accurately, and estimate distances more precisely than prior methods, all while keeping the images pleasant for humans too.

In a nutshell:
This paper gives foggy-image software a brain and a voice. Instead of just "cleaning" a picture, it asks, "Who is going to use this picture, and what do they need?" and then instantly reshapes the image to be the perfect tool for that specific job.