RetouchIQ: MLLM Agents for Instruction-Based Image Retouching with Generalist Reward

RetouchIQ is a framework that uses MLLM agents, guided by a novel generalist reward model, to perform instruction-based image retouching. It addresses the challenge of subjective evaluation in reinforcement learning and reports better semantic consistency and perceptual quality than existing systems.

Qiucheng Wu, Jing Shi, Simon Jenni, Kushal Kafle, Tianyu Wang, Shiyu Chang, Handong Zhao

Published 2026-02-20

Imagine you have a beautiful photograph, but it feels a little "flat." You want to tell your computer, "Make this look like a dreamy, golden sunset," or "Give this a moody, cinematic blue vibe."

In the past, asking a computer to do this was like trying to teach a toddler to drive a Formula 1 car. You could give the instruction, but the computer either didn't understand the nuance, or it tried to follow a rigid rulebook (like "increase brightness by 10%") that often ruined the photo.

RETOUCHIQ is a new system that changes the game. Think of it as hiring a super-smart, artistic digital assistant who doesn't just guess, but actually thinks before they touch the sliders.

Here is how it works, broken down into simple concepts:

1. The Problem: The "One Right Answer" Trap

Imagine you ask a human editor, "Make this photo feel warmer."

  • Editor A might add a little orange.
  • Editor B might boost the reds and lower the shadows.
  • Editor C might add a golden glow.

All three are correct! They are all "warm."

Old computer systems tried to learn by comparing their work to a single "perfect" example. If the computer tried to be like Editor A but the training data said Editor B was the "right" one, the computer got confused and failed. It was like trying to learn to paint by only being allowed to copy one specific shade of blue, even though the sky can be blue in a thousand different ways.
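
To make the trap concrete, here is a tiny sketch of "grading against one perfect example." Everything in it is made up for illustration: the slider names, the numbers, and the distance function are assumptions, not the paper's actual training objective.

```python
# Hypothetical slider settings for "make this photo feel warmer".
# The training data happens to contain only Editor B's version.
reference = {"temperature": 10, "red_boost": 15, "shadows": -10}

candidates = {
    "editor_a": {"temperature": 20, "red_boost": 0, "shadows": 0},
    "editor_b": {"temperature": 10, "red_boost": 15, "shadows": -10},
    "editor_c": {"temperature": 15, "red_boost": 5, "shadows": 0},
}


def distance_to_reference(edit, ref):
    """Sum of absolute slider differences (lower = closer to the one 'right' answer)."""
    return sum(abs(edit[name] - ref[name]) for name in ref)


penalties = {
    name: distance_to_reference(edit, reference)
    for name, edit in candidates.items()
}

for name, penalty in penalties.items():
    print(name, penalty)
```

Editors A and C both get penalized even though their photos are just as "warm." The penalty measures how far they are from Editor B, not whether they followed the instruction.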

2. The Solution: The "Art Critic" (The Generalist Reward Model)

RETOUCHIQ introduces a special component called the Generalist Reward Model. Think of this as a tough but fair Art Critic sitting next to the editor.

Instead of checking if the photo matches a specific "correct" file, the Critic looks at the photo and the user's request and asks:

  • "Does this actually feel warm?"
  • "Are the colors balanced?"
  • "Does it look like a sunset or just a mess?"

The Critic doesn't just give a score (like "7 out of 10"). It writes a little report explaining why it gave that score. "The warmth is good, but the shadows are too heavy."
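
One way to picture "a score plus a report" is as a small data structure. The sketch below is only a stand-in: the real Critic is an MLLM that looks at the image, while this toy version (the names `Critique` and `toy_critic` are hypothetical) just takes yes/no answers to a checklist and turns them into a score and a short explanation.

```python
from dataclasses import dataclass


@dataclass
class Critique:
    score: float  # 0 to 10, higher is better
    report: str   # short explanation of the score


def toy_critic(checks: dict[str, bool]) -> Critique:
    """Turn checklist answers into a score and a written report.

    `checks` maps a question (e.g. "feels warm") to whether the edit passed.
    """
    if not checks:
        raise ValueError("the checklist must contain at least one item")

    passed = [item for item, ok in checks.items() if ok]
    failed = [item for item, ok in checks.items() if not ok]

    score = 10 * len(passed) / len(checks)

    parts = []
    if passed:
        parts.append("Good: " + ", ".join(passed) + ".")
    if failed:
        parts.append("Needs work: " + ", ".join(failed) + ".")

    return Critique(score=score, report=" ".join(parts))


critique = toy_critic({
    "feels warm": True,
    "colors balanced": True,
    "shadows not too heavy": False,
})
print(round(critique.score, 1), "-", critique.report)
```

The point of the sketch is the shape of the output: a number the Editor can learn from, paired with reasons a human can read.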

3. The Learning Loop: Trial, Error, and Coaching

Here is how RETOUCHIQ learns:

  1. The Editor (Policy Model): The AI tries to edit the photo based on your instruction. It makes a guess.
  2. The Critic (Reward Model): The Critic looks at the result. It generates a custom checklist of what makes a "good" edit for this specific photo and gives a score.
  3. The Coaching: If the score is low, the Critic says, "Hey, you made it too bright, and the colors look muddy." The Editor learns from this feedback and tries again.

This happens thousands of times. The Editor gets better and better at understanding not just the words, but the feeling you want.
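
The try, score, adjust loop above can be sketched in a few lines. To be clear about the assumptions: this uses a simple "keep the change if the Critic likes it better" rule (hill climbing) as a stand-in for the reinforcement learning the actual system uses, the Editor is reduced to a single made-up "warmth" slider, and the Critic is a toy function that accepts a whole range of warmth values rather than one exact answer.

```python
import random


def toy_critic(warmth: float) -> float:
    """Score in [0, 1]. Any warmth in the acceptable band gets full marks."""
    low, high = 10.0, 30.0
    if low <= warmth <= high:
        return 1.0
    gap = low - warmth if warmth < low else warmth - high
    return max(0.0, 1.0 - gap / 50.0)


def train(steps: int = 200, seed: int = 0) -> tuple[float, float]:
    """Repeatedly propose a small change and keep it only if the score improves."""
    rng = random.Random(seed)
    warmth = -20.0                 # the Editor's poor starting guess
    best_score = toy_critic(warmth)

    for _ in range(steps):
        proposal = warmth + rng.uniform(-5.0, 5.0)  # the Editor tries something
        score = toy_critic(proposal)                # the Critic scores it
        if score > best_score:                      # the Editor keeps what worked
            warmth, best_score = proposal, score

    return warmth, best_score


final_warmth, final_score = train()
print(f"warmth={final_warmth:.1f}, score={final_score:.2f}")
```

Because the toy Critic rewards a band of values instead of a single target, the Editor is free to settle on any setting that genuinely reads as "warm."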

4. The Secret Sauce: "Policy-Guided Training"

There was a tricky problem. The "Art Critic" was being trained on photos edited by a robot that changed sliders at random (for example, pushing the brightness up or down with no goal in mind). But the "Editor" AI was learning to make complex, artistic edits.

It was like training a race car driver by having them practice on a dirt track with potholes, but then sending them onto a real racetrack. The practice conditions and the real conditions didn't match!

The researchers fixed this with a method called Policy-Guided Reward Training (PGRT).

  • Instead of training the Critic on random, messy edits, they trained it on the actual edits the Editor AI was making.
  • This way, the Critic learns to judge the specific, complex style of the Editor. They learn to speak the same language.
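
The difference between the two ways of building the Critic's training data can be sketched as follows. This is a rough illustration under loud assumptions: `random_perturbation`, `toy_policy`, and `build_critic_dataset` are hypothetical names, the two-slider edit format is invented, and the real PGRT procedure involves actual images and an MLLM rather than these toy functions.

```python
import random
from typing import Callable

Edit = dict[str, float]
EditSource = Callable[[str, random.Random], Edit]


def random_perturbation(instruction: str, rng: random.Random) -> Edit:
    """Old approach: ignore the instruction and move sliders at random."""
    return {
        "exposure": rng.uniform(-1.0, 1.0),
        "temperature": rng.uniform(-1.0, 1.0),
    }


def toy_policy(instruction: str, rng: random.Random) -> Edit:
    """Stand-in for the Editor: edits follow the instruction, with small noise."""
    base = 0.6 if "warm" in instruction.lower() else -0.6
    return {
        "exposure": rng.gauss(0.1, 0.05),
        "temperature": base + rng.gauss(0.0, 0.05),
    }


def build_critic_dataset(
    instructions: list[str], make_edit: EditSource, seed: int = 0
) -> list[tuple[str, Edit]]:
    """Collect (instruction, edit) pairs for the Critic to practice judging."""
    rng = random.Random(seed)
    return [(text, make_edit(text, rng)) for text in instructions]


instructions = ["Make it warmer", "Give it a cool blue mood"]

random_data = build_critic_dataset(instructions, random_perturbation)  # mismatched
policy_data = build_critic_dataset(instructions, toy_policy)           # policy-guided

for text, edit in policy_data:
    print(text, "->", {k: round(v, 2) for k, v in edit.items()})
```

The only thing that changes between the two datasets is where the edits come from. Swapping the random source for the Editor itself is the core idea the analogy above describes.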

Why This Matters

Before RETOUCHIQ, if you wanted to edit a photo professionally, you needed to know what "Exposure," "Contrast," and "Temperature" meant. You had to be the expert.

With RETOUCHIQ:

  • You are the Director: You just say, "Make it feel like a cozy winter morning."
  • The AI is the Cinematographer: It figures out the technical settings to make that happen.
  • The Result: A photo that looks professional, matches your mood, and keeps the original details intact (unlike other AI tools that sometimes turn people into aliens or change the background).

In short: RETOUCHIQ is like having a professional photo editor in your pocket who listens to your vague ideas, thinks deeply about how to achieve them, and learns from a smart critic to get it right more often.
