FireRed-Image-Edit-1.0 Technical Report

FireRed-Image-Edit-1.0 is a state-of-the-art diffusion transformer for instruction-based image editing. It gets its performance from three things: a curated training set distilled from 1.6 billion raw samples, a multi-stage training pipeline with new optimization techniques, and a new benchmark suite.

Super Intelligence Team, Changhao Qiao, Chao Hui, Chen Li, Cunzheng Wang, Dejia Song, Jiale Zhang, Jing Li, Qiang Xiang, Runqi Wang, Shuang Sun, Wei Zhu, Xu Tang, Yao Hu, Yibo Chen, Yuhao Huang, Yuxuan Duan, Zhiyi Chen, Ziyuan Guo

Published 2026-02-23

Imagine you have a magical photo editor that doesn't just use filters, but actually understands your requests like a human artist. You can say, "Make the dog wear a tuxedo and turn the background into a snowy mountain," and it does exactly that without messing up the rest of the picture.

This paper introduces FireRed-Image-Edit, a new AI model designed to be the ultimate "digital art assistant." The team behind it (from Xiaohongshu) didn't just throw more computing power at the problem; they built a smarter, more efficient system.

Here is the story of how they built it, explained in everyday terms:

1. The Ingredients: Cooking a 100-Million-Recipe Feast

To teach an AI how to edit photos, you need a massive library of examples.

  • The Problem: Most AI models are trained on messy data, like a library where books are torn, pages are missing, or the stories don't make sense.
  • The FireRed Solution: They gathered 1.6 billion raw image examples (like buying a massive warehouse of ingredients). But instead of cooking with everything, they acted like a super-chef.
    • They washed the ingredients (removed duplicates and bad photos).
    • They chopped them perfectly (labeled them with precise instructions).
    • They selected only the top 100 million high-quality recipes.
    • The Secret Sauce: They made sure the library was balanced. They didn't just teach the AI how to create pictures from scratch; they taught it how to change existing pictures, ensuring it knows when to add, remove, or swap things without ruining the original vibe.
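
For readers who want to see the "wash, chop, select, balance" idea as code, here is a minimal sketch. It is not the team's actual pipeline, which isn't spelled out here. The record fields (image, instruction, task, quality), the exact-hash deduplication, the quality threshold, and the per-task cap are all illustrative assumptions.

```python
import hashlib
from collections import defaultdict

# Hypothetical record format: raw image bytes, an edit instruction, a task
# label (add/remove/swap/...), and a precomputed quality score in [0, 1].
def curate(samples, min_quality=0.5, per_task_cap=2):
    """Sketch of a dedupe -> quality filter -> task-balancing pipeline."""
    seen = set()
    by_task = defaultdict(list)
    for s in samples:
        # 1. Exact-duplicate removal via a content hash of image + instruction.
        key = hashlib.sha256(s["image"] + s["instruction"].encode()).hexdigest()
        if key in seen:
            continue
        seen.add(key)
        # 2. Quality filter: drop samples below the assumed threshold.
        if s["quality"] < min_quality:
            continue
        by_task[s["task"]].append(s)
    # 3. Balance: keep at most `per_task_cap` best samples per task type,
    #    so no single kind of edit dominates the curated set.
    curated = []
    for task in sorted(by_task):
        best = sorted(by_task[task], key=lambda s: s["quality"], reverse=True)
        curated.extend(best[:per_task_cap])
    return curated

raw = [
    {"image": b"img1", "instruction": "add a hat", "task": "add", "quality": 0.9},
    {"image": b"img1", "instruction": "add a hat", "task": "add", "quality": 0.9},  # duplicate
    {"image": b"img2", "instruction": "remove the car", "task": "remove", "quality": 0.3},  # low quality
    {"image": b"img3", "instruction": "swap sky for snow", "task": "swap", "quality": 0.8},
    {"image": b"img4", "instruction": "add a dog", "task": "add", "quality": 0.7},
    {"image": b"img5", "instruction": "add a cat", "task": "add", "quality": 0.6},
]
kept = curate(raw)
print([s["instruction"] for s in kept])  # ['add a hat', 'add a dog', 'swap sky for snow']
```

The cap is what does the balancing here: without it, whichever edit type happens to be most common in the raw pile would crowd out the rest. A real system at this scale would also need near-duplicate detection (for example perceptual hashing), which an exact hash cannot catch.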

2. The Classroom: Teaching the AI to Listen

Once they had the data, they had to teach the model. They used a three-step training process, like a student going from elementary school to a PhD.

  • Step 1: Pre-training (The General Knowledge Phase): The AI studies millions of photos paired with text descriptions to understand the world. It learns what a "cat" looks like, what "sunset" means, and how light works.
  • Step 2: Fine-Tuning (The Specialized Internship): Now, the AI learns specific tasks. It practices following instructions like "make the sky blue" or "remove the trash can." They used a clever trick called "Stochastic Instruction Alignment."
    • Analogy: Imagine a teacher giving a student a list of ingredients: "Flour, Eggs, Sugar." Then, the teacher shuffles the list and says, "Sugar, Flour, Eggs." The student must still bake the same cake. This forces the AI to understand the meaning of the words, not just their order.
  • Step 3: Reinforcement Learning (The Critic's Review): This is where the AI gets a "taste test." They show the AI two versions of an edited photo: one good, one bad. The AI learns to prefer the good one.
    • The "Anti-Hack" Trick: Sometimes, AI tries to cheat. For example, if asked to write text on a sign, it might write giant, blurry letters just to trick the system into thinking it did the job. The team invented a "Layout-Aware Reward" to catch this. It's like a strict editor who checks not just what is written, but where it is written and whether it fits the picture.
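
To make Steps 2 and 3 a little more concrete, here is a rough sketch of three ideas from the list above. It is only an illustration under stated assumptions, not FireRed's code. The clause-shuffling function is a hypothetical stand-in for "Stochastic Instruction Alignment" based purely on the ingredient-list analogy. The preference loss is the standard Bradley-Terry logistic form (the same shape DPO-style methods use), applied to made-up scores. The layout-aware reward, which multiplies a text-accuracy score by the overlap (IoU) between where the text landed and where it should be, is one plausible design, not the paper's formula.

```python
import math
import random

def shuffle_instruction(instruction, rng):
    """Reorder the clauses of a compound instruction; the requested edit stays the same."""
    # Naive clause split on commas and " and " (a simplifying assumption).
    clauses = [c.strip() for c in instruction.replace(" and ", ", ").split(",") if c.strip()]
    rng.shuffle(clauses)
    return ", ".join(clauses)

def pairwise_preference_loss(score_good, score_bad):
    """Bradley-Terry logistic loss: -log(sigmoid(score_good - score_bad))."""
    margin = score_good - score_bad
    # Numerically stable form of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

def box_iou(a, b):
    """Intersection-over-union of two (x0, y0, x1, y1) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def layout_aware_text_reward(text_accuracy, pred_box, target_box):
    """Hypothetical reward: text correctness (0..1) scaled by placement overlap."""
    return text_accuracy * box_iou(pred_box, target_box)

rng = random.Random(0)
shuffled = shuffle_instruction("add flour, add eggs and add sugar", rng)

# Loss is small when the preferred edit scores higher, large when it doesn't.
low = pairwise_preference_loss(2.0, 0.0)
high = pairwise_preference_loss(0.0, 2.0)

# Correct text in the right box earns full reward; the same text blown up to
# cover the whole image earns almost nothing, even though OCR alone would pass it.
target = (0, 0, 10, 10)
fitted = layout_aware_text_reward(1.0, (0, 0, 10, 10), target)
giant = layout_aware_text_reward(1.0, (0, 0, 100, 100), target)
```

Multiplying by IoU rather than adding it is a deliberate choice in this sketch. A perfectly spelled but badly placed sign cannot score well, which is exactly the "cheat" the anti-hack trick is meant to close off.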

3. The Safety Net: Keeping the Face (and Identity) Intact

One of the hardest things in photo editing is changing a person's clothes without changing their face.

  • The Problem: Old AI models often turned a specific person into a generic "person" or gave them a weird, plastic look.
  • The FireRed Solution: They added a "Consistency Loss" (a safety net).
    • Analogy: Imagine you are painting a portrait. You are allowed to change the hat and the jacket, but you must keep the face exactly the same. The AI has a "security guard" that constantly checks: "Is this still the same person?" If the AI starts drifting, the guard pulls it back. This helps ensure that when you change a fashion model's outfit, it's still recognizably the same person.
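
One common way to build such a "security guard" is to compare identity embeddings of the person before and after the edit. Here is a minimal sketch of that idea, assuming the embeddings come from some face-recognition encoder (not shown, and hypothetical here). The cosine-based loss and the `weight` value of 0.1 are illustrative choices; the report's exact Consistency Loss may be defined differently.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def identity_consistency_loss(src_embedding, edited_embedding):
    """1 - cos(src, edited): 0 when identities match, grows as the face drifts."""
    return 1.0 - cosine_similarity(src_embedding, edited_embedding)

def total_loss(edit_loss, src_embedding, edited_embedding, weight=0.1):
    """Hypothetical combination: main editing loss plus a weighted identity penalty."""
    return edit_loss + weight * identity_consistency_loss(src_embedding, edited_embedding)

# Toy embeddings: the "drifted" face points in a different direction.
same_person = identity_consistency_loss([0.2, 0.9, 0.4], [0.2, 0.9, 0.4])
drifted = identity_consistency_loss([0.2, 0.9, 0.4], [0.9, 0.1, 0.3])
```

The weight controls the trade-off: too small and the face drifts, too large and the model may resist legitimate edits near the face (a new hat, say).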

4. The Exam: REDEdit-Bench

How do you know if the AI is actually good? You can't just ask, "Did you do it?" You need a real test.

  • The team built a new exam called REDEdit-Bench.
  • It has 1,673 real-world challenges, from "make this old photo look new" to "put this specific text on this poster."
  • It tests the AI on things that matter to real people: Did it follow the instructions? Did it keep the background safe? Does it look realistic?
  • The Result: FireRed-Image-Edit scored higher than almost every other open-source model and even beat some expensive, closed-source commercial giants.
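
How might per-question judgments become a single leaderboard number? Here is a rough sketch, assuming each test case gets a score (say 1 to 10) on the three questions listed above and the final score is a plain average. The dimension names and the averaging scheme are assumptions for illustration; REDEdit-Bench's actual scoring protocol is not described here and may weight things differently.

```python
from statistics import mean

# Hypothetical scoring axes, mirroring the three questions in the text.
DIMENSIONS = ("instruction_following", "background_preservation", "realism")

def summarize(results):
    """Average each dimension over all test cases, then average the dimensions."""
    per_dim = {d: mean(r[d] for r in results) for d in DIMENSIONS}
    per_dim["overall"] = mean(per_dim[d] for d in DIMENSIONS)
    return per_dim

# Two made-up test cases scored by a judge (human or model).
results = [
    {"instruction_following": 8, "background_preservation": 9, "realism": 7},
    {"instruction_following": 6, "background_preservation": 7, "realism": 8},
]
summary = summarize(results)
print(summary)
```

Keeping the per-dimension averages alongside the overall number matters: a model can score well overall while quietly failing one axis, such as leaving the background intact.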

5. Why This Matters

Usually, to get a better AI, companies just build bigger, heavier models that cost millions of dollars to run.

  • FireRed's Philosophy: Instead of building a bigger truck, they built a smarter engine. By cleaning the data better and teaching the model more efficiently, they achieved top-tier results without needing a supercomputer the size of a house.

In a nutshell: FireRed-Image-Edit is a highly trained digital artist that has been fed a massive, perfectly organized library of examples, taught to listen carefully to instructions, and trained to never lose the identity of the subject. It's a tool that brings professional-grade photo editing to everyone, powered by smart engineering rather than just brute force.
