DEIG: Detail-Enhanced Instance Generation with Fine-Grained Semantic Control

The paper proposes DEIG, a novel framework that enhances multi-instance generation by integrating an Instance Detail Extractor and a Detail Fusion Module to achieve fine-grained semantic control and prevent attribute leakage, supported by a new high-quality dataset and benchmark.

Shiyan Du, Conghan Yue, Xinyu Cheng, Dongyu Zhang

Published 2026-02-23

Imagine you are an art director for a busy movie set. You have a script that says: "In the center, a man in a red hat and blue jacket stands next to a woman in a yellow dress holding a green umbrella. Behind them, a striped dog chases a polka-dotted ball."

Until now, if you asked an AI artist to draw this, it might get the people right but mix up the colors. The man might end up with a yellow hat, the woman with a red dress, and the dog might be solid black. The AI struggles to keep track of which details belong to which character, especially when there are many of them.

This paper introduces DEIG (Detail-Enhanced Instance Generation), a new "super-assistant" for AI artists that solves this mess. Here is how it works, broken down into simple concepts:

1. The Problem: The "Color Bleed" Effect

Think of a current AI image generator as a group of painters sharing a single canvas. If you tell them, "Paint a red hat here and a blue shirt there," they often get confused. The red paint might smear onto the blue shirt, or they might forget the hat entirely and paint a generic person. They have no system for saying, "This detail belongs only to Person A, and that detail belongs only to Person B."

2. The Solution: DEIG's Two-Step Magic

DEIG acts like a strict project manager who organizes the painters before they even pick up a brush. It uses two main tools:

Tool A: The "Detail Extractor" (IDE)

  • The Analogy: Imagine the AI's brain is a giant library with millions of books (words). When you give a complex description like "a man in a red hat," the AI usually grabs a vague summary.
  • What DEIG does: The Instance Detail Extractor is like a super-organized librarian. It takes your long, messy sentence and breaks it down into tiny, specific "index cards." It creates a compact, high-quality summary for each person or object.
    • Card 1: "Man" + "Red Hat" + "Blue Jacket."
    • Card 2: "Woman" + "Yellow Dress" + "Green Umbrella."
    • It ensures the AI understands exactly what each card means before it starts drawing.
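The "index card" idea above can be pictured with a small data structure. This is only a sketch of the bookkeeping, not the paper's extractor: a real extractor would presumably work on learned text features rather than raw strings, and the `InstanceCard` class, the `build_cards` helper, and the "noun: detail, detail" input format are all hypothetical names invented for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class InstanceCard:
    """One 'index card': a single instance and the details that belong to it."""
    noun: str
    attributes: list[str] = field(default_factory=list)


def build_cards(instance_descriptions: list[str]) -> list[InstanceCard]:
    """Turn phrases like 'man: red hat, blue jacket' into one card per instance.

    The 'noun: detail, detail' format is a hypothetical convention used only
    for this illustration.
    """
    cards = []
    for description in instance_descriptions:
        noun, _, details = description.partition(":")
        attributes = [d.strip() for d in details.split(",") if d.strip()]
        cards.append(InstanceCard(noun=noun.strip(), attributes=attributes))
    return cards


cards = build_cards([
    "man: red hat, blue jacket",
    "woman: yellow dress, green umbrella",
])
for card in cards:
    print(card.noun, "->", card.attributes)
```

The point of the structure is that each detail is stored under exactly one owner, so nothing downstream has to guess whether "red hat" belongs to the man or the woman.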

Tool B: The "Detail Fusion" (DFM)

  • The Analogy: Now imagine the painters are back at the canvas. Without a manager, they might shout over each other, mixing their instructions.
  • What DEIG does: The Detail Fusion Module acts like a set of invisible, magical walls. It tells the AI: "Okay, the 'Red Hat' instruction can only touch the 'Man' area. It cannot cross the invisible line to the 'Woman' area."
  • This prevents "attribute leakage" (where colors or textures spill over into the wrong object). It forces the AI to keep the details strictly contained within their own "zones."
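The "invisible walls" above amount to a mask that says which instance's details may influence which part of the image. The sketch below shows only that gating rule on a tiny grid. It assumes each zone is a rectangular box, which the summary does not state, and the function names are hypothetical. A real module would apply a mask like this inside the model's attention layers rather than on raw pixels.

```python
def build_instance_mask(width, height, boxes):
    """Return mask[y][x][i] == True when pixel (x, y) lies inside box i.

    Each box is (x0, y0, x1, y1), with x1 and y1 exclusive.
    Rectangular zones are an assumption made for this illustration.
    """
    return [
        [
            [x0 <= x < x1 and y0 <= y < y1 for (x0, y0, x1, y1) in boxes]
            for x in range(width)
        ]
        for y in range(height)
    ]


def allowed_instances(mask, x, y):
    """List the instance indices whose details may influence pixel (x, y)."""
    return [i for i, inside in enumerate(mask[y][x]) if inside]


# Hypothetical layout on a 6x4 grid: instance 0 (the man) on the left,
# instance 1 (the woman) on the right.
boxes = [(0, 0, 3, 4), (3, 0, 6, 4)]
mask = build_instance_mask(6, 4, boxes)

print(allowed_instances(mask, 1, 2))  # [0]: only the man's details apply here
print(allowed_instances(mask, 4, 2))  # [1]: only the woman's details apply here
```

Because the "red hat" instruction is tied to instance 0, it has no path to any pixel in the woman's zone, which is the leakage the module is meant to block.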

3. The Training: Learning from a Better Teacher

To teach this new system, the authors didn't just use old, simple descriptions like "a dog." They used a smart robot (a Vision Language Model) to look at real photos and write rich, detailed stories for every single object.

  • Instead of "a car," the new training data says: "A metallic, striped red car with shiny wheels."
  • They also built a new test suite called DEIG-Bench. Think of this as a final exam where the AI has to draw complex scenes with many people and objects, all with specific, mixed-up colors and textures.
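To make the "final exam" idea concrete, here is one way such a test could be scored. This is not DEIG-Bench's actual metric, which this summary does not describe. It is a hypothetical scoring rule that assumes some separate detector has already listed the attributes visible on each instance, and `score_scene` is an invented helper name.

```python
def score_scene(expected, detected):
    """Compare expected and detected attributes for each instance.

    Returns (accuracy, leaks). accuracy is the fraction of expected attributes
    found on the right instance. leaks lists (attribute, owner, wrong_instance)
    for attributes that showed up on an instance they were not assigned to.
    This is a hypothetical scoring rule, not the benchmark's own metric.
    """
    total = 0
    correct = 0
    leaks = []
    for name, attributes in expected.items():
        found = detected.get(name, set())
        for attribute in attributes:
            total += 1
            if attribute in found:
                correct += 1
        for other, other_attributes in expected.items():
            if other == name:
                continue
            for attribute in other_attributes:
                if attribute in found and attribute not in attributes:
                    leaks.append((attribute, other, name))
    accuracy = correct / total if total else 0.0
    return accuracy, leaks


expected = {
    "man": {"red hat", "blue jacket"},
    "woman": {"yellow dress", "green umbrella"},
}
# Made-up detector output in which the woman's dress has leaked onto the man.
detected = {
    "man": {"red hat", "yellow dress"},
    "woman": {"yellow dress", "green umbrella"},
}

accuracy, leaks = score_scene(expected, detected)
print(accuracy)  # 0.75
print(leaks)     # [('yellow dress', 'woman', 'man')]
```

Counting leaks separately from accuracy distinguishes a model that simply omits a detail from one that paints it onto the wrong instance.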

4. The Results: Why It Matters

When they tested DEIG against other AI models:

  • Old AI: Tended to mix details up, such as giving the man the woman's dress or the dog the ball's pattern.
  • DEIG: Kept each detail attached to the right instance far more reliably. In the running example, the man keeps his red hat, the woman keeps her yellow dress, and the dog keeps its stripes.

The Best Part: DEIG is "plug-and-play." You don't need to rebuild the entire AI artist from scratch. You can just snap this new "Project Manager" module onto existing AI tools, and suddenly, they become much better at following complex instructions.

Summary

DEIG is like giving an AI artist a set of labeled folders and a set of dividers.

  1. Labeled Folders (IDE): It sorts your complex instructions so the AI knows exactly what to do for each specific item.
  2. Dividers (DFM): It puts up walls so the instructions for one item don't accidentally mess up the instructions for another.

The result? AI can draw complex, crowded scenes where each person and object looks the way you described it, without the details getting mixed up.
