The Intricate Dance of Prompt Complexity, Quality, Diversity, and Consistency in T2I Models

This paper investigates how prompt complexity influences the quality, diversity, and consistency of synthetic data from text-to-image models. It finds that while increased complexity reduces diversity and consistency, it narrows the distribution shift from real data; prompt expansion emerges as a superior intervention, enhancing both image diversity and aesthetics beyond real-world benchmarks.

Zhang Xiaofeng, Aaron Courville, Michal Drozdzal, Adriana Romero-Soriano

Published 2026-02-24

Imagine you have a magical artist named AI. This artist is incredibly talented at painting pictures based on your descriptions (prompts). You can tell them, "Draw a cat," and they will. You can also say, "Draw a fluffy orange cat sitting on a red velvet cushion in a sunlit room," and they will try to do that too.

This paper is like a giant study on how the complexity of your instructions affects the quality, variety, and accuracy of the paintings this AI artist creates. The researchers wanted to know: Does giving the AI more details make it a better artist, or does it confuse it?

Here is the breakdown of their findings using simple analogies:

1. The Three Rules of a Good Painting

The researchers judged the AI's work on three main things:

  • Quality (Beauty): Does the picture look good? Is it pretty and realistic?
  • Diversity (Variety): If you ask for "a cat" 10 times, do you get 10 different cats, or 10 identical clones?
  • Consistency (Listening): If you asked for a "blue cat," did it actually draw a blue cat, or did it ignore you and draw a red dog?
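In practice, these three axes are usually scored automatically rather than by human judges. Here is a minimal sketch of the diversity and consistency measures, assuming the prompt and each image have already been mapped to embedding vectors by some encoder such as CLIP (the paper's exact metrics may differ):

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def diversity(image_embs):
    """Average pairwise dissimilarity among images: higher = more varied."""
    n = len(image_embs)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(1 - cosine(image_embs[i], image_embs[j]) for i, j in pairs) / len(pairs)

def consistency(prompt_emb, image_embs):
    """Average prompt-image similarity: higher = closer to the prompt."""
    return sum(cosine(prompt_emb, e) for e in image_embs) / len(image_embs)
```

With these definitions, ten identical clones of the same cat would score zero diversity, while images that ignore the prompt would score low consistency.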

2. The "General vs. Specific" Trap

The biggest discovery was about how hard it is for the AI to understand different types of instructions.

  • The "AND" Problem (Specific Instructions): If you tell the AI, "Draw a black dog," you are asking it to combine two ideas (Black + Dog). The paper found that the AI is actually pretty good at this. It's like asking a chef to add salt and pepper to a soup: combining specific ingredients is easy.
  • The "OR" Problem (General Instructions): If you tell the AI, "Draw a dog" (without saying what kind), the AI has to imagine any dog. The paper found this is surprisingly hard for the AI.
    • The Analogy: Imagine the AI has a library of photos of "Black Dogs" and "White Dogs." If you ask for a "Dog," the AI tries to mash those two photos together. Instead of picking one specific dog, it often creates a blurry, average-looking dog that looks like a mix of all possibilities. It struggles to pick one specific path when the path isn't clearly defined.

Key Takeaway: It is harder for the AI to generalize (be vague) than to be specific. When you give it a vague prompt, it tends to produce "average" or "boring" results.
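The mode-averaging intuition can be made concrete with a toy calculation: averaging two distinct "mode" embeddings yields a point that matches neither mode. The vectors below are made up purely for illustration:

```python
def mean_vector(vectors):
    """Element-wise mean: the 'average dog' the model falls back to."""
    return [sum(vals) / len(vals) for vals in zip(*vectors)]

# Toy 2-D embeddings for two distinct modes the model has learned.
black_dog = [1.0, 0.0]
white_dog = [0.0, 1.0]

avg = mean_vector([black_dog, white_dog])  # lands between the two modes
```

The averaged point `[0.5, 0.5]` is equidistant from both modes but identical to neither, which is the vector-space analogue of the blurry, average-looking dog.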

3. The "Detail Dilemma"

The researchers tested what happens when you keep adding more and more details to the prompt.

  • The Sweet Spot: When you give a prompt with just a few details (e.g., "A cat on a mat"), the AI creates beautiful, varied, and accurate images.
  • The Overload: When you give a very long, complex prompt (e.g., "A cat on a mat, wearing a tiny hat, with a red collar, looking left, in a Victorian room with a chandelier..."), two things happen:
    1. Variety Drops: The AI gets so focused on following every single rule that it stops being creative. It stops making different kinds of cats and just makes the exact same cat over and over.
    2. Listening Drops: The AI starts to forget parts of your long list. It might draw the hat but forget the red collar.

4. The "Magic Expander" (Prompt Expansion)

The researchers found a clever trick to fix the "boring" problem. They used a second AI (a language model) to act as a creative assistant.

  • How it works: You tell the assistant, "The user wants a 'dog'." The assistant thinks, "Okay, let's give the artist more ideas!" and expands that into "A golden retriever playing fetch," "A poodle in a park," "A husky in the snow," etc.
  • The Result: By feeding these expanded, specific ideas to the image AI, they got much more variety (diversity) and better-looking pictures (quality) than if they had just asked for "a dog" directly.
  • The Catch: Sometimes, this "Magic Expander" gets too creative. It might add details the user didn't want, making the picture less faithful to the original simple request.
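A minimal sketch of this idea, with a hard-coded dictionary standing in for the language model (the names `SUBJECTS` and `expand_prompt` are illustrative, not from the paper; a real expander would query an LLM):

```python
import random

# Hypothetical stand-in for the LLM "creative assistant": in practice
# a language model would generate these specific variations on the fly.
SUBJECTS = {
    "dog": [
        "a golden retriever playing fetch",
        "a poodle trotting through a park",
        "a husky standing in fresh snow",
    ],
}

def expand_prompt(prompt, n=3, seed=0):
    """Turn one vague prompt into n specific prompts by sampling variations."""
    rng = random.Random(seed)
    options = SUBJECTS.get(prompt, [prompt])  # unknown prompts pass through unchanged
    return [rng.choice(options) for _ in range(n)]
```

Each expanded prompt is then sent to the image model separately, so the variety now lives in the prompts themselves rather than being left to the image model's imagination.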

5. The "Newer Models" Paradox

The paper looked at older AI models vs. newer, fancier ones.

  • Newer Models: They make incredibly beautiful, high-definition pictures (High Quality). However, they are sometimes too obedient. If you ask for a "dog," they might only draw a Golden Retriever because that's what they think is the "perfect" dog. They have lost some of their wild variety.
  • Older Models: They were a bit messier and less pretty, but they were more willing to try weird, different types of dogs.

The Big Conclusion

The paper suggests that Prompt Complexity is a dial you need to tune carefully.

  • If you want variety, you need to be specific (or use the "Magic Expander" to give the AI specific ideas).
  • If you want the AI to be creative, you can't just give it a vague command like "draw something cool." You have to guide it with enough detail to stop it from getting confused, but not so much detail that it gets stuck in a rut.

In short: The AI artist is amazing, but it needs clear instructions to be its best. If you are too vague, it gets confused and averages everything out. If you are too specific, it gets rigid. The secret sauce is finding the right balance, or letting a "creative assistant" help you write the perfect instructions.
