RelaCtrl: Relevance-Guided Efficient Control for Diffusion Transformers

The paper proposes RelaCtrl, a relevance-guided framework that optimizes control signal integration in Diffusion Transformers by dynamically tailoring layer configurations and introducing a Two-Dimensional Shuffle Mixer, achieving superior performance with only 15% of the parameters and computational complexity of PixArt-δ.

Ke Cao, Jing Wang, Ao Ma, Jiasong Feng, Xuanhua He, Run Ling, Haowei Liu, Jian Lu, Wei Feng, Haozhe Wang, Hongjuan Pei, Yihua Shao, Zhanjie Zhang, Jie Zhang

Published 2026-02-27

Imagine you are a master chef (the Diffusion Transformer) trying to cook a perfect meal based on a customer's order (the text description). Sometimes, the customer gives you extra instructions: "Make it spicy," "Use only organic vegetables," or "Arrange it like a flower." These extra instructions are your control signals.

In the past, to follow these extra instructions, chefs would hire a whole new team of sous-chefs just to double-check every single step of the cooking process, from chopping onions to plating the dessert. This was effective, but it was expensive, slow, and wasted a lot of energy. They were checking every step, even the ones where the customer's extra instructions didn't really matter.

This paper, RelaCtrl, introduces a smarter way to cook. It's like hiring a smart, efficient sous-chef who knows exactly when and how to help, without wasting time on things that don't need attention.

Here is how it works, broken down into three simple ideas:

1. The "Relevance Score": Knowing When to Speak Up

The researchers discovered that not all steps in the cooking process are equally important for following the customer's extra rules.

  • The Old Way: The sous-chef shouted instructions at the beginning, the middle, and the end of the cooking process, even if the chef was already doing the right thing.
  • The RelaCtrl Way: They ran a test to see when the extra instructions mattered most. They found that the instructions were most critical during the middle stages of cooking (like seasoning the sauce). At the very beginning (chopping) and the very end (plating), the instructions mattered less.
  • The Result: Instead of shouting at every step, the smart sous-chef only speaks up at the 11 most critical moments. This saves a huge amount of energy and time, yet the meal still turns out perfect.
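The probing idea behind the relevance score can be sketched in a few lines. This is a toy illustration of the approach, not the paper's exact metric or numbers: skip the control signal at one layer at a time, measure how much output quality drops, and keep only the top-k most relevant layers (the `contribution` values below are invented for the example).

```python
# Hypothetical per-layer quality contributions of the control signal.
# A real setup would measure e.g. the change in a generation-quality
# metric when the control injection at that layer is skipped.
contribution = {0: 0.01, 1: 0.02, 2: 0.15, 3: 0.20, 4: 0.18, 5: 0.03}

def quality(skipped_layer=None):
    """Overall quality when one layer's control signal is skipped."""
    return sum(v for k, v in contribution.items() if k != skipped_layer)

baseline = quality()
# Relevance of a layer = how much quality drops without its control signal.
relevance = {k: baseline - quality(skipped_layer=k) for k in contribution}
# Keep only the top-3 most relevant layers.
top_k = sorted(relevance, key=relevance.get, reverse=True)[:3]
print(sorted(top_k))  # the middle layers win here: [2, 3, 4]
```

In this toy example the middle layers (2-4) survive the cut, mirroring the paper's finding that mid-network layers matter most for control.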

2. The "Two-Dimensional Shuffle Mixer" (TDSM): The Efficient Helper

Even when the sous-chef does speak up, the old method was clumsy. It used a giant, heavy tool to mix ingredients, which took up a lot of space in the kitchen.

  • The Old Tool: A massive, slow mixer that tried to stir every single ingredient with every other ingredient at once.
  • The New Tool (TDSM): The researchers built a lightweight, magical shaker.
    • Imagine you have a deck of cards (the ingredients). Instead of looking at the whole deck, you randomly pick a few cards, shuffle them around, mix them, and then put them back in their original order.
    • Because you shuffled them randomly, the cards that were far apart in the deck can now "talk" to each other. This allows the sous-chef to understand the big picture without needing a giant, heavy machine.
    • This new tool does the same job as the giant mixer but is much smaller and faster.
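The card-deck analogy above maps onto a simple shuffle-mix pattern. The sketch below is an assumed, simplified version of the idea (the paper's actual TDSM operates on two dimensions with learned attention; here the "mixing" is just a blend with the group mean): randomly permute the tokens, mix within small groups, then restore the original order so distant tokens can interact cheaply.

```python
import numpy as np

def shuffle_mix(tokens, group_size, seed=0):
    """Toy sketch of the shuffle-mix idea: permute tokens, mix within
    small local groups, then invert the permutation. Tokens that were
    far apart can land in the same group and "talk" to each other."""
    n, _ = tokens.shape
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)            # random shuffle of token positions
    shuffled = tokens[perm]
    mixed = shuffled.copy()
    for start in range(0, n, group_size):
        group = shuffled[start:start + group_size]
        # Cheap stand-in for real mixing: blend each token with its group mean.
        mixed[start:start + group_size] = 0.5 * group + 0.5 * group.mean(axis=0)
    out = np.empty_like(mixed)
    out[perm] = mixed                    # inverse permutation restores order
    return out

tokens = np.arange(12, dtype=float).reshape(6, 2)  # 6 tokens, dim 2
out = shuffle_mix(tokens, group_size=3)
print(out.shape)  # (6, 2)
```

Because each group only mixes `group_size` tokens at a time, the cost grows with the group size rather than with all pairwise token interactions, which is what makes the "shaker" so much lighter than the "giant mixer."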

3. The "Smart Placement": Putting the Right Tools in the Right Spots

Finally, the paper explains that the strength of the sous-chef's help should change depending on the moment.

  • In the most critical moments (the middle of cooking), the sous-chef uses a strong, detailed plan (more computing power) to ensure the dish is perfect.
  • In the less critical moments, the sous-chef uses a simpler, lighter plan.
  • This ensures that no effort is wasted on steps that don't need attention, and none is withheld from steps that do.
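One simple way to picture this relevance-weighted allocation is as a budget split. The scheme below is an invented illustration, not the paper's actual configuration: give each kept layer a share of a fixed compute budget proportional to its (hypothetical) relevance score.

```python
# Hypothetical relevance scores for the layers kept after pruning.
relevance = {2: 0.15, 3: 0.20, 4: 0.18}
budget = 12  # e.g. total capacity units to distribute

total = sum(relevance.values())
# Most relevant layers get the strongest (largest) control blocks.
capacity = {layer: max(1, round(budget * r / total))
            for layer, r in relevance.items()}
print(capacity)  # {2: 3, 3: 5, 4: 4}
```

Layer 3, the most relevant in this toy setup, gets the biggest block, while the others get lighter ones, matching the "strong plan where it counts" intuition.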

The Big Picture: Why This Matters

Before this paper, adding "control" to AI image generators was like adding a heavy backpack to a runner. It made the runner slower and tired them out quickly.

RelaCtrl is like giving that runner a lightweight, aerodynamic suit.

  • It's faster: It uses about 85% less extra computing power than previous methods.
  • It's cheaper: It needs far fewer "parameters" (which you can think of as the size of the brain needed to do the job).
  • It's just as good: The images it creates are just as high-quality and follow the instructions just as well as the heavy, slow methods.

In short: RelaCtrl teaches AI how to be a smart, efficient worker that knows exactly when to pay attention and how to do the job with the least amount of effort possible, without sacrificing the quality of the final result.
