DDiT: Dynamic Patch Scheduling for Efficient Diffusion Transformers

The paper proposes DDiT, a dynamic patch scheduling method that adaptively adjusts patch sizes based on content complexity and denoising timesteps to significantly accelerate Diffusion Transformers while maintaining high generation quality.

Dahye Kim, Deepti Ghadiyaram, Raghudeep Gadde

Published 2026-02-20

Imagine you are an artist hired to paint a masterpiece based on a customer's description.

The Old Way (Standard Diffusion Models):
Currently, when an AI like FLUX or Wan creates an image or video, it works like a painter who is forced to use the exact same brush size for the entire painting, from the first sketch to the final details.

  • The Problem: If the customer asks for a "blue sky," the artist wastes time using a tiny, fine-tipped brush to paint a smooth, empty sky. It's like using a scalpel to paint a wall.
  • The Consequence: This makes the process slow and expensive, even for simple requests. Even when the customer asks for a "crowded zoo with zebras," where the tiny brush really is needed for the stripes, the artist has already wasted hours using it on the empty sky above.

The New Way (DDiT - Dynamic Patch Scheduling):
The paper introduces DDiT, a smart system that acts like a chameleon artist who can instantly swap brushes depending on what part of the painting they are working on.

Here is how it works, broken down into simple concepts:

1. The "Brush Size" Analogy (Patch Scheduling)

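In AI terms, the image (more precisely, its compressed "latent" representation) is cut into small squares called "patches," and each patch becomes one token that the Transformer processes.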

  • Large Patches (Coarse Brush): Used for big, simple areas (like a blue sky or a plain wall). The AI sees the "big picture" without examining every detail, and because larger patches mean fewer tokens to process, this is fast.
  • Small Patches (Fine Brush): Used for complex areas (like a zebra's stripes, a face, or a tree with leaves). The AI zooms in to capture the tiny details. Smaller patches mean many more tokens, which is slower but necessary for quality.
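To make the brush-size idea concrete, here is a minimal NumPy sketch of patchification, the step that turns a latent grid into tokens. The latent shape (16 channels, 64×64) and the patch sizes are illustrative assumptions, not values from the paper. The sketch shows why larger patches are cheaper: doubling the patch side cuts the token count by 4×, and self-attention, whose cost grows roughly with the square of the token count, becomes about 16× cheaper.

```python
import numpy as np


def patchify(latent: np.ndarray, p: int) -> np.ndarray:
    """Split a (C, H, W) latent into non-overlapping p x p patches.

    Returns an array of shape (num_tokens, C * p * p): one flattened
    vector per patch, i.e. one token per patch, in row-major patch order.
    """
    c, h, w = latent.shape
    assert h % p == 0 and w % p == 0, "H and W must be divisible by p"
    x = latent.reshape(c, h // p, p, w // p, p)  # (C, H/p, p, W/p, p)
    x = x.transpose(1, 3, 0, 2, 4)               # (H/p, W/p, C, p, p)
    return x.reshape((h // p) * (w // p), c * p * p)


# Hypothetical latent: 16 channels on a 64 x 64 spatial grid.
latent = np.random.randn(16, 64, 64).astype(np.float32)

for p in (2, 4, 8):
    n = patchify(latent, p).shape[0]
    # Self-attention compares every token with every other token,
    # so its cost grows roughly with n * n.
    print(f"patch {p}x{p}: {n} tokens, ~{n * n:,} attention pairs")
```

With the assumed 64×64 latent, moving from 2×2 to 4×4 patches cuts the token count from 1,024 to 256 and the number of attention pairs by a factor of 16.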

DDiT's Superpower: Instead of sticking to one brush size, DDiT looks at the painting at every denoising step and asks: "Do I need to zoom in right now, or can I zoom out?"

  • Early steps: It uses large patches to quickly sketch the general shape and layout (the "skeleton" of the image).
  • Later steps: As the image gets clearer, it switches to small patches only where the details are getting complicated.
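The early-coarse, late-fine pattern above can be written down as a simple schedule. This is a hand-written, timestep-only rule for illustration: the function name, the 4×4 and 2×2 sizes, and the 40% cutoff are all hypothetical. DDiT itself adapts the patch size to the content as well as the timestep rather than using a fixed cutoff like this.

```python
def patch_size_for_step(step: int, total_steps: int,
                        coarse: int = 4, fine: int = 2,
                        coarse_fraction: float = 0.4) -> int:
    """Hypothetical timestep-only schedule (not DDiT's actual rule).

    The first `coarse_fraction` of the denoising steps use `coarse` patches
    to lay out the global structure; the remaining steps use `fine` patches
    to add detail.
    """
    if not 0 <= step < total_steps:
        raise ValueError("step must be in [0, total_steps)")
    return coarse if step < coarse_fraction * total_steps else fine


schedule = [patch_size_for_step(s, total_steps=10) for s in range(10)]
print(schedule)  # [4, 4, 4, 4, 2, 2, 2, 2, 2, 2]
```

A fixed cutoff like this spends the same effort on every prompt. The point of DDiT's adaptive decision is that a simple scene could stay coarse for longer than a detailed one.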

2. The "Traffic Sensor" (How it decides)

How does the AI know when to switch brushes? It doesn't need a human to tell it. It uses a clever trick called Latent Evolution.

Imagine the AI is driving a car through a foggy landscape.

  • Smooth Road (Simple Content): If the scenery outside the window isn't changing much (e.g., just a plain wall), the car can drive fast (use large patches).
  • Rough Road (Complex Content): If the scenery is changing rapidly (e.g., a flock of birds suddenly appearing), the car must slow down and pay close attention (switch to small patches).

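At each denoising step, DDiT measures how "fast" the image is changing, that is, how much the latent (the model's compressed, in-progress version of the image) has moved since the previous step. If the changes are small, it uses larger patches and speeds up. If the changes are rapid and detailed, it switches to smaller patches to preserve quality.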
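One simple way to quantify "how fast the image is changing" is the relative size of the latent update between consecutive steps. The sketch below uses that measure and maps it to a patch size with fixed thresholds. The metric, the threshold values, and the function names are illustrative assumptions, not the paper's exact criterion. For simplicity, the sketch also picks one patch size for the whole latent rather than one per region.

```python
import numpy as np


def relative_change(prev: np.ndarray, curr: np.ndarray, eps: float = 1e-8) -> float:
    """Relative L2 change of the latent between two consecutive denoising steps."""
    return float(np.linalg.norm(curr - prev) / (np.linalg.norm(prev) + eps))


def choose_patch_size(change: float,
                      thresholds=((0.05, 8), (0.15, 4)),
                      finest: int = 2) -> int:
    """Map a change score to a patch size: calmer latent -> coarser patches.

    `thresholds` is a hypothetical list of (max_change, patch_size) pairs,
    checked from calmest to busiest; anything above all of them gets `finest`.
    """
    for max_change, p in thresholds:
        if change <= max_change:
            return p
    return finest


rng = np.random.default_rng(0)
prev = rng.standard_normal((16, 64, 64))
calm = prev + 0.01 * rng.standard_normal(prev.shape)  # latent barely moved
busy = prev + 0.5 * rng.standard_normal(prev.shape)   # latent moved a lot
print(choose_patch_size(relative_change(prev, calm)))  # 8
print(choose_patch_size(relative_change(prev, busy)))  # 2
```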

3. The Result: Speed Without Sacrifice

The paper tested this on two well-known generators:

  • FLUX.1-dev (for images): It made image generation 3.5 times faster.
  • Wan 2.1 (for videos): It made video generation 3.2 times faster.
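Why do fewer tokens translate into such large speedups? A toy cost model gives some intuition. Each step's cost is split into a part that grows linearly with the token count (MLPs, projections) and a part that grows quadratically (self-attention). The 50/50 split, the 64×64 latent, and the 50-step schedule are all hypothetical assumptions.

```python
def estimated_speedup(schedule, base_patch=2, latent_side=64, attn_share=0.5):
    """Toy estimate of a patch-size schedule's speedup over always using base_patch.

    Each step's cost, relative to a base step, is modeled as
        (1 - attn_share) * r + attn_share * r**2,
    where r is the step's token count divided by the base token count.
    The first term stands for parts that scale linearly with tokens,
    the second for self-attention, which scales quadratically.
    """
    base_tokens = (latent_side // base_patch) ** 2

    def step_cost(p):
        r = (latent_side // p) ** 2 / base_tokens
        return (1 - attn_share) * r + attn_share * r ** 2

    baseline = len(schedule) * step_cost(base_patch)
    return baseline / sum(step_cost(p) for p in schedule)


# Hypothetical 50-step run: 30 coarse (4x4) steps, then 20 fine (2x2) steps.
schedule = [4] * 30 + [2] * 20
print(f"{estimated_speedup(schedule):.2f}x")  # 2.03x
```

Under these assumptions, running 30 of 50 steps with 4×4 patches gives roughly a 2× speedup, because the 20 full-resolution steps still dominate the total cost. The model is only for intuition and is not meant to reproduce the paper's reported figures.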

The Best Part: Despite the speedup, the generated images and videos look just as good as those from the original, slower models.

  • Analogy: It's like a delivery truck that usually drives 20 mph everywhere. DDiT is a smart truck that drives 60 mph on the highway (simple parts) and drops back to 20 mph only on crowded city streets (complex parts). The package arrives just as safely, but much sooner.

Summary

DDiT is a "smart scheduler" for AI art generators. It stops the AI from wasting time doing detailed work on simple things and focuses its energy only where it's needed. It's like giving the AI a pair of smart glasses that tell it exactly how much detail to look for at every single moment, resulting in faster generation times without losing the "wow" factor.
