Gradient-Aligned Calibration for Post-Training Quantization of Diffusion Models

This paper proposes a novel post-training quantization method for diffusion models that optimizes calibration sample weights to align gradients across timesteps, thereby overcoming the sub-optimality of uniform weighting and significantly improving quantization performance.

Dung Anh Hoang, Cuong Pham, Trung Le, Jianfei Cai, Thanh-Toan Do

Published 2026-03-03

The Big Picture: Making AI Art Faster and Smaller

Imagine a Diffusion Model (like DALL-E or Stable Diffusion) as a master sculptor. To create a beautiful statue (an image) from a block of rough stone (random noise), the sculptor doesn't just smash it instantly. They take hundreds of tiny, careful steps, chipping away a little bit of noise at a time until the image emerges.

The Problem:
This process is incredibly slow and requires a massive amount of computer power (memory). It's like trying to run this sculptor on a tiny, old laptop—it just can't handle the weight or the speed.

The Solution (Quantization):
To fix this, engineers use a technique called Quantization. Think of this as compressing the sculptor's tools. Instead of using a full set of heavy, precision steel chisels (high-precision numbers), they switch to a lighter, plastic toolkit (low-precision numbers). This makes the sculptor faster and lighter, but there's a risk: the plastic tools might not be precise enough, and the statue could end up looking blurry or weird.
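To make the "lighter toolkit" concrete, here is a minimal sketch of uniform 8-bit quantization. This is an illustrative example, not the paper's specific scheme: the weight values are made up, and real quantizers add details like per-channel scales and zero-points.

```python
import numpy as np

# A few made-up full-precision weights (the "heavy steel chisels")
weights = np.array([-0.62, 0.13, 0.98, -0.35])

# Map the largest magnitude to the int8 range, then round everything to integers
scale = np.abs(weights).max() / 127
q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)

# Dequantize: the reconstruction is close to, but not exactly, the original
dequant = q.astype(np.float32) * scale
print(q, np.abs(weights - dequant).max())  # small but nonzero rounding error
```

The rounding error is the "plastic tool" risk: each weight is off by up to half a quantization step, and those small errors accumulate across hundreds of denoising steps.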

The Specific Problem: The "One-Size-Fits-All" Mistake

Existing methods for compressing these models treat every step of the sculpting process the same way. They assume that chipping away noise at the very beginning (when the stone is just a blob) is just as important as chipping away noise at the very end (when the statue's face is being defined).

The Paper's Insight:
The authors realized this is wrong.

  • Early steps are about big shapes and general vibes.
  • Late steps are about fine details and sharp edges.

If you try to tune your plastic tools to be perfect for both the rough shaping and the fine detailing at the same time, you end up with a tool that is mediocre at both. It's like trying to use a single pair of scissors to cut through a thick log and then trying to cut a delicate piece of paper with the same scissors. You'll either crush the paper or get stuck on the log.

Furthermore, the "instructions" (gradients) the model gives you at the start of the process often conflict with the instructions at the end. If you try to follow both sets of conflicting instructions equally, the model gets confused and performs poorly.
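The "conflicting instructions" above can be checked numerically: if the calibration gradient from an early timestep points in roughly the opposite direction of the gradient from a late timestep, their cosine similarity is negative. The sketch below uses random stand-in vectors (the real gradients would come from backpropagation through the quantized network); the late gradient is deliberately flipped to manufacture a conflict.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 4-dimensional gradients for two timesteps of calibration
grad_early = rng.normal(size=4)                       # early (noisy) timestep
grad_late = -grad_early + 0.1 * rng.normal(size=4)    # roughly opposite direction

def cosine(u, v):
    """Cosine similarity: values below zero mean the two updates fight each other."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

sim = cosine(grad_early, grad_late)
print(f"cosine similarity: {sim:.2f}")  # negative => conflicting update directions
```

Averaging two such gradients with equal weight partially cancels them out, which is exactly the "confused model" failure mode described above.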

The Solution: The "Smart Coach" (Gradient-Aligned Calibration)

The authors propose a new method called Gradient-Aligned Calibration. Here is how it works, using a sports analogy:

Imagine a coach training a runner for a marathon. The race has three phases:

  1. The Start: Fast sprinting.
  2. The Middle: Steady pacing.
  3. The End: A final sprint.

Old Method: The coach gives the runner the exact same training plan for all three phases. "Run fast, then run fast, then run fast." This doesn't work well because the body needs different strategies for each phase.

The New Method (Gradient-Aligned Calibration):
The coach introduces a Smart Weight System.

  • They look at the runner's performance at every single moment of the race.
  • They realize that the runner's body signals (gradients) for the "Start" phase are different from the "End" phase.
  • Instead of treating every training sample equally, the system learns to assign importance weights.
    • If a training sample helps the runner improve their start without ruining their end, it gets a high weight (lots of attention).
    • If a sample helps the start but confuses the end (causing a "gradient conflict"), it gets a low weight (ignored).

By doing this, the coach (the algorithm) finds a "sweet spot" where the training data from all different phases of the race work together harmoniously, rather than fighting against each other.
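In code, the "smart weight system" amounts to replacing a plain average of per-sample calibration losses with a learned weighted average. The numbers below are all hypothetical; in the paper the weights would be produced by the meta-learning step, not hand-picked. A softmax is one common way (assumed here) to keep the weights positive and summing to one.

```python
import numpy as np

# Made-up calibration losses for four samples drawn from different timesteps
per_sample_loss = np.array([0.8, 0.5, 1.2, 0.3])

# Old method: every sample counts equally
uniform_loss = per_sample_loss.mean()

# New idea: learnable importance weights (logits are placeholder values)
logits = np.array([0.2, 1.5, -1.0, 0.7])
weights = np.exp(logits) / np.exp(logits).sum()   # positive, sums to 1
weighted_loss = float(weights @ per_sample_loss)

print(round(uniform_loss, 3), round(weighted_loss, 3))
```

Samples whose gradients conflict with the rest would be driven toward near-zero weight, so they stop dragging the shared quantization parameters in the wrong direction.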

How They Did It (The "Meta-Learning" Trick)

The paper describes a mathematical trick called Meta-Learning.

  • Think of it as a "Teacher of Teachers."
  • The main teacher (the quantization algorithm) tries to compress the model.
  • The "Meta-Teacher" watches the main teacher and asks: "Wait, if you focus too much on this specific noisy image, the model will get confused later. Let's lower the importance of that image and boost the importance of this other one."
  • The system automatically adjusts these importance scores until the model learns to compress well without losing its ability to generate high-quality images.
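The loop above is a bilevel ("teacher of teachers") optimization. Here is a deliberately tiny numerical sketch of that structure, not the paper's actual algorithm: a scalar stands in for the quantization parameters, the targets and learning rates are invented, and the gradient through the inner step is written out by hand instead of using automatic differentiation.

```python
import numpy as np

# Inner problem: tune a scalar "quantization parameter" theta on weighted samples.
# Outer problem: adjust the sample weights w so that the theta produced by one
# inner gradient step does well on a held-out validation target.
targets = np.array([0.0, 1.0, 4.0])   # each calibration sample pulls theta toward these
val_target = 1.0                      # what the meta-step actually cares about
w = np.ones(3) / 3                    # start from uniform weights (the "old method")
theta0, inner_lr, meta_lr = 2.0, 0.5, 0.05

for _ in range(200):
    # Inner step: one weighted gradient step on sum_i w_i * (theta - t_i)^2,
    # always taken from the same starting point theta0 to keep the sketch simple
    grad_theta = 2.0 * np.sum(w * (theta0 - targets))
    theta1 = theta0 - inner_lr * grad_theta

    # Outer (meta) step: gradient of the validation loss (theta1 - val_target)^2
    # with respect to w, chaining through the inner update by hand
    dtheta1_dw = -inner_lr * 2.0 * (theta0 - targets)
    grad_w = 2.0 * (theta1 - val_target) * dtheta1_dw
    w = np.clip(w - meta_lr * grad_w, 0.0, None)
    w /= w.sum()                      # keep the weights a valid distribution

print(np.round(w, 2))  # weight shifts toward samples that help the held-out goal
```

After training, the weights settle so that the inner step lands near the validation target: helpful samples get boosted, misleading ones get suppressed, which mirrors the automatic importance adjustment described above.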

The Results: Why It Matters

The authors tested this on famous image datasets (like CIFAR-10, LSUN, and ImageNet).

  • The Outcome: Their method produced clearer, sharper images than previous post-training quantization methods, even when the model was compressed to very low precision.
  • The Trade-off: It took a little bit more time to train the compression settings (like spending an extra hour tuning the coach's plan), but once the plan was set, the actual running (generating images) was just as fast and light as before.

Summary in One Sentence

This paper teaches AI image generators how to "compress" themselves without losing quality by realizing that different stages of image creation need different attention, and it uses a smart system to weigh the most helpful training examples while ignoring the ones that cause confusion.