Momentum Guidance: Plug-and-Play Guidance for Flow Models

The paper introduces Momentum Guidance, a plug-and-play technique for flow-based generative models. It extrapolates the ODE velocity using an exponential moving average of past velocities, which improves sample quality and detail without increasing inference cost. The authors report significant FID improvements on benchmarks such as ImageNet-256 and on large-scale models such as Stable Diffusion 3.

Runlong Liao, Jian Yu, Baiyu Su, Chi Zhang, Lizhang Chen, Qiang Liu

Published 2026-02-25

Imagine you are trying to teach a robot to paint a masterpiece based on a simple description, like "a cat sitting on a fence."

The Problem: The "Blurry Dream" Robot

Right now, the best AI painters (called Flow Models) are incredibly talented, but they have a weird habit. When you ask them to paint, they often produce images that look like a dream you had after eating too much cheese.

The colors are there, the shapes are roughly right, but everything is soft, blurry, and lacks detail. The cat's fur looks like a fuzzy cloud, and the fence posts are melting into the background.

Why does this happen? Because the AI was trained to be "safe." It learned to predict the average of all possible cats and fences. In math terms, it smoothed out all the sharp edges and high-frequency details to avoid making mistakes. The result? A safe, but boring, blurry image.
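
The "average of all possible cats" idea can be seen in a toy example. The sketch below is only an illustration, not the model's training code: the two 1-D "images" are made up, and it simply shows that the single guess with the lowest squared error against two sharp alternatives is their soft, in-between average.

```python
import numpy as np

# Two equally plausible "sharp" 1-D images: a hard edge at slightly different positions.
# (Hypothetical toy data, chosen only for illustration.)
edge_a = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
edge_b = np.array([0.0, 0.0, 1.0, 1.0, 1.0, 1.0])

# The one prediction that minimizes total squared error against both targets
# is their average, which contains a soft 0.5 step instead of a hard edge.
mse_optimal = (edge_a + edge_b) / 2.0


def squared_error(pred):
    """Total squared error of a single prediction against both sharp targets."""
    return np.sum((pred - edge_a) ** 2) + np.sum((pred - edge_b) ** 2)


print(mse_optimal)
print(squared_error(mse_optimal), squared_error(edge_a))
```

Committing to either sharp edge scores worse than the blurry average, which is why a model trained this way tends to play it "safe."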

The Old Fix: The "Double-Check" Method

To fix the blur, researchers developed a technique called Classifier-Free Guidance (CFG). Think of this like asking the robot to make two sketches at every step of the painting:

  1. First pass: "Paint a cat." (The blurry version).
  2. Second pass: "Paint anything, with no instructions at all." (A super-blurry, generic version).

Then, the computer takes the first version and pushes it away from the second version. It's like saying, "Okay, the generic cat is too fuzzy, so let's make the specific cat even sharper by comparing it to the fuzzy one."
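
For readers who want to see the "push away" step written down, here is a minimal sketch of the standard CFG combination. The function name, the example vectors, and the scale value are placeholders picked for illustration. In a real model the two predictions would come from two separate network passes.

```python
import numpy as np


def cfg_velocity(v_cond, v_uncond, scale):
    """Start at the generic prediction and move toward (and past) the prompted one."""
    return v_uncond + scale * (v_cond - v_uncond)


# Placeholder predictions standing in for two network passes at one step.
v_cond = np.array([1.0, 0.5])    # "Paint a cat."
v_uncond = np.array([0.8, 0.1])  # no instructions

guided = cfg_velocity(v_cond, v_uncond, scale=2.0)
print(guided)
```

With a scale of 1 the result is just the prompted prediction. Larger scales exaggerate whatever differs between the two passes.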

The Catch: This works great, but it's twice as slow. The robot has to do double the work for every single step of the painting process. If you want a high-quality image, you have to wait twice as long.

The New Solution: "Momentum Guidance" (The Skateboarder)

This paper introduces a new trick called Momentum Guidance (MG). It's like giving the robot a skateboard instead of making it walk.

Here is the analogy:
Imagine the robot is a skateboarder trying to ride down a hill to reach the "perfect image" at the bottom.

  • The Old Way (CFG): The skateboarder stops at every single step to ask a friend, "Is this the right direction?" and then asks another friend, "What would a generic path look like?" Then they compare notes. It's accurate, but it takes forever.
  • The New Way (MG): The skateboarder looks at their recent history.
    • "I was moving slowly and smoothly a moment ago (the blurry past)."
    • "I am moving faster and more sharply right now."
    • "Let's use that difference to push me even harder in the right direction!"

How it works simply:
The AI remembers the "velocity" (the direction and speed) of its previous steps. It calculates an average of where it has been (which is usually smooth and blurry) and then pushes the current step away from that average.

Picture the skateboarder recalling the smooth, wide turns they made at the top of the hill. As they get closer to the bottom (the final image), they deliberately steer sharper than those remembered turns to carve out the details.
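
The idea of "remember the average, then push away from it" can be sketched in a few lines. This is a rough illustration under stated assumptions, not the authors' implementation: the update rule `v + weight * (v - ema)`, the values of `beta` and `weight`, the function names, and the toy velocity field are all hypothetical. The point it shows is that the network is called once per step, the same as a plain sampler.

```python
import numpy as np


def sample_with_momentum_guidance(velocity_fn, x0, num_steps=10, beta=0.8, weight=0.5):
    """Euler sampler that pushes each velocity away from an EMA of past velocities.

    ASSUMPTION: the rule v + weight * (v - ema) and the default beta/weight
    are illustrative choices, not values taken from the paper.
    """
    x = np.asarray(x0, dtype=float)
    dt = 1.0 / num_steps
    ema = None   # running average of earlier velocities (the "smooth past")
    calls = 0    # how many times the (expensive) model is evaluated
    for i in range(num_steps):
        t = i * dt
        v = velocity_fn(x, t)
        calls += 1
        if ema is None:
            v_guided = v  # first step: no history to compare against
        else:
            v_guided = v + weight * (v - ema)  # push away from the average
        # Update the memory only after it has been used for guidance.
        ema = v if ema is None else beta * ema + (1.0 - beta) * v
        x = x + dt * v_guided
    return x, calls


# Hypothetical stand-in for a trained model: always heads toward a fixed target.
target = np.array([1.0, -2.0])


def toy_velocity(x, t):
    return target - x


x_final, model_calls = sample_with_momentum_guidance(toy_velocity, np.zeros(2))
print(x_final, model_calls)
```

Setting `weight=0.0` turns the guidance off and recovers an ordinary Euler sampler, which makes it easy to compare the two on the same model.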

Why is this a Big Deal?

  1. It's Free (in terms of time): The robot doesn't need to do extra work. It just uses the information it's already calculating as it paints. It's like getting a bonus feature without paying extra.
  2. It's Sharper: The images come out with crisp details: individual hairs on the cat, clear reflections on a car, sharp edges on buildings.
  3. It Works with the Old Way: You can use this new "skateboard" trick on top of the old "double-check" method to get even better results, or use it alone to save time.

The Result

The researchers tested this on famous AI models (like Stable Diffusion 3 and FLUX).

  • Before: A blurry, dream-like image.
  • After: A crisp, high-definition photo where you can see the texture of the wood on the fence and the whiskers on the cat.

In a nutshell: Momentum Guidance is a clever way to tell the AI, "Don't just follow the smooth, safe path. Remember where you've been, and use that memory to push yourself toward the sharp, exciting details." It makes AI art sharper and more detailed without needing more computer power, and it can save time when used in place of the "double-check" method.
