FlashMotion: Few-Step Controllable Video Generation with Trajectory Guidance

FlashMotion is a novel training framework that enables high-quality, few-step trajectory-controllable video generation by combining a pre-trained trajectory adapter with a hybrid diffusion-adversarial finetuning strategy, while introducing the FlashBench benchmark to evaluate performance across varying object counts.

Quanhao Li, Zhen Xing, Rui Wang, Haidong Cao, Qi Dai, Daoguo Dong, Zuxuan Wu

Published 2026-03-13

Imagine you want to create a movie where a specific character, like a hamster in a hat, drives a bulldozer across the screen following a very specific path you drew on a map.

In the world of AI video generation, there are two big problems with doing this right now:

  1. It's too slow: The current "smart" AI models that can follow your map perfectly take a long time to think. They are like a master chef who tastes the soup 50 times before serving it. It's delicious, but you have to wait forever.
  2. It's blurry when you speed it up: If you tell that master chef to "hurry up and only taste it 4 times," the soup comes out watery and blurry. The AI loses its ability to follow your map accurately.

FlashMotion is the new solution that lets you get a gourmet meal in seconds without losing the flavor. Here is how it works, using a simple analogy.

The Three-Stage Recipe

The researchers built FlashMotion using a three-step cooking process:

1. The Master Chef (Training the "Slow Adapter")

First, they take a very slow, high-quality AI model (the "Slow Generator") and teach it a special skill: how to follow a drawn path perfectly. They train a small "helper module" (called an Adapter) to act like a GPS for the AI.

  • Analogy: Imagine training a very slow, careful driver to follow a specific route on a map perfectly. This driver is great, but they drive at 10 mph.
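For readers who prefer code to cars, here is a toy sketch of the adapter idea: the big model's weight stays frozen and only the small helper's weight is trained. The one-number "models", the function name train_adapter, and the learning rate are all hypothetical stand-ins chosen for illustration, not the paper's architecture or settings.

```python
def train_adapter(samples, base_weight=2.0, lr=0.05, epochs=200):
    """Fit only the adapter weight; the base weight is frozen (never updated)."""
    adapter_weight = 0.0
    for _ in range(epochs):
        for x, trajectory, target in samples:
            # Output = frozen base model + trainable trajectory adapter.
            prediction = base_weight * x + adapter_weight * trajectory
            error = prediction - target
            # Gradient of 0.5 * error**2 with respect to the adapter weight only.
            adapter_weight -= lr * error * trajectory
    return base_weight, adapter_weight


# Toy data (an assumption): the "right" answer needs an adapter weight of 3.0.
inputs = [(1.0, 0.5), (0.5, 1.0), (-1.0, 0.8), (0.2, -0.6)]
samples = [(x, traj, 2.0 * x + 3.0 * traj) for x, traj in inputs]

base_w, adapter_w = train_adapter(samples)
print(base_w, round(adapter_w, 4))
```

The point of the design is that the slow driver's existing skills are left untouched; only the GPS learns anything new.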

2. The Speedster (Distilling the "Fast Generator")

Next, they take that slow, careful driver and teach a new, super-fast driver (the "Fast Generator") how to drive the same way, but in record time. They use a technique called distillation, which is like compressing a whole library of driving lessons into a single, quick cheat sheet.

  • Analogy: Now you have a race car driver who can do the same route in 10 seconds instead of 10 minutes. But here's the catch: if you give the old GPS (from step 1) to this new race car, the GPS gets confused. The race car moves too fast for the old instructions, and the car ends up crashing or driving in circles.

3. The Hybrid Coach (The Magic Step)

This is the most important part. The researchers realized they couldn't just use the old GPS with the new race car. They had to retrain the GPS specifically for the race car.

  • The Problem: If they just retrained the GPS with the standard diffusion training objective alone, the race car would still produce blurry, weird videos because the GPS was trying to force the car to move like the slow driver.
  • The Solution: They created a Hybrid Coach. This coach uses two tools:
    1. The Pixel Teacher: Checks if the car is on the right path (the map).
    2. The Art Critic (Discriminator): A smart judge that looks at the video and says, "This looks fake and blurry. Make it look real!"
  • Analogy: It's like having a coach who tells the race car, "You need to follow the map, but you also need to make sure the scenery looks crisp and real, not blurry." By balancing these two instructions, the race car learns to drive fast and look perfect.
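The coach's two instructions can be sketched as a single training loss with two terms. This is a minimal illustration, assuming a mean-squared-error term for the Pixel Teacher and a common non-saturating form for the Art Critic; the function names and the adv_weight balance are hypothetical, and the paper's actual objectives may differ.

```python
import math


def diffusion_loss(predicted, target):
    """Pixel Teacher stand-in: mean squared error against the regression target."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)


def adversarial_loss(critic_logit):
    """Art Critic stand-in: non-saturating generator loss, -log(sigmoid(logit)).

    Written as a numerically stable softplus(-logit). The loss is small when
    the critic scores the sample as real (large positive logit).
    """
    z = -critic_logit
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def hybrid_loss(predicted, target, critic_logit, adv_weight=0.1):
    """Weighted sum of the two terms; adv_weight is an assumed balance knob."""
    return diffusion_loss(predicted, target) + adv_weight * adversarial_loss(critic_logit)


# Same prediction error, but the critic is more convinced in the second case.
unconvincing = hybrid_loss([0.9, 2.1], [1.0, 2.0], critic_logit=-2.0)
convincing = hybrid_loss([0.9, 2.1], [1.0, 2.0], critic_logit=2.0)
print(unconvincing > convincing)
```

With the adversarial term switched off (adv_weight=0), this collapses back to the "try harder" setup from The Problem above, which is the case the Hybrid Coach is meant to fix.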

The Result: FlashMotion

The final result is a system that:

  • Generates videos in 4 steps (instead of 50), with a reported speedup of 47 times.
  • Follows your drawn path closely, even with complex movements.
  • Looks high-quality, without the blurry artifacts you usually get when speeding up AI.
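A quick back-of-the-envelope check on the step counts is sketched below. The helper name step_speedup is hypothetical, and the calculation assumes every step costs the same, which may not hold for the real models.

```python
def step_speedup(slow_steps, fast_steps):
    """Speedup from step count alone, assuming each step costs the same."""
    return slow_steps / fast_steps


ratio = step_speedup(50, 4)
print(ratio)  # 12.5
```

Cutting 50 steps to 4 accounts for a 12.5 times speedup under that assumption, so the larger 47 times figure presumably includes other savings per step that this summary does not spell out.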

The New Rulebook: FlashBench

To prove this works, the authors also created a new test called FlashBench.

  • Analogy: Before, people tested video AI on short, 5-second clips. It was like testing a race car in a parking lot. FlashMotion can drive for much longer (up to 121 frames), so they built a new "racetrack" (FlashBench) with long, complex courses and varying numbers of objects to ensure the car doesn't get lost or crash over time.

Why This Matters

Before FlashMotion, you had to choose between Quality (slow, perfect video) and Speed (fast, blurry video). FlashMotion breaks that trade-off. It allows creators to make high-quality, motion-controlled videos almost instantly, opening the door for real-time video editing, interactive games, and instant storytelling.

In short: FlashMotion is the "Turbo Mode" for AI video that doesn't sacrifice the picture quality or the ability to follow your instructions.