LinVideo: A Post-Training Framework towards O(n) Attention in Efficient Video Generation

LinVideo is a data-free post-training framework that brings attention in video generation down to O(n) complexity. It automatically selects which layers to convert to linear attention by framing the choice as a binary classification problem, and it trains with an anytime distribution matching objective, yielding significant speedups while preserving generation quality.

Yushi Huang, Xingtong Ge, Ruihao Gong, Chengtao Lv, Jun Zhang

Published 2026-02-24

Imagine you have a brilliant, world-class chef (the Video Diffusion Model) who can cook up incredibly realistic, high-definition movies from a simple description like "a dragon flying over a castle." This chef is amazing, but there's a catch: they are incredibly slow and expensive to run.

Why? Because to cook a 10-second video, the chef has to look at every single frame and compare it to every other frame to make sure the dragon's wings move smoothly and the clouds don't flicker. If the video is long, this "comparing everything to everything" task grows explosively large. It's like trying to introduce every single guest at a massive wedding to every other guest; the number of handshakes becomes impossible to manage. In tech terms, this is called quadratic complexity, written O(n²).
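The handshake problem can be made concrete with a minimal NumPy sketch of standard softmax attention (an illustration, not the paper's code). The giveaway is the (n, n) score matrix: every token is compared with every other token.

```python
import numpy as np

def softmax_attention(Q, K, V):
    # Every token is compared with every other token: the score matrix S
    # has shape (n, n), so memory and compute grow as O(n^2).
    S = Q @ K.T / np.sqrt(Q.shape[-1])
    S = np.exp(S - S.max(axis=-1, keepdims=True))  # numerically stable softmax
    A = S / S.sum(axis=-1, keepdims=True)
    return A @ V

n, d = 1024, 64  # n video tokens, d feature channels
rng = np.random.default_rng(0)
Q, K, V = rng.standard_normal((3, n, d))
out = softmax_attention(Q, K, V)  # materializes a 1024 x 1024 score matrix
```

Doubling the video length quadruples the size of S, which is why long videos become prohibitively expensive.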

The paper introduces LINVIDEO, a new "kitchen renovation" that makes this chef faster without hiring a new one or buying new ingredients. Here is how they did it, explained simply:

1. The Problem: The "Slow Chef" vs. The "Fast Assistant"

Scientists already knew about a "Fast Assistant" (called Linear Attention) who can cook much faster, in O(n) time, by using a shortcut. Instead of introducing every guest to everyone, the assistant just remembers the general vibe of the room.
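The shortcut can be sketched with one common kernelized formulation of linear attention (not necessarily the exact variant used in the paper). Reordering the multiplication so that phi(K)ᵀV is computed first means the (n, n) matrix is never built:

```python
import numpy as np

def linear_attention(Q, K, V):
    # Kernel trick: approximate softmax(Q K^T) V by phi(Q) (phi(K)^T V).
    # phi(K)^T V is a small (d, d) summary — the "general vibe of the room" —
    # so the cost is O(n * d^2): linear in the number of tokens n.
    phi = lambda x: np.maximum(x, 0.0) + 1e-6    # a simple positive feature map
    Qp, Kp = phi(Q), phi(K)
    KV = Kp.T @ V                                # (d, d), independent of n
    Z = Qp @ Kp.sum(axis=0, keepdims=True).T     # (n, 1) normalizer
    return (Qp @ KV) / Z

n, d = 4096, 64
rng = np.random.default_rng(0)
Q, K, V = rng.standard_normal((3, n, d))
out = linear_attention(Q, K, V)  # no n x n score matrix is ever built
```

The (d, d) summary discards the exact pairwise comparisons, which is precisely why naively swapping it in degrades quality.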

However, there's a big problem: If you just swap the Master Chef for the Fast Assistant, the food tastes terrible. The video becomes blurry, the motion is jerky, and the dragon looks like a blob. Usually, to fix this, you'd have to send the Fast Assistant back to culinary school for years (called pre-training) to learn the Master Chef's secrets. That takes too much time and money.

The Question: Can we teach the Master Chef to use the Fast Assistant's shortcuts right now, without sending them back to school?

2. The Solution: LINVIDEO (The Smart Renovation)

The authors created a framework called LINVIDEO. Think of it as a smart renovation crew that upgrades the kitchen while the chef is still working, without needing a new recipe book (data-free).

They used two clever tricks:

Trick A: The "Selective Transfer" (Don't Fire Everyone)

The team realized that not all parts of the chef's brain are equally important.

  • The Analogy: Imagine the chef has 30 different stations (layers). Some stations handle the basic chopping (early layers), while others handle the complex plating and final garnish (deep layers).
  • The Mistake: If you replace the plating station with a fast-but-dumb robot, the dish looks ugly. If you replace the chopping station, the robot might mess up the knife skills.
  • The LINVIDEO Fix: Instead of guessing which stations to upgrade, they gave the kitchen a "smart switch" for every station. This switch learns, through trial and error, which stations can safely be swapped for the Fast Assistant and which ones must stay as the Master Chef.
  • The Result: They automatically found the perfect mix: "Keep the Master Chef on the complex plating, but let the Fast Assistant handle the chopping." This minimizes the drop in quality.
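The "smart switch" idea can be sketched as a learnable gate per layer that blends the two attention types during post-training and is rounded to a hard keep/swap decision afterwards. All names here are illustrative, not the paper's API:

```python
import numpy as np

def quadratic_attn(Q, K, V):
    # Standard softmax attention — the "Master Chef".
    S = np.exp(Q @ K.T / np.sqrt(Q.shape[-1]))
    return (S / S.sum(axis=-1, keepdims=True)) @ V

def linear_attn(Q, K, V):
    # Kernelized linear attention — the "Fast Assistant".
    Qp = np.maximum(Q, 0.0) + 1e-6
    Kp = np.maximum(K, 0.0) + 1e-6
    return (Qp @ (Kp.T @ V)) / (Qp @ Kp.sum(axis=0)[:, None])

class HybridLayer:
    """Illustrative per-layer switch: a gate g in [0, 1] mixes the two
    attentions during post-training, then collapses to a hard 0/1 choice."""
    def __init__(self, gate=0.5):
        self.gate = gate  # learnable in the real method; a plain float here

    def __call__(self, Q, K, V):
        g = self.gate
        return (1.0 - g) * quadratic_attn(Q, K, V) + g * linear_attn(Q, K, V)

layers = [HybridLayer() for _ in range(4)]
# After training, each gate settles to 0 (keep the chef) or 1 (use the assistant):
for i, layer in enumerate(layers):
    layer.gate = 1.0 if i < 2 else 0.0  # e.g. early layers go linear
```

Because each gate is trained rather than hand-picked, the mix of "chef" and "assistant" layers is discovered automatically.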

Trick B: The "Anytime Distribution Matching" (The Real-Time Taste Test)

Usually, when you try to speed up a model, you only check if the final video looks good. But in video generation, if the middle of the video is weird, the end will be weird too.

  • The Analogy: Imagine a student taking a test. If you only grade them on the final answer, they might have guessed their way there. But if you check their work at every step of the problem, you can correct them immediately.
  • The LINVIDEO Fix: They created a new "taste test" called Anytime Distribution Matching (ADM). Instead of waiting until the video is finished to see if it's good, they check the "flavor" of the video at every single second of the cooking process. They force the Fast Assistant to match the Master Chef's style at every moment, not just at the end.
  • The Result: This prevents the video from getting "jittery" or flickering, ensuring the whole movie feels smooth and natural.
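A heavily simplified sketch of the "check their work at every step" idea (the real ADM objective matches distributions; a per-timestep prediction error stands in as a proxy here, and all names are illustrative): the student is compared with the teacher at every sampled noise level, not just on the final output.

```python
import numpy as np

def anytime_matching_loss(student, teacher, x0, timesteps, rng):
    # Compare student and teacher at EVERY noise level t along the
    # trajectory — grading each step of the work, not just the final answer.
    loss = 0.0
    for t in timesteps:
        noise = rng.standard_normal(x0.shape)
        x_t = np.sqrt(1.0 - t) * x0 + np.sqrt(t) * noise  # noised input at level t
        loss += np.mean((student(x_t, t) - teacher(x_t, t)) ** 2)
    return loss / len(timesteps)

teacher = lambda x, t: 0.5 * x   # stand-in for the original model
student = lambda x, t: 0.4 * x   # stand-in for the linearized model
rng = np.random.default_rng(0)
x0 = np.ones((2, 16))            # stand-in for clean video latents
loss = anytime_matching_loss(student, teacher, x0, [0.1, 0.5, 0.9], rng)
```

Penalizing the mismatch at every noise level, rather than only at the end, is what keeps intermediate frames from drifting and the final video from flickering.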

3. The Results: Fast, Cheap, and Tasty

After this renovation, the results were impressive:

  • Speed: The video generation became 1.4 to 1.7 times faster just by swapping the attention layers.
  • Super Speed: When they combined this with a technique to skip steps (distillation), they created a "4-step" model that is 16 to 21 times faster than the original!
  • Quality: The videos still looked amazing. The "Master Chef" quality was preserved, even though they were using the "Fast Assistant" for most of the work.

Summary

LINVIDEO is like taking a slow, expensive luxury car and installing a high-performance engine in just the right parts of the chassis. You don't need to rebuild the whole car from scratch (pre-training), and you don't need a new driver (new data). You just tweak the existing machine so it drives far faster while still getting you to the destination in style.

This is a huge step forward because it means we can generate high-quality AI videos on regular computers much faster, making creative tools accessible to everyone, not just big tech companies with massive servers.
