Training-free Latent Inter-Frame Pruning with Attention Recovery

This paper introduces LIPAR, a training-free framework that accelerates video generation by pruning redundant latent patches and recovering attention values to maintain quality, thereby achieving a 1.45x throughput increase without compromising visual fidelity.

Dennis Menn, Yuedong Yang, Bokun Wang, Xiwen Wei, Mustafa Munir, Feng Liang, Radu Marculescu, Chenfeng Xu, Diana Marculescu

Published 2026-03-09

Imagine you are a director filming a scene with a very expensive, slow-moving camera. You are filming a cartoon Santa Claus walking through a room.

The Problem:
The camera is so powerful that it takes a photo of every single pixel in the room, even the parts that aren't moving.

  • Frame 1: Santa walks in. The camera snaps a picture of Santa, the floor, the walls, and the ceiling.
  • Frame 2: Santa takes one step. The camera snaps another picture. But wait! The floor, the walls, and the ceiling look exactly the same as in Frame 1. Only Santa moved a tiny bit.
  • Frame 3: Santa takes another step. The camera snaps a third picture. Again, the background is identical.

The current AI video generators are like this camera. They waste a massive amount of time and battery power re-calculating the "floor" and "walls" for every single frame, even though they haven't changed. This makes generating videos slow and expensive.

The Solution: LIPAR (The Smart Editor)
The authors of this paper created a new method called LIPAR (Latent Inter-Frame Pruning with Attention Recovery). Think of it as a super-smart editor who watches the footage and says, "Hey, we don't need to film the wall again!"

Here is how it works, broken down into three simple steps:

1. The "Lazy" Pruning (Skipping the Boring Parts)

Instead of re-filming the whole scene, LIPAR looks at the previous frame. If a patch of the image (like the background wall) hasn't changed, it skips calculating it for the new frame.

  • Analogy: Imagine you are writing a story. In Chapter 1, you describe the room in great detail. In Chapter 2, the room is the same. Instead of rewriting the description of the room, you just say, "The room remained the same," and only write about the new action (Santa moving).
  • Result: The computer does way less work, making the video generation much faster.
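The skipping step above can be sketched in a few lines. This is a minimal illustration, not the paper's actual code: the patch representation, the mean-absolute-difference criterion, and the `threshold` value are all assumptions made for the example.

```python
import numpy as np

def prune_static_patches(prev_latent, curr_latent, threshold=0.05):
    """Sketch of inter-frame pruning: skip patches that barely changed.

    prev_latent, curr_latent: arrays of shape (num_patches, dim).
    Returns the indices of patches that still need computation and a
    boolean mask marking the pruned (reused) patches.
    The change metric and threshold are illustrative, not from the paper.
    """
    # Mean absolute change per patch between consecutive frames
    diff = np.abs(curr_latent - prev_latent).mean(axis=1)
    keep = diff >= threshold   # patches that actually changed ("Santa")
    pruned = ~keep             # static patches: reuse the old result
    return np.nonzero(keep)[0], pruned

# Toy example: 4 patches, only patch 2 moves between frames
prev = np.zeros((4, 8))
curr = prev.copy()
curr[2] += 1.0  # "Santa" moved inside patch 2
active, pruned_mask = prune_static_patches(prev, curr)
print(active)       # only the moving patch gets recomputed
print(pruned_mask)  # the rest are copied from the previous frame
```

Only the `active` patches go through the expensive model; the `pruned_mask` patches just reuse last frame's cached values, which is where the speedup comes from.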

2. The "Glitch" Problem (Why Skipping is Dangerous)

If you just skip the calculation and copy-paste the old background, you might run into a problem.

  • The Analogy: Imagine a musician playing a song. If you just copy-paste a note from the previous measure without changing the volume or the "vibe," the music starts to sound robotic and flat. In AI video, if you just copy the old data, the AI gets confused because it was trained to expect new data every time. This causes weird visual glitches, like the background shimmering or looking "noisy."

3. The "Attention Recovery" (The Magic Fix)

This is the paper's secret sauce. LIPAR doesn't just skip the work; it uses a clever trick to fake the calculation so the AI doesn't notice the difference.

  • The Analogy: Think of the AI as a chef making a soup. The chef is used to stirring the pot with a specific rhythm. If you suddenly stop stirring (pruning), the soup burns.
    • LIPAR's trick: The chef stops stirring the whole pot but uses a special spoon to gently mimic the stirring motion just enough to keep the soup perfect.
    • The "Noise" Secret: The paper discovered that AI video generation adds a little bit of random "static" (noise) to every frame, like grain in a photo. If you just copy the old frame, you accidentally copy the same static, which makes the video look weird. LIPAR's "Attention Recovery" ensures that even though it's reusing old data, it adds the right kind of new static so the AI thinks it's seeing a fresh, natural frame.
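The recovery idea can be sketched as follows. To be clear, this is a hedged toy version of the *concept* (refresh the noise on reused patches instead of copying it verbatim), not the paper's actual attention-recovery math; the function name, the additive-noise blend, and `noise_scale` are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def recover_pruned_outputs(cached_out, pruned_mask, noise_scale=0.1):
    """Toy sketch: instead of copying cached outputs verbatim for pruned
    patches (which also copies last frame's stale "static"), re-inject
    freshly sampled noise so the reused values look statistically like a
    fresh computation. The blending scheme is illustrative only.
    """
    out = cached_out.copy()
    fresh = rng.standard_normal(cached_out.shape) * noise_scale
    # Only the pruned (reused) patches get new noise; recomputed
    # patches already carry fresh values and are left untouched.
    out[pruned_mask] = cached_out[pruned_mask] + fresh[pruned_mask]
    return out

# Toy example: patches 0, 1, 3 were pruned; patch 2 was recomputed
cached = np.zeros((4, 8))
mask = np.array([True, True, False, True])
recovered = recover_pruned_outputs(cached, mask)
```

In this sketch, the recomputed patch comes through unchanged while every reused patch picks up new randomness, which is the "right kind of new static" the analogy above describes.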

The Results: Speed vs. Quality

Before LIPAR, generating these videos was like driving a car at 8 miles per hour.

  • With LIPAR: The car speeds up to about 11.6 miles per hour (a 45% increase in speed, matching the 1.45x throughput gain).
  • Memory: It uses 29% less computer memory (like needing a smaller gas tank).
  • Quality: The video looks just as good as the slow version. Humans couldn't tell the difference in blind tests!

Summary

LIPAR is like a smart video editor that knows when to stop working. It realizes that if the background isn't moving, it doesn't need to re-calculate it. But unlike a lazy editor who would just copy-paste and ruin the quality, LIPAR uses a mathematical "magic trick" to make the AI think it did the work, keeping the video smooth, fast, and high-quality.

It bridges the gap between old-school video compression (which skips static pixels) and modern AI video generation (which usually calculates everything), making real-time AI video a reality.