Improving Motion in Image-to-Video Models via Adaptive Low-Pass Guidance

This paper introduces Adaptive Low-Pass Guidance (ALG), a training-free method that enhances motion dynamics in image-to-video generation by filtering high-frequency details from the conditioning image during early denoising stages, thereby preventing the model from overfitting to static appearances while preserving image quality and text alignment.

June Suk Choi, Kyungmin Lee, Sihyun Yu, Yisol Choi, Jinwoo Shin, Kimin Lee

Published 2026-02-25

The Problem: The "Frozen Frame" Effect

Imagine you have a magical photo frame (an Image-to-Video AI). You put a picture of a cat in it, and you ask the frame to "make the cat run."

Ideally, the cat should sprint across the screen. But in reality, these AI models often get stuck. The cat barely twitches. It looks like a high-quality photo that is just slightly vibrating, rather than a lively video.

The researchers figured out why this happens: the AI is, in effect, too obsessed with the details of the original photo.

  • The Analogy: Imagine a painter who is given a photo of a cat and told to paint a video of it running. If the painter looks too closely at the photo's tiny details (the exact whisker shape, the specific texture of the fur) right at the very first second of painting, they get "locked in." They spend so much time trying to match those tiny details perfectly that they forget to paint the movement. The result is a beautiful, static painting that never moves.

In technical terms, the AI gets "over-conditioned" by the high-frequency details (sharp edges, textures) of the input image. It takes a "shortcut" to copy the look of the image immediately, sacrificing the motion.
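
"High-frequency" here is meant literally, in the signal-processing sense. The snippet below is an illustration, not the paper's code (`cat.png` is a placeholder file name): it checks that a Gaussian blur acts as a low-pass filter by measuring how much of an image's spectral energy sits above a frequency cutoff.

```python
import numpy as np
from PIL import Image, ImageFilter

img = Image.open("cat.png").convert("L")
blurred = img.filter(ImageFilter.GaussianBlur(radius=8))

def high_freq_energy(pil_img, cutoff=0.25):
    """Share of spectral energy above `cutoff` of the maximum frequency."""
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(np.asarray(pil_img, float))))
    h, w = spectrum.shape
    yy, xx = np.ogrid[:h, :w]
    radius = np.hypot(yy - h / 2, xx - w / 2)  # distance from the DC component
    mask = radius > cutoff * min(h, w) / 2     # keep only high frequencies
    return spectrum[mask].sum() / spectrum.sum()

# The blurred image retains far less high-frequency energy than the original.
print(high_freq_energy(img), high_freq_energy(blurred))
```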

The Simple Fix (But with a Catch)

The researchers first tried a simple trick: blur the photo before showing it to the AI (sketched in code after the list below).

  • The Analogy: If you give the painter a blurry, low-resolution photo of the cat, they can't get stuck on the tiny whiskers. They have to focus on the big picture: "Okay, the cat is here, and it needs to run." Because the details are fuzzy, the painter is forced to create a dynamic, flowing motion.
  • The Catch: While the motion is great, the final video looks blurry and low-quality because the AI started with a blurry image. You get a dynamic video, but it doesn't look like the original cat anymore.
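
As a rough sketch of this blur-first trick (the `pipe` call stands in for whatever image-to-video pipeline you use; it is not the paper's code):

```python
from PIL import Image, ImageFilter

image = Image.open("cat.png")

# A Gaussian blur is a low-pass filter: it strips the fine whiskers and
# fur texture but keeps the coarse layout of the scene.
blurred = image.filter(ImageFilter.GaussianBlur(radius=8))

# Conditioning on the blurred image frees the model to generate motion,
# but every frame inherits the blur -- this is "the catch" above.
video = pipe(image=blurred, prompt="a cat running across the yard")
```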

The Solution: "Adaptive Low-Pass Guidance" (ALG)

The team came up with a clever, two-step strategy called ALG. Think of it as a Director who knows when to be strict and when to be loose.

Here is how ALG works, step-by-step (a code sketch follows the list):

  1. The "Blurred Start" (Early Steps):
    At the very beginning of the video generation (the first few seconds of "painting"), the AI is shown a blurred version of the input image.

    • Why? This stops the AI from getting obsessed with tiny details. It forces the AI to focus on the big motion: "The cat is running!" It builds a dynamic, fluid skeleton for the video.
  2. The "Sharp Finish" (Later Steps):
    Once the motion is established and the video is flowing, the AI is suddenly shown the original, sharp, high-quality photo.

    • Why? Now that the "running" action is already happening, the AI can safely add back all the sharp details (the fur, the whiskers, the eyes) without getting stuck. It refines the blurry motion into a crisp, high-definition video.
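
For readers who want to see the mechanics, here is a minimal sketch of that two-phase schedule, assuming a hypothetical I2V model interface: `encode_condition`, `denoise_step`, `decode`, and `latent_shape` are stand-ins, not the paper's actual code.

```python
import torch
from PIL import Image, ImageFilter

def alg_sample(model, image: Image.Image, prompt: str,
               num_steps: int = 50, blur_steps: int = 10,
               blur_radius: float = 8.0):
    """Adaptive Low-Pass Guidance as a plain sampling loop (illustrative)."""
    # Encode both a blurred and a sharp version of the conditioning image.
    blurred = image.filter(ImageFilter.GaussianBlur(radius=blur_radius))
    blurred_cond = model.encode_condition(blurred)  # the "blurred start"
    sharp_cond = model.encode_condition(image)      # the "sharp finish"

    latents = torch.randn(model.latent_shape)
    for step in range(num_steps):
        # Early steps: low-pass-filtered condition, so the model commits
        # to large-scale motion instead of copying fine textures.
        # Later steps: the original condition, so fine detail is restored.
        cond = blurred_cond if step < blur_steps else sharp_cond
        latents = model.denoise_step(latents, cond, prompt, step)
    return model.decode(latents)
```

The only moving part is the switch from `blurred_cond` to `sharp_cond` partway through sampling, which is why the method needs no retraining.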

The Metaphor:
Imagine building a house.

  • Old Way (Standard AI): You try to lay every single brick perfectly while the foundation is still wet. The house ends up standing still, but the walls are crooked because you were too focused on the bricks.
  • The "Blur" Way: You build the whole house out of mud. It moves and flows great, but it's a muddy mess.
  • The ALG Way: You first build a rough, fast-moving mud structure to get the shape and flow right (the motion). Then, once the shape is solid, you swap the mud for perfect, sharp bricks (the details). You get a house that is both dynamic and beautiful.

The Results

The researchers tested this on several popular AI video models (like Wan 2.1, Wan 2.2, and LTX-Video).

  • Motion Boost: The videos became 33% more dynamic. Animals ran faster, cars drove more naturally, and scenes felt alive.
  • Quality Preserved: Unlike the "blurry start" method, the final videos were just as sharp and high-quality as the originals.
  • No Training Needed: The best part? They didn't have to retrain the AI at all. They just changed the "rules" of how the AI looks at the photo during the generation process. It's a free upgrade!

Summary

The paper solves the problem of AI videos being too static by teaching the AI to ignore the tiny details at the start (to encourage movement) and add them back at the end (to ensure quality). It's like telling a dancer: "First, just get the rhythm and the big moves right. Don't worry about your shoes yet. Once you're moving, we'll polish the shoes."
