Evaluating the Effect of Compression on Video Temporal Consistency Using Objective Quality Metrics

This paper systematically evaluates how video compression impacts temporal consistency across multiple codecs and content types, revealing that temporal degradation follows a non-linear pattern and is disproportionately severe in sequences with unpredictable dynamics, thereby challenging the assumption that motion volume alone dictates encoding difficulty.

Original authors: Peter Zsoldos

Published 2026-05-19 · Author reviewed

Original paper licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). This is an AI-generated explanation of the paper below. It is not written by the authors. For technical accuracy, refer to the original paper.

Imagine you are trying to send a flipbook animation to a friend over a slow internet connection. To make the file smaller, you have to "compress" it: you tell the computer to be smart about which details to keep and which to throw away. Usually, the computer assumes that the next picture will look very similar to the last one, even when an object moves, so it only sends the changes. This is the basic idea behind video compression.
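The "only send the changes" idea can be sketched in a few lines of Python. This is a toy illustration, not how any of the codecs in the paper actually work: real encoders predict blocks of pixels with motion compensation rather than comparing raw values. The frame data and the names encode_deltas and decode_deltas are made up for this example.

```python
def encode_deltas(frames):
    """Store the first frame whole, then only the pixels that changed."""
    encoded = [("full", list(frames[0]))]
    for prev, cur in zip(frames, frames[1:]):
        # Keep (position, new value) only where the pixel differs.
        changes = [(i, v) for i, (p, v) in enumerate(zip(prev, cur)) if p != v]
        encoded.append(("delta", changes))
    return encoded


def decode_deltas(encoded):
    """Rebuild every frame by applying each change list to the previous frame."""
    frames = []
    for kind, payload in encoded:
        if kind == "full":
            frame = list(payload)
        else:
            frame = list(frames[-1])
            for i, v in payload:
                frame[i] = v
        frames.append(frame)
    return frames


# A tiny one-row "video": a bright pixel (9) moving one step to the right.
frames = [
    [0, 0, 9, 0, 0, 0],
    [0, 0, 0, 9, 0, 0],
    [0, 0, 0, 0, 9, 0],
]
encoded = encode_deltas(frames)
restored = decode_deltas(encoded)
```

Here each later frame is stored as just two changed pixels instead of six, and decoding recovers the original frames exactly. The saving depends entirely on consecutive frames being similar.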

This paper is like a detective story investigating what happens when that "smart assumption" breaks down.

The Main Mystery: The "Predictability Trap"

The researchers tested four different video compression tools (think of them as competing compression formats: H.264, HEVC, VP9, and AV1) on many different types of videos. They wanted to see how well these tools kept the video looking smooth and consistent from one frame to the next.

They discovered a strange phenomenon they call the "Predictability Anomaly."

Here is the analogy:

  • Scenario A (The Train): Imagine a video of a train moving smoothly down a track. Even if the train is moving very fast, the computer can easily guess what the next frame will look like because the motion is predictable.
  • Scenario B (The Crowd): Now imagine a video of a chaotic crowd or splashing water. The movement is wild and irregular. Even if the total amount of movement is less than in the train video, the computer cannot guess what happens next.

The Surprise: The researchers found that the computer handles the fast, predictable train (Scenario A) much better than the chaotic crowd (Scenario B). In fact, the chaotic crowd causes the video to glitch, flicker, and look unstable much faster than the fast train does.
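A rough way to see why the train is easy and the crowd is hard is to let a toy "encoder" guess the next frame by sliding the previous one left or right and keeping the best match. The sketch below is an assumption-laden illustration, not the paper's method: the frames are single rows of made-up pixel values, the shifts wrap around the edges, and the helper names shift and best_prediction_error are hypothetical.

```python
def shift(frame, k):
    """Circularly shift a 1-D frame k pixels to the right (negative k shifts left)."""
    k %= len(frame)
    return frame[-k:] + frame[:-k] if k else list(frame)


def best_prediction_error(prev, cur, max_shift=3):
    """Smallest total pixel error over all candidate shifts of the previous frame."""
    return min(
        sum(abs(a - b) for a, b in zip(shift(prev, k), cur))
        for k in range(-max_shift, max_shift + 1)
    )


prev = [10, 20, 30, 40, 50, 60, 70, 80]

# "Train": the whole picture slides two pixels to the right.
train_next = shift(prev, 2)

# "Crowd": the same pixel values, but rearranged with no common direction.
crowd_next = [80, 10, 70, 20, 60, 30, 50, 40]

train_error = best_prediction_error(prev, train_next)
crowd_error = best_prediction_error(prev, crowd_next)
```

For the train, one shift explains the new frame completely, so the leftover error is zero. For the crowd, no single shift fits, so the encoder is left with a residual it has to spend data on.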

The "VMAF Paradox": The Camera That Lies

The paper highlights a major problem with how we currently measure video quality. There is a popular tool called VMAF that acts like a judge, giving videos a score based on how sharp and clear they look.

The researchers found a "Paradox":
When the computer struggles with the chaotic crowd (Scenario B), it gives up on predicting the motion and instead stores every single moment as a complete, high-quality photo (these are called "I-frames").

  • The Result: Because every single frame is a sharp, perfect photo, the VMAF judge gives the video an unrealistically high score. It thinks the video looks great.
  • The Reality: If you watch the video, it looks terrible. The images are sharp, but they "jump" or "flicker" because the connection between the frames is broken. It's like looking at a flipbook where every drawing is a masterpiece, but the animation is jerky and broken.

The paper calls this the "VMAF Paradox": The video looks perfect on paper (high score) but feels broken to the human eye (low stability).
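The gap between "each frame looks fine" and "the video is stable" can be shown with two toy measurements. This is only a sketch under strong assumptions: VMAF is far more sophisticated than the per-frame error used here, and the flicker measure below is a simple stand-in invented for this example, not the temporal metric used in the paper. All pixel values are made up.

```python
def mean_abs(values):
    """Average absolute value of a list of numbers."""
    return sum(abs(v) for v in values) / len(values)


def per_frame_error(reference, decoded):
    """Average pixel error, judging each frame on its own."""
    return sum(
        mean_abs([d - r for r, d in zip(ref, dec)])
        for ref, dec in zip(reference, decoded)
    ) / len(reference)


def flicker(reference, decoded):
    """Average frame-to-frame change in the decoded video that the reference lacks."""
    total = 0.0
    for t in range(1, len(reference)):
        ref_step = [b - a for a, b in zip(reference[t - 1], reference[t])]
        dec_step = [b - a for a, b in zip(decoded[t - 1], decoded[t])]
        total += mean_abs([d - r for r, d in zip(ref_step, dec_step)])
    return total / (len(reference) - 1)


# Reference: four identical grey frames of four pixels each.
reference = [[100] * 4 for _ in range(4)]
# Steady encode: every frame is slightly too dark by the same amount.
steady = [[98] * 4 for _ in range(4)]
# Flickering encode: frames alternate between too bright and too dark.
flickering = [[102] * 4 if t % 2 == 0 else [98] * 4 for t in range(4)]

steady_scores = (per_frame_error(reference, steady), flicker(reference, steady))
flickering_scores = (per_frame_error(reference, flickering), flicker(reference, flickering))
```

Both encodes have the same per-frame error of 2.0, so a judge that only looks at single frames cannot tell them apart. The flicker measure separates them: 0.0 for the steady encode and 4.0 for the flickering one.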

The "Smoking Gun"

The researchers proved this by looking at how much the video improved when they gave the computer more data (higher bitrate).

  • For the predictable train, doubling the data made the video much smoother and more stable.
  • For the chaotic crowd, even giving the computer four times as much data didn't fix the flickering. The computer just kept taking perfect, isolated photos instead of learning how to connect them.

The Takeaway

The paper concludes that predictability matters more than speed.

  • Old Assumption: "Fast motion is hard to compress."
  • New Discovery: "Unpredictable, chaotic motion is the real nightmare for compression."

The current tools are "cheating" by focusing on making individual frames look sharp, which tricks our quality meters, but they are failing to keep the motion smooth. The paper suggests that future video technology needs to stop just looking at single frames and start paying attention to how the video flows from one moment to the next, especially for chaotic scenes like crowds or water.
