Imagine you are watching a movie. Usually, you expect the actors to move at a "normal" speed. If a person walks, they walk at a walking pace. If a bird flaps its wings, it flaps them at a bird's pace.
But what if the movie director forgot to tell the actors how fast to move?
- The bird might flap its wings so slowly it looks like it's floating in honey.
- The person might fall off a chair, but instead of dropping in a split second, they drift down like a feather, taking ten seconds to hit the floor.
This is exactly what is happening with the most advanced AI video generators today. They are amazing at making things look real (the textures, the lighting, the faces), but they have lost their sense of time.
Here is a simple breakdown of the paper "The Pulse of Motion," using some everyday analogies.
1. The Problem: "Chronometric Hallucination"
The authors call this phenomenon "Chronometric Hallucination."
Think of an AI video generator like a talented painter who has never seen a clock. The painter is given a pile of photos from the internet. Some photos are from a high-speed camera (capturing a bullet in slow motion), some are from a time-lapse (flowers blooming in seconds), and some are normal videos.
The AI doesn't know the difference. It just sees "movement." When it tries to paint a new video, it mixes these speeds up randomly.
- It might make a hummingbird move as slowly as a sloth.
- It might make a car crash happen in slow motion, defying gravity.
The video looks "smooth" and pretty, but the physics are broken. The AI has no internal "heartbeat" to tell it how fast time should pass.
2. The Solution: The "Visual Chronometer"
To fix this, the researchers built a tool called the Visual Chronometer.
Imagine you are trying to guess how fast a car is driving just by looking at a video, without knowing the speedometer or the road markings. You look at how much the background blurs, how fast the wheels spin, and how the light hits the car.
The Visual Chronometer is like a super-smart detective that does exactly this. It looks at the video and asks: "Based on how this object is moving, how many frames per second (FPS) should this actually be?"
It ignores the file name or the metadata (which might say "30 FPS" but is lying). Instead, it measures the Physical FPS (PhyFPS)—the true speed of the motion in the real world.
3. How They Trained the Detective
You can't teach a detective to judge speed using videos whose true speed is unknown, and ordinary internet videos don't come with trustworthy speed labels. The researchers had to be clever.
They took high-quality, high-speed videos (like slow-motion footage of a hummingbird) and artificially "slowed them down" or "sped them up" using three specific tricks to mimic real cameras:
- The Fast Shutter: Taking sharp, crisp snapshots (like a sports camera).
- The Motion Blur: Blurring the image slightly to show speed (like a camera with a slow shutter).
- The Rolling Shutter: Distorting the image slightly because the camera sensor reads the image line-by-line (like a cheap phone camera).
By training the AI on these "tricky" videos, the Visual Chronometer learned to ignore the tricks and focus on the actual physics of the movement.
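The three camera tricks above can be thought of as simple frame operations on high-speed footage. Here is an illustrative sketch, not the paper's actual pipeline: the function names and parameters are my own, and a real implementation would work on decoded video tensors with far more care.

```python
import numpy as np

def fast_shutter(frames, factor):
    """Fast shutter: keep every `factor`-th sharp frame (sports-camera look)."""
    return frames[::factor]

def motion_blur(frames, window):
    """Slow shutter: average `window` consecutive high-speed frames so
    fast motion smears, as it would during a long exposure."""
    out = []
    for i in range(0, len(frames) - window + 1, window):
        out.append(frames[i:i + window].mean(axis=0))
    return np.stack(out)

def rolling_shutter(frames, span):
    """Rolling shutter: each output frame's rows are read top-to-bottom
    from `span` consecutive high-speed frames, mimicking a sensor that
    scans the image line by line."""
    T, H, W = frames.shape
    out = []
    for t in range(0, T - span, span):
        rows = [frames[t + (r * span) // H, r] for r in range(H)]
        out.append(np.stack(rows))
    return np.stack(out)
```

The key point is that all three start from the same high-speed source, so the true physical speed of the motion is known even after the footage has been disguised.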
4. The Big Reveal: The Audit
The researchers used their new detective to audit the world's best AI video generators (like Sora, Kling, and others).
The results were harsh:
- The Lie: Most AI models claim to run at a standard speed (e.g., 24 or 30 frames per second).
- The Truth: The Visual Chronometer found that the actual motion inside the videos was often wildly different. A video labeled "30 FPS" might actually be moving at "15 FPS" (too slow) or "60 FPS" (too fast).
- The Instability: Even within a single video, the speed would jump around. One second the car is zooming, the next second it's drifting.
It turns out that even the "smartest" AI models are terrible at keeping a steady beat. They are like a drummer who keeps changing the tempo every few bars.
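The audit described above boils down to two numbers per clip: how far the measured motion speed drifts from the claimed frame rate, and how much it wobbles within the video. A minimal sketch, assuming a per-window PhyFPS estimate is already available (in the paper, the Visual Chronometer would supply it; here it is just an input list, and the function name is my own):

```python
def audit_clip(nominal_fps, phyfps_windows):
    """Compare a clip's claimed frame rate to the motion-implied rate
    measured over short windows of the video.

    Returns (speed_ratio, instability):
      speed_ratio - mean PhyFPS / nominal FPS (1.0 = honest timing,
                    0.5 = motion runs at half the claimed speed)
      instability - std dev of PhyFPS across windows (0 = steady tempo)
    """
    mean_phy = sum(phyfps_windows) / len(phyfps_windows)
    var = sum((p - mean_phy) ** 2 for p in phyfps_windows) / len(phyfps_windows)
    return mean_phy / nominal_fps, var ** 0.5
```

A clip labeled 30 FPS whose measured windows all read 15 gets a ratio of 0.5 (the "too slow" case), while a clip whose windows swing between 20 and 40 gets a large instability score, the drummer who keeps changing tempo.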
5. The Fix: Making it Feel Real
The most exciting part is what happens when they fix the speed.
The researchers took an AI-generated video that looked "weirdly slow" or "floaty." They used the Visual Chronometer to figure out the true speed, and then they simply re-timed the video to match that speed.
The result?
- People watching the corrected videos said they felt much more natural.
- The "floaty" gravity disappeared.
- The bird flapped its wings at a realistic speed.
Interestingly, they found that making the entire video run at one consistent, corrected speed felt better to humans than trying to change the speed constantly within the video. It's like listening to a song: you want a steady beat, not a song that speeds up and slows down randomly.
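The single consistent correction described above can be sketched as uniform frame resampling. This is a toy version under my own naming: the correction factor would come from the Visual Chronometer's estimate (a clip whose motion looks half speed gets a speedup of 2), and nearest-frame duplication stands in for the frame interpolation a real re-timer would use.

```python
def retime(frames, speedup):
    """Uniformly re-time a clip by one global factor.

    speedup > 1 skips frames to make motion faster (fixes "floaty" clips);
    speedup < 1 repeats nearest frames to make it slower.
    """
    n_out = max(1, int(len(frames) / speedup))
    idx = [min(round(i * speedup), len(frames) - 1) for i in range(n_out)]
    return [frames[j] for j in idx]
```

Because the factor is constant across the whole clip, the result keeps one steady "tempo", which is exactly what viewers preferred over per-moment speed corrections.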
6. Why This Matters
The paper concludes with a quote from Aristotle: "We measure movement by time, and time by movement."
Right now, AI video models are great at the "movement" part (the visuals) but terrible at the "time" part. If we want AI to act as a "World Model" (a simulator that can predict how the real world works), it must understand time.
If an AI can't tell the difference between a falling apple and a floating balloon, it can't be trusted to simulate physics, drive cars, or help scientists.
In short: The researchers built a "speedometer" for AI videos. They found that current AI is driving blind, and by using their tool to fix the speed, they made the videos feel real again. This is the first step toward teaching AI to truly understand the rhythm of our physical world.