Real-Time Motion Detection Using Dynamic Mode Decomposition

This paper proposes a real-time motion detection algorithm for streaming video that utilizes Dynamic Mode Decomposition to identify foreground movement by analyzing the correspondence between video feature evolution and the resulting matrix eigenvalues, demonstrating its effectiveness on security footage through ROC analysis and cross-validation.

Marco Mignacca, Simone Brugiapaglia, Jason J. Bramburger

Published 2026-02-26

Imagine you are sitting in a security guard's chair, watching a live feed of a busy street. Your job is to spot anyone who walks into the frame. But here's the catch: the wind is blowing the trees, the clouds are moving across the sky, and the shadows are shifting. If you just look for any change in the picture, you'll get a headache from all the "false alarms" caused by the wind.

This paper introduces a clever new way to solve that problem using a mathematical tool called Dynamic Mode Decomposition (DMD). Think of DMD not as a complex math formula, but as a super-smart musical conductor for video.

Here is how it works, broken down into simple steps:

1. The Conductor and the Orchestra

Imagine the video is a symphony.

  • The Background (The Drums): The trees swaying, the clouds drifting, and the static street signs are like the steady, rhythmic drumbeat of the song. They are always there, either standing still or moving in a predictable, slow pattern.
  • The Foreground (The Soloist): A person walking into the frame is like a sudden, loud trumpet solo. It breaks the rhythm.

Traditional motion detectors are like a person who screams "Music!" every time any instrument makes a sound. They can't tell the difference between the steady drumbeat (wind) and the trumpet solo (a person).

DMD is the conductor. It listens to the video and separates the "drumbeat" (the background) from the "soloist" (the moving person). It does this by looking at the video in short, overlapping slices of consecutive frames, rather than trying to take in the whole recording at once.

2. The "Magic Numbers" (Eigenvalues)

How does the conductor know what is background and what is motion? It uses "magic numbers" (mathematicians call them eigenvalues).

  • The Background Numbers: These numbers are very calm and stable. Depending on how they are written, they sit close to one (as frame-to-frame multipliers) or close to zero (as growth rates). They represent the boring, unchanging parts of the video.
  • The Motion Numbers: When a person walks in, these numbers go crazy. They spike up suddenly.

The paper's method is essentially a motion alarm system that watches these numbers. As long as the numbers stay calm, the system says, "All quiet, just the wind." But the moment the numbers spike (like a heart rate monitor going off), the system shouts, "Someone is moving!"
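
For readers who want to see the "magic numbers" in action, here is a minimal sketch using NumPy. It computes standard DMD eigenvalues for a short clip and compares a still scene with one containing a moving blob. The synthetic one-dimensional "video", the helper names, the `rel_tol` cutoff, and the score (the largest distance of any eigenvalue from 1) are all assumptions made for illustration; the paper's actual detection statistic may differ.

```python
import numpy as np

def dmd_eigenvalues(frames, rel_tol=1e-3):
    """Eigenvalues of the reduced DMD operator for a (pixels, frames) array."""
    X, Y = frames[:, :-1], frames[:, 1:]          # each frame paired with the next
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    r = int(np.sum(s > rel_tol * s[0]))           # drop negligible directions
    U, s, Vh = U[:, :r], s[:r], Vh[:r, :]
    A_tilde = U.conj().T @ Y @ Vh.conj().T / s    # U* Y V S^-1 (column j scaled by 1/s[j])
    return np.linalg.eigvals(A_tilde)

def motion_score(frames):
    """Assumed illustrative score: largest distance of any eigenvalue from 1."""
    return float(np.max(np.abs(dmd_eigenvalues(frames) - 1.0)))

# Synthetic 1-D "video": 64 pixels, 12 frames (all values hypothetical).
x = np.arange(64)
background = 1.0 + 0.5 * np.sin(2 * np.pi * x / 64)
static_clip = np.tile(background[:, None], (1, 12))
centers = 10 + 4 * np.arange(12)                  # blob moves 4 pixels per frame
blob = np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2 * 2.0 ** 2))
moving_clip = static_clip + blob

static_score = motion_score(static_clip)
moving_score = motion_score(moving_clip)
print(static_score, moving_score)
```

In this toy setup the still clip produces a single eigenvalue sitting at one, so its score is essentially zero, while the moving blob pushes eigenvalues well away from one.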

3. The "Sliding Window" Trick

The video is too big to analyze all at once. So, the method uses a sliding window.
Imagine you are reading a book, but you only look at three sentences at a time through a small card with a hole in it. You read three sentences, then slide the card down a little so the next view overlaps the last, and so on.

  • The computer does this with video frames. It looks at a short chunk of time (about 3 seconds), analyzes the "music" of that chunk, and then slides forward to the next chunk.
  • This allows it to work in real-time. It doesn't need to wait for the whole movie to finish; it detects the intruder the second they step into the frame.
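
The sliding window described above can be sketched as a small generator that keeps only the most recent frames in memory. This is a rough illustration, not the authors' code: the one-frame stride, the function names, and the placeholder score (summed pixel change, standing in for the DMD-based score) are assumptions.

```python
from collections import deque

def sliding_window_scores(frame_stream, window_size, score_fn):
    """Yield (frame_index, score) once enough frames have arrived."""
    window = deque(maxlen=window_size)   # oldest frame drops out automatically
    for index, frame in enumerate(frame_stream):
        window.append(frame)
        if len(window) == window_size:
            yield index, score_fn(list(window))

def total_change(window):
    """Placeholder score: summed absolute pixel change across the window."""
    return sum(
        abs(a - b)
        for prev, cur in zip(window, window[1:])
        for a, b in zip(prev, cur)
    )

# Hypothetical 3-pixel frames: quiet, then something moves, then quiet again.
frames = [[0, 0, 0]] * 4 + [[0, 5, 0], [0, 0, 5]] + [[0, 0, 0]] * 3
results = list(sliding_window_scores(frames, window_size=3, score_fn=total_change))
print(results)
```

Because the generator only ever holds `window_size` frames, it can consume a live feed one frame at a time; any per-window score, including a DMD-based one, could be passed in as `score_fn`.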

4. Compressing the Data (The "Summary" Trick)

High-definition video has millions of pixels. Analyzing all of them is like trying to read every single word in a library to find one typo. It's too slow.
The authors use a trick called Compressed DMD. Imagine you have a 1,000-page novel, but you only need to know the plot. You ask a friend to summarize it into a 5-page outline.

  • The computer creates a tiny "summary" of the video (reducing millions of pixels to just a few dozen numbers).
  • It analyzes this summary. If the summary changes drastically, it knows a person is there.
  • This makes the process incredibly fast and cheap, allowing it to run on standard computers without needing supercomputers.
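
To make the "summary" idea concrete, the sketch below compresses each frame with a random Gaussian matrix and checks that the eigenvalues computed from the small summary match those from the full frames. The Gaussian measurement matrix, the sizes, the fixed rank, and the synthetic low-rank data are assumptions chosen for illustration; the paper may build its compressed measurements differently.

```python
import numpy as np

def dmd_eigenvalues(snapshots, rank):
    """Eigenvalues of the rank-truncated DMD operator for (features, frames) data."""
    X, Y = snapshots[:, :-1], snapshots[:, 1:]
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    U, s, Vh = U[:, :rank], s[:rank], Vh[:rank, :]
    return np.linalg.eigvals(U.conj().T @ Y @ Vh.conj().T / s)

rng = np.random.default_rng(0)
n_pixels, n_frames, n_measurements = 2000, 10, 20

# Synthetic frames built from three spatial patterns with known eigenvalues.
true_eigs = np.array([1.0, 0.8, 0.5])
modes = rng.standard_normal((n_pixels, 3))
time_dynamics = true_eigs[:, None] ** np.arange(n_frames)[None, :]
frames = modes @ time_dynamics                      # shape (2000, 10)

# Compress: each frame becomes 20 random weighted sums of its pixels.
C = rng.standard_normal((n_measurements, n_pixels)) / np.sqrt(n_measurements)
compressed = C @ frames                             # shape (20, 10)

full_eigs = np.sort(dmd_eigenvalues(frames, rank=3).real)
compressed_eigs = np.sort(dmd_eigenvalues(compressed, rank=3).real)
print(full_eigs, compressed_eigs)
```

Here the summary is 100 times smaller than the original frames, yet the eigenvalues agree because the underlying dynamics are low-rank. Real video is only approximately low-rank, so the match would be approximate as well.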

5. Tuning the Sensitivity (The "Volume Knob")

Every security camera is different. A windy park needs a different setting than a quiet office hallway.

  • If the threshold is set too high (low sensitivity), the system ignores slow walkers.
  • If it's set too low (high sensitivity), it screams "Intruder!" every time a leaf blows by.

The paper suggests a smart way to tune this "volume knob." The authors use a method called cross-validation, which is like a practice test. They run the system on a few test videos, adjust the knob, and see if it catches the people without crying wolf at the wind. This finds the "Goldilocks" setting that works best for that specific camera.
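
One possible version of this "practice test" is sketched below. It assumes each labeled clip provides a motion score per window plus a 0/1 label, picks the threshold that best separates hits from false alarms on the training clips, and checks it on the clip left out. The clip data, the candidate thresholds, and the selection criterion (true-positive rate minus false-positive rate) are hypothetical; the paper's exact procedure may differ.

```python
def youden_j(scores, labels, threshold):
    """True-positive rate minus false-positive rate at a given threshold.

    Assumes the labels contain at least one 1 and at least one 0.
    """
    tp = sum(s > threshold and y == 1 for s, y in zip(scores, labels))
    fp = sum(s > threshold and y == 0 for s, y in zip(scores, labels))
    positives = sum(labels)
    negatives = len(labels) - positives
    return tp / positives - fp / negatives

def best_threshold(clips, candidates):
    """Pick the candidate with the highest J over all windows in the clips."""
    scores = [s for clip in clips for s in clip["scores"]]
    labels = [y for clip in clips for y in clip["labels"]]
    return max(candidates, key=lambda t: youden_j(scores, labels, t))

def leave_one_clip_out(clips, candidates):
    """For each clip: tune on the others, then score the held-out clip."""
    results = []
    for i, held_out in enumerate(clips):
        training = clips[:i] + clips[i + 1:]
        t = best_threshold(training, candidates)
        results.append((t, youden_j(held_out["scores"], held_out["labels"], t)))
    return results

# Hypothetical per-window scores and labels (1 = someone is moving).
clips = [
    {"scores": [0.1, 0.2, 0.9, 0.8], "labels": [0, 0, 1, 1]},
    {"scores": [0.15, 0.3, 0.7, 0.05], "labels": [0, 0, 1, 0]},
    {"scores": [0.25, 0.6, 0.85, 0.1], "labels": [0, 1, 1, 0]},
]
candidates = [0.0, 0.2, 0.4, 0.6, 0.8]

cv_results = leave_one_clip_out(clips, candidates)
final_threshold = best_threshold(clips, candidates)
print(cv_results, final_threshold)
```

If the held-out results stay close to the training results across folds, that is a sign the chosen threshold should carry over to new footage from the same camera.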

The Bottom Line

This paper presents a motion detector that is:

  1. Fast: It works in real-time because it uses "summaries" of the video.
  2. Smart: It ignores the wind and shadows (the background) and only cares about the "soloists" (people).
  3. Explainable: Unlike some "black box" AI that you can't understand, this method is based on clear math. If it detects motion, you can point to the specific "spike" in the numbers that caused the alarm.

In short, it's a way to teach a computer to watch a video, ignore the boring stuff, and only pay attention when something interesting happens, all while doing the math fast enough to do it live.
