Motion-aware Event Suppression for Event Cameras

This paper introduces a lightweight, real-time framework for motion-aware event suppression. The method jointly segments independently moving objects and predicts their future motion, along with the camera's ego-motion, in order to filter dynamic events. It achieves state-of-the-art performance on the EVIMO benchmark while significantly accelerating downstream applications such as Vision Transformers and visual odometry.

Roberto Pellerito, Nico Messikommer, Giovanni Cioffi, Marco Cannici, Davide Scaramuzza

Published 2026-03-02

Imagine you are trying to listen to a friend whisper a secret in the middle of a roaring, chaotic rock concert. The music (the background noise) is so loud and constant that it drowns out your friend's voice. This is exactly the problem event cameras face.

The Problem: The "Noise" of Motion

Traditional cameras take pictures like a flipbook, capturing everything in a frame every 1/30th of a second. Event cameras are different; they are like super-sensitive ears. They only "hear" (or record) when something changes at a specific pixel. If a leaf moves, it makes a sound. If a car drives by, it makes a sound.

But here's the catch: Everything moves.
When you walk down the street, your own movement (ego-motion) makes the trees, buildings, and sidewalks "scream" with data because they are shifting in your view. Meanwhile, a pedestrian crossing the street (an Independent Moving Object, or IMO) also makes noise.

The camera is flooded with millions of these "sounds." It can't tell the difference between the noise of the background moving because you moved, and the noise of a dangerous object moving on its own. This overload slows down robots and autonomous cars, making them sluggish and confused.

The Solution: The "Future-Seeing" Filter

The authors of this paper built a smart filter called Motion-aware Event Suppression. Think of it as a bouncer at a very exclusive club who doesn't just look at who is standing at the door right now, but predicts who will be there in the next split second.

Here is how their system works, using simple analogies:

1. The "Crystal Ball" Prediction

Most systems try to sort the noise after it happens. By the time they realize, "Oh, that tree was just moving because I walked," the data is already clogging the system.

This new system is different. It looks at the current scene and predicts the future (about 100 milliseconds ahead).

  • The Analogy: Imagine you are playing catch. A normal player catches the ball when it arrives. This system is like a player who sees the ball being thrown and runs to the spot where the ball will be before it even gets there.
  • How it works: The AI looks at the current movement and calculates a "flow map" (like a wind map for pixels). It asks, "If that car keeps moving at this speed, where will it be in 0.1 seconds?"
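The "flow map" step above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function name, the dense pixels-per-second flow representation, and the linear constant-velocity motion model are all assumptions made for clarity.

```python
import numpy as np

def predict_event_positions(xs, ys, flow, dt=0.1):
    """Warp event pixel coordinates forward in time using a dense flow map.

    xs, ys : integer event coordinates, shape (N,)
    flow   : (H, W, 2) per-pixel velocity in pixels/second
    dt     : prediction horizon in seconds (~100 ms, as in the paper)
    """
    vx = flow[ys, xs, 0]
    vy = flow[ys, xs, 1]
    # Linear motion model: future position = position + velocity * time
    return xs + vx * dt, ys + vy * dt

# Toy flow field: everything drifts 50 px/s to the right
flow = np.zeros((4, 4, 2))
flow[..., 0] = 50.0
xs = np.array([1, 2])
ys = np.array([0, 3])
px, py = predict_event_positions(xs, ys, flow, dt=0.1)
# px → [6.0, 7.0]; py is unchanged because vertical flow is zero
```

The key design point is that the prediction is cheap: one lookup into the flow map and one multiply-add per event, which is what makes running it ahead of time feasible.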

2. The "Time-Traveling" Mask

Once the system predicts where the moving objects (like cars or people) will be, it creates a digital "mask" or stencil.

  • The Analogy: Imagine you have a stencil of a moving car. Instead of waiting for the car to pass through your window, you hold the stencil up before the car gets there.
  • The Magic: Because the system knows exactly where the car will be, it can "suppress" (silence) everything outside the stencil and keep only the data for the car. It effectively deletes the "roar" of the concert so you can hear the whisper.
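The stencil step amounts to a boolean lookup: each incoming event is kept only if it lands inside the predicted object mask. A minimal sketch, assuming a simple boolean mask and illustrative names (this is not the paper's code):

```python
import numpy as np

def suppress_events(xs, ys, imo_mask):
    """Keep only events that fall inside the predicted moving-object mask.

    xs, ys   : integer event coordinates, shape (N,)
    imo_mask : (H, W) boolean map, True where a moving object is predicted
    """
    keep = imo_mask[ys, xs]  # boolean indexing: one lookup per event
    return xs[keep], ys[keep]

# Toy example: a 2x2 "car" predicted in the middle of a 4x4 sensor
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
xs = np.array([0, 1, 2, 3])
ys = np.array([0, 1, 2, 3])
fx, fy = suppress_events(xs, ys, mask)
# Only the events at (1, 1) and (2, 2) survive
```

Because the mask is computed before the events arrive, the filter itself is just an array lookup, so it adds almost no latency to the event stream.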

3. The "Smart Bouncer" for Robots

This isn't just about cleaning up data; it's about making robots faster and smarter.

  • Visual Odometry (GPS for Robots): Robots estimate their own position from how the scene moves in their view. If independently moving objects pollute that data, the robot can mistake their motion for its own. By filtering out the events caused by moving objects, the robot's position estimate becomes much more accurate (like cleaning up a foggy windshield).
  • Token Pruning (Speeding Up AI): Modern AI (like Vision Transformers) looks at an image by breaking it into thousands of tiny puzzle pieces (tokens). Usually, it tries to solve all of them. This system says, "Hey, the sky and the road aren't moving. Let's ignore those puzzle pieces and only solve the ones with the moving car." This makes the AI run 83% faster.
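The token-pruning idea can be sketched as follows: map the predicted motion mask onto the Vision Transformer's patch grid and drop every token whose patch contains no dynamic pixels. The patch size, shapes, and function names below are illustrative assumptions, not the paper's actual pipeline.

```python
import numpy as np

def prune_tokens(tokens, mask, patch=16):
    """Drop ViT tokens whose image patch contains no dynamic pixels.

    tokens : (num_patches, dim) patch embeddings, row-major patch order
    mask   : (H, W) boolean map of predicted moving-object pixels
    patch  : side length of a square patch in pixels
    """
    H, W = mask.shape
    gh, gw = H // patch, W // patch
    # Reduce the pixel mask to a per-patch "any dynamic pixel?" flag
    patch_active = mask.reshape(gh, patch, gw, patch).any(axis=(1, 3)).ravel()
    return tokens[patch_active], np.flatnonzero(patch_active)

# Toy example: a 64x64 image, 4x4 grid of 16x16 patches, one moving object
mask = np.zeros((64, 64), dtype=bool)
mask[0:16, 16:32] = True            # object covers patch index 1 only
tokens = np.random.randn(16, 8)     # 16 patch embeddings of dim 8
kept, idx = prune_tokens(tokens, mask, patch=16)
# 15 of 16 tokens are discarded before the transformer runs
```

Since transformer cost grows with the number of tokens, discarding static patches up front is where the reported speedup comes from.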

Why This is a Big Deal

  • Speed: It runs at 173 times per second on a standard computer chip. That's faster than the blink of an eye.
  • Accuracy: It is 67% better at finding moving objects than the previous best methods.
  • Efficiency: It uses very little memory, meaning it can run on small, battery-powered platforms like drones, or on the embedded hardware of self-driving cars, without needing a supercomputer.

The Bottom Line

This paper introduces a way for robots to anticipate the future to filter out the noise of the present. Instead of drowning in a sea of data, the robot learns to ignore the background chaos and focus only on the things that matter, making it faster, safer, and more efficient. It's the difference between trying to hear a conversation in a hurricane versus having a noise-canceling headset that predicts exactly when the wind will blow.
