RED: Robust Event-Guided Motion Deblurring with Modality-Specific Disentanglement

This paper introduces RED, a robust event-guided motion deblurring network that employs a robustness-oriented perturbation strategy and a modality-specific disentanglement mechanism to effectively reconstruct sharp images from fragmented event data caused by real-world sensor under-reporting.

Yihong Leng, Siming Zheng, Jinwei Chen, Bo Li, Jiaojiao Li, Peng-Tao Jiang

Published Mon, 09 Ma

Imagine you are trying to take a clear photo of a fast-moving race car. Because the car is moving so fast, your camera shutter stays open a tiny bit too long, and the resulting photo is a blurry mess. This is the problem of motion blur.

For a long time, computers have tried to fix these blurry photos using only the picture itself. But sometimes, the blur is so bad that the computer just guesses wrong, like trying to solve a puzzle with half the pieces missing.

Recently, scientists started using a special kind of camera called an Event Camera. Think of a normal camera as a video recorder that takes a picture every 1/30th of a second. An Event Camera is different: it's like a swarm of tiny, hyper-alert fireflies. Each firefly only "flashes" (sends an event) when it sees something change quickly, like a wheel spinning or a bird flapping its wings. These flashes happen incredibly fast, giving the computer a perfect map of where things are moving.

The Problem: The "Shy" Fireflies
The paper's authors noticed a big problem with these Event Cameras in the real world. To stop the camera from getting confused by noise (like dust or flickering lights), engineers set a "volume knob" (called a threshold) that tells the fireflies: "Only flash if the change is loud enough."

The trouble is, this makes the fireflies shy.

  • If a car is moving slowly, or if the edge of an object is faint, the change isn't "loud" enough.
  • The fireflies stay silent.
  • The computer gets a map of motion that is fragmented and missing pieces.

When existing computer programs tried to use these "shy" maps to fix the blurry photo, they got confused. They tried to mix the blurry photo and the broken motion map together, which made the final result even worse. It's like trying to bake a cake using a recipe that's missing half the ingredients and then mixing in some dirt because you thought it was chocolate.

The Solution: RED (Robust Event-guided Deblurring)
The authors created a new system called RED. They didn't just build a better cake mixer; they changed how they think about the ingredients. Here is how RED works, using simple analogies:

1. The "Training Camp" (RPS)

Before the system goes to work, they put it through a tough training camp. They simulate all kinds of "shy firefly" scenarios. They pretend the volume knob is turned up high, then low, then erratic.

  • Why? This teaches the computer: "Hey, sometimes the motion map will be broken. Don't panic. Learn to work with what you have."
  • Result: The system becomes tough and adaptable, ready for any real-world condition.
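The training camp amounts to a data augmentation: during training, the event input is perturbed to mimic thresholds that are too high, too low, or erratic. The paper's exact RPS procedure isn't detailed here, so the sketch below is a hypothetical stand-in (random event dropping, with an assumed drop-rate range):

```python
import numpy as np

def perturb_events(event_map, rng, drop_range=(0.0, 0.5)):
    """Randomly silence events to mimic an erratic, too-high threshold,
    so the network learns to cope with fragmented motion maps.
    (Illustrative augmentation, not the authors' exact RPS.)"""
    drop_prob = rng.uniform(*drop_range)          # fresh severity per sample
    keep = rng.random(event_map.shape) >= drop_prob
    return event_map * keep                       # dropped pixels become 0

rng = np.random.default_rng(0)
events = rng.integers(-1, 2, size=(8, 8))  # dense event map of -1, 0, +1
augmented = perturb_events(events, rng)
# Each training step sees a differently fragmented version of the events,
# so "broken map" becomes a normal condition rather than a surprise.
```

Redrawing the drop rate for every sample is what makes the network robust to a *range* of sensor conditions instead of overfitting to one fixed corruption level.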

2. The "Specialized Teams" (Disentanglement)

Old systems tried to mix the blurry photo and the broken motion map into one big soup. RED says, "No! Let's keep the teams separate first."

  • The Image Team: Focuses only on the look of the photo (colors, shapes, textures). They ignore the movement.
  • The Event Team: Focuses only on the movement (where things changed). They ignore the colors.
  • The Cross-Team: A mediator that helps them talk to each other.

By separating them, the system prevents the "broken" motion data from ruining the "good" picture data. It's like having a translator who speaks two languages perfectly, rather than forcing everyone to speak a broken mix of both.
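In network terms, "keeping the teams separate" means giving each modality its own encoder with no shared weights, so event noise cannot leak into the image features before fusion. A minimal PyTorch sketch (the layer sizes and names are assumptions, not the authors' architecture):

```python
import torch
import torch.nn as nn

class DisentangledBackbone(nn.Module):
    """Illustrative two-branch encoder: the blurry image and the event
    map are processed separately, so a fragmented event map cannot
    corrupt the appearance features before they are deliberately fused."""
    def __init__(self, channels=16):
        super().__init__()
        self.image_branch = nn.Sequential(  # appearance: colors, textures
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU())
        self.event_branch = nn.Sequential(  # motion: where things changed
            nn.Conv2d(1, channels, 3, padding=1), nn.ReLU())

    def forward(self, blurry, events):
        return self.image_branch(blurry), self.event_branch(events)

net = DisentangledBackbone()
f_img, f_evt = net(torch.randn(1, 3, 32, 32), torch.randn(1, 1, 32, 32))
# Two separate 16-channel feature maps, not one premixed "soup".
```

Only after this split does any cross-modal module (the "mediator") get to decide what the branches exchange.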

3. The "Handshake" (Selective Fusion)

Once the teams have done their own work, they come together, but very carefully.

  • MSEM (Motion Saliency Enhancer): The Event Team whispers to the Image Team: "Hey, look right here! There was a fast movement here, even though the picture is blurry. Let's sharpen this specific spot."
  • ESEM (Event Semantic Engraver): The Image Team whispers back to the Event Team: "You're missing some context because your map is broken. Here is the shape of the object so you know what you are looking at."

They only share information where it is useful. If the motion data is too broken, they ignore it and rely on the picture. If the picture is too blurry, they lean on the motion data.
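A common way to implement this kind of careful, selective handshake is a learned gate: a small network predicts, per location, how much the event features should be trusted. The sketch below is a generic gated-fusion stand-in for MSEM/ESEM, not the paper's modules:

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Selective fusion sketch: a sigmoid gate (values in 0..1) decides,
    per pixel, how much event information to inject into the image
    features. Where events look unreliable, the gate stays near 0 and
    the image features pass through untouched."""
    def __init__(self, channels=16):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 1), nn.Sigmoid())

    def forward(self, f_img, f_evt):
        trust = self.gate(torch.cat([f_img, f_evt], dim=1))
        return f_img + trust * f_evt  # add event cues only where trusted

fuse = GatedFusion()
out = fuse(torch.randn(1, 16, 32, 32), torch.randn(1, 16, 32, 32))
```

The gate sees both modalities at once, so it can learn exactly the behavior described above: lean on the picture where the motion data is broken, and lean on the motion data where the picture is too blurry.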

The Result

When they tested RED, it was a game-changer.

  • Old methods: When 30% of the motion data was missing, photo quality crashed.
  • RED: Even with 50% of the motion data missing, RED still produced a sharp, clear photo. In fact, it was often better than systems that used no motion data at all.

In a Nutshell:
Imagine you are trying to fix a torn map.

  • Old way: You glue the torn pieces together randomly, making a mess.
  • RED way: You first study the map's geography (the image) and the torn pieces' shapes (the events) separately. Then, you carefully match the pieces only where they fit perfectly, ignoring the parts that are too damaged.

RED teaches computers to be smart about what they trust, ensuring that even when the sensors are imperfect, the final picture is crystal clear.