Receding-Horizon Maximum-Likelihood Estimation of Neural-ODE Dynamics and Thresholds from Event Cameras

This paper proposes a receding-horizon maximum-likelihood estimator that jointly identifies Neural ODE dynamics and unknown contrast thresholds from asynchronous event camera streams by modeling events as a history-dependent marked point process and optimizing a log-likelihood objective via gradient steps on a sliding window.

Kazumune Hashimoto, Kazunobu Serizawa, Masako Kishida

Published 2026-03-06

Imagine you are trying to figure out how a car is driving, but you can't see the car itself. Instead, you only have a very strange, high-tech security camera that doesn't take photos.

The Camera: The "Event" Eye
This special camera (called an Event Camera) works differently from your phone's camera. Your phone takes a picture every fraction of a second, even if nothing is moving, creating a lot of blurry, repetitive data.

This event camera is like a hyper-alert guard. It only "blinks" (sends a signal) when it sees something change.

  • If a car moves across the screen, the camera sends a tiny message: "Hey, the light got brighter at this spot!" or "It got darker there!"
  • It sends these messages at the exact microsecond they happen.
  • The Catch: The camera has a "sensitivity setting" (a threshold). It only blinks if the change in light is strong enough to cross that line. If the setting is too high, it misses small movements. If it's too low, it gets noisy.
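The "hard switch" rule above can be sketched in a few lines. This is a simplified toy model, not the paper's implementation: the function name `hard_events`, the threshold value, and the sample trace are all made up for illustration. Real event cameras work per pixel on log intensity, which is what the trace below stands in for.

```python
def hard_events(log_intensity, C=0.2):
    """Emit +1/-1 events whenever the log intensity has drifted by more
    than the contrast threshold C since the last event (a hard switch).
    C is the camera's unknown 'sensitivity setting'."""
    events = []
    ref = log_intensity[0]               # reference level at the last event
    for t, L in enumerate(log_intensity[1:], start=1):
        while L - ref >= C:              # brightness rose past the threshold
            ref += C
            events.append((t, +1))       # "it got brighter here!"
        while ref - L >= C:              # brightness fell past the threshold
            ref -= C
            events.append((t, -1))       # "it got darker here!"
    return events

# A single pixel whose log intensity ramps up, then back down.
trace = [0.0, 0.1, 0.25, 0.45, 0.3, 0.05]
evs = hard_events(trace, C=0.2)          # three events: +1, +1, then -1
```

Note how the small wiggle from 0.45 down to 0.3 produces no event at all: it never crosses the threshold, which is exactly why the wrong guess for `C` corrupts everything downstream.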

The Problem: The Mystery Box
The researchers wanted to use these blinking signals to figure out two things:

  1. The Physics: How is the object actually moving? (Is it spinning? Slowing down? Following a curve?)
  2. The Camera's Secret: What is the exact sensitivity setting of the camera?

The problem is that the camera's sensitivity setting is often unknown and can change. If you guess the wrong setting, your math for how the object is moving will be wrong. It's like trying to solve a puzzle where you don't know the shape of the pieces and you don't know the picture you're trying to build.

The Solution: The "Sliding Window" Detective
The authors created a smart system to solve this puzzle in real-time. Here is how they did it, using some fun analogies:

1. The Neural ODE: The "Imagination Engine"

They built a digital brain (a Neural ODE) that acts like a movie director's imagination. It constantly guesses: "If the object is moving like this, what should the light look like right now?"

  • It doesn't just guess the position; it guesses the entire history of how the light changed to get to this point.
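A Neural ODE is just a small neural network used as the right-hand side of a differential equation, integrated forward in time. Here is a minimal sketch under that idea, using plain NumPy and simple Euler integration; the network shape, the step size, and the random weights are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def mlp_dynamics(x, params):
    """A tiny neural network f_theta(x) standing in for the learned
    vector field dx/dt = f_theta(x) — the 'imagination engine'."""
    W1, b1, W2, b2 = params
    h = np.tanh(W1 @ x + b1)
    return W2 @ h + b2

def integrate(x0, params, dt=0.01, steps=100):
    """Forward-Euler integration: roll the imagined state forward and
    keep the whole trajectory, not just the endpoint — the entire
    history of how the state (and hence the light) evolved."""
    traj = [x0]
    x = x0
    for _ in range(steps):
        x = x + dt * mlp_dynamics(x, params)
        traj.append(x)
    return np.stack(traj)

rng = np.random.default_rng(0)
params = (rng.normal(0, 0.5, (8, 2)), np.zeros(8),
          rng.normal(0, 0.5, (2, 8)), np.zeros(2))
traj = integrate(np.array([1.0, 0.0]), params)   # shape (101, 2)
```

Training then means adjusting `params` so that the trajectory's predicted brightness changes line up with the camera's actual blinks.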

2. The Smooth Surrogate: The "Soft Threshold"

In the real world, the camera is a hard switch: "Did the light change enough? Yes/No." This is bad for math because you can't easily calculate the "slope" of a switch.

  • The researchers invented a smooth, fuzzy version of this switch. Imagine the threshold isn't a hard wall, but a hill. The closer the light change gets to the top of the hill, the more likely the camera is to "blink."
  • This allows the computer to use calculus (gradients) to gently nudge its guesses until they match the observed blinks.
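The hard switch versus the soft hill can be shown side by side. This is a generic sigmoid surrogate, a common trick for smoothing thresholds; the sharpness parameter `beta` and both function names are assumptions for illustration, not the paper's exact surrogate.

```python
import math

def hard_switch(delta, C):
    """Hard rule: fire iff the light change |delta| exceeds C.
    Its slope is zero almost everywhere — useless for gradients."""
    return 1.0 if abs(delta) >= C else 0.0

def soft_switch(delta, C, beta=20.0):
    """Smooth surrogate: a sigmoid 'hill'. The closer |delta| gets to
    the threshold C, the closer the firing probability gets to 1, and
    the slope is nonzero everywhere, so gradient steps can adjust both
    the dynamics and the threshold estimate."""
    return 1.0 / (1.0 + math.exp(-beta * (abs(delta) - C)))
```

For example, with `C = 0.2` the hard switch jumps from 0 to 1 at exactly 0.2, while the soft switch passes smoothly through 0.5 there, giving the optimizer a usable slope on both sides.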

3. The Receding Horizon: The "Sliding Window"

This is the most clever part.

  • The Old Way: Imagine trying to solve a mystery by reading the entire history of a crime scene from the beginning of time up to the present moment every single time you get a new clue. It would take forever and crash your computer.
  • The New Way (Receding Horizon): The researchers only look at the last few seconds of the video.
    • They take a "window" of time (say, the last 15 seconds).
    • They use the data in that window to update their guesses about the movement and the camera settings.
    • Then, they slide the window forward, drop the oldest data, and add the newest data.
    • Why? It keeps the math fast and manageable, like a detective focusing only on the most recent clues rather than re-reading the whole case file every time.
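The sliding-window loop itself is simple to sketch. In this toy version, `fit_window` is a hypothetical stand-in for the paper's gradient-based likelihood maximization (here it just averages the window), and the events are plain numbers; only the windowing mechanics are the point.

```python
from collections import deque

def fit_window(events):
    """Placeholder 'optimizer': averages the window, standing in for
    a few gradient steps on the window's log-likelihood."""
    return sum(events) / len(events)

def receding_horizon(stream, window=3):
    """Keep only the newest `window` items, refit on each arrival,
    then slide forward. Old data falls off the deque automatically."""
    buf = deque(maxlen=window)
    estimates = []
    for event in stream:
        buf.append(event)            # add the newest clue
        estimates.append(fit_window(list(buf)))
    return estimates

est = receding_horizon([1, 2, 3, 4, 5, 6, 7], window=3)
```

Notice that once the buffer is full, each estimate depends only on the last three items: the cost per update stays constant no matter how long the stream runs.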

4. The Monte Carlo Subsampling: The "Spot Check"

To check if their guess is good, they have to compare their "Imagination Engine" against the actual camera blinks.

  • Normally, they would have to check every single pixel on the screen (thousands of them) to see if the math adds up. That's too slow.
  • Instead, they use Monte Carlo Subsampling. Imagine you are judging the quality of a giant pizza. Instead of tasting every single slice, you randomly pick 500 bites, taste them, and assume the whole pizza tastes like that.
  • The computer picks a random sample of pixels to check the math, saving massive amounts of time.
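The spot-check idea can be demonstrated with a toy per-pixel error array. The "errors" here are synthetic numbers, and the sample size of 500 is illustrative; the point is that the subsampled average lands very close to the exact one at a tiny fraction of the cost.

```python
import random

def full_loss(pixel_errors):
    """Exact objective: average mismatch over every pixel (slow when
    the image has hundreds of thousands of pixels)."""
    return sum(pixel_errors) / len(pixel_errors)

def subsampled_loss(pixel_errors, k=500, seed=0):
    """Monte Carlo estimate: check only k randomly chosen pixels.
    The expected value matches the full loss, but the cost is fixed
    at k regardless of image size."""
    rng = random.Random(seed)
    sample = rng.sample(range(len(pixel_errors)), k)
    return sum(pixel_errors[i] for i in sample) / k

# Synthetic per-pixel errors, roughly uniform on [0, 1).
errors = [((i * 2654435761) % 1000) / 1000 for i in range(100_000)]
exact = full_loss(errors)                 # averages 100,000 pixels
approx = subsampled_loss(errors, k=500)   # averages only 500
```

With 500 samples the estimate typically sits within a couple of percent of the exact average, which is plenty of accuracy for a single noisy gradient step.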

The Result

By combining these tricks, the system can:

  1. Learn the movement: It figures out the exact physics of the moving object (speed, direction, spin).
  2. Learn the camera: It figures out the camera's hidden sensitivity settings, even if they vary from pixel to pixel.
  3. Do it live: It updates these guesses instantly as the video plays, without getting bogged down by old data.

In a Nutshell:
The paper teaches a computer to watch a camera that only blinks when things change, and to figure out both how the object is moving and how sensitive the camera is, all by looking at a sliding window of recent history and taking quick "spot checks" of the data. It turns a chaotic stream of tiny blips into a clear, smooth understanding of the world.