OTPL-VIO: Robust Visual-Inertial Odometry with Optimal Transport Line Association and Adaptive Uncertainty

This paper presents OTPL-VIO, a robust stereo visual-inertial odometry system that enhances performance in low-texture and illumination-challenging environments. It pairs a training-free deep descriptor with entropy-regularized optimal transport for line association, and introduces adaptive uncertainty weighting to stabilize estimation.

Zikun Chen, Wentao Zhao, Yihe Niu, Tianchen Deng, Jingchuan Wang

Published Wed, 11 Ma

Imagine you are trying to navigate a dark, empty warehouse while wearing a blindfold, but you have a friend (the camera) and a sense of balance (the IMU) to help you. This is essentially what a robot does when it tries to figure out where it is using Visual-Inertial Odometry (VIO).

Most robots rely on "point features"—like distinct corners of a box or a unique pattern on a wall—to know where they are. But what happens if you walk into a long, white hallway with no decorations (low texture) or if the lights suddenly flicker on and off (abrupt illumination changes)? The robot's "eyes" go blind because there are no unique corners to grab onto. It starts to guess, and those guesses get worse and worse, causing the robot to get lost.

This paper introduces OTPL-VIO, a new navigation system designed specifically to solve this "blind in a white room" problem. Here is how it works, explained through simple analogies:

1. The Problem: Relying Only on "Dots"

Traditional systems are like a person trying to navigate a room by only looking at dots on the floor. If the floor is clean and has no dots, or if the lights go out, the person is stuck.

  • The Flaw: Even if you try to use lines (like the edges of a door frame) to help, most current systems try to find lines by first finding dots along those lines. If the dots disappear because of bad lighting, the line connection breaks, and the system fails.

2. The Solution: Giving Lines Their Own "ID Card"

The authors realized that lines (like the edge of a table or a wall corner) are everywhere, even in boring, white rooms. But to use them, the robot needs to recognize them without relying on dots.

  • The "Deep Descriptor" (The ID Card):
    Imagine every line segment gets its own unique ID card or fingerprint. Instead of looking for dots on the line to identify it, the system scans the entire line and creates a summary of its "vibe" (texture, shape, context).
    • The Magic: This ID card is created automatically by a smart AI that doesn't need extra training. It's like a security guard who can instantly recognize a person's face even if they are wearing a hat or standing in the dark, just by looking at their overall silhouette.
    • Adaptability: If a line is near lots of dots, the ID card focuses on the dots. If the dots are gone, the ID card focuses entirely on the line's shape. It's a chameleon that changes its strategy based on what's available.
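The "ID card" idea above can be sketched in a few lines. This is a minimal illustration, not the paper's exact pipeline: it assumes a precomputed dense feature map (e.g. from an off-the-shelf network) and simply pools features sampled along the segment into one normalized vector. The function name and sampling scheme are hypothetical.

```python
import numpy as np

def line_descriptor(feature_map, p0, p1, num_samples=16):
    """Illustrative sketch of a line "ID card": pool a dense per-pixel
    feature map along evenly spaced points on the segment.

    feature_map: (H, W, D) array of per-pixel deep features (assumed given)
    p0, p1:      (x, y) endpoints of the line segment in pixels
    """
    h, w, _ = feature_map.shape
    ts = np.linspace(0.0, 1.0, num_samples)
    # Evenly spaced sample points along the segment (nearest-pixel lookup)
    xs = np.clip(np.round(p0[0] + ts * (p1[0] - p0[0])).astype(int), 0, w - 1)
    ys = np.clip(np.round(p0[1] + ts * (p1[1] - p0[1])).astype(int), 0, h - 1)
    samples = feature_map[ys, xs]                 # (num_samples, D)
    desc = samples.mean(axis=0)                   # pool into one summary vector
    return desc / (np.linalg.norm(desc) + 1e-8)   # L2-normalize the "ID card"
```

Because the descriptor summarizes the whole segment rather than individual dots on it, it stays usable even when point features along the line vanish.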

3. The Matching: The "Global Seating Plan" vs. "Guessing Neighbors"

Once the robot sees lines in the current frame and the previous frame, it needs to match them up.

  • Old Way (Local Matching): Imagine trying to find your friend in a crowd by only looking at the person standing immediately next to you. If your friend moves or the crowd shifts, you might grab the wrong person. This is what older systems do; they get confused easily.
  • New Way (Optimal Transport): The new system acts like a smart event planner. Instead of just looking at neighbors, it looks at the entire room. It asks, "Given all the lines I saw before and all the lines I see now, what is the single best way to pair them up so everyone is happy?"
    • It uses a mathematical concept called Optimal Transport (think of it as the most efficient way to move furniture from one room to another).
    • It can handle "ghosts" (lines that disappeared) and "newcomers" (lines that appeared) without panicking. It ensures that even if the view is blurry or partial, the connections remain consistent.
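The "global seating plan" corresponds to entropy-regularized optimal transport, typically solved with Sinkhorn iterations. The sketch below is a simplified illustration, not the paper's implementation: it assumes L2-normalized descriptors, uses uniform marginals, and omits the "dustbin" bins a real system would add for lines that appear or disappear.

```python
import numpy as np

def sinkhorn_match(desc_prev, desc_curr, eps=0.1, n_iters=100):
    """Entropy-regularized OT over line descriptors (illustrative sketch).

    desc_prev: (n, D) L2-normalized descriptors from the previous frame
    desc_curr: (m, D) L2-normalized descriptors from the current frame
    Returns the (n, m) transport plan; argmax per row gives the match.
    """
    n, m = len(desc_prev), len(desc_curr)
    sim = desc_prev @ desc_curr.T          # cosine similarity matrix
    K = np.exp(sim / eps)                  # Gibbs kernel (eps = entropy weight)
    a = np.full(n, 1.0 / n)                # uniform source marginal
    b = np.full(m, 1.0 / m)                # uniform target marginal
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):               # Sinkhorn scaling iterations
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]     # transport plan P
```

Because every line competes for every candidate simultaneously, one ambiguous neighbor cannot derail the whole assignment the way greedy nearest-neighbor matching can.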

4. The Safety Net: "Trust but Verify"

Not all lines are created equal. A short, fuzzy line is less reliable than a long, sharp one.

  • Adaptive Weighting: Imagine you are driving. If a road sign is crisp and clearly visible, you trust it completely; if it is blurry or half-obscured, you trust it less.
  • The system does the same thing. It calculates how "noisy" or unreliable a line is. If a line is shaky or short, the system turns down the volume on that clue so it doesn't mess up the robot's position. If the line is strong, it turns up the volume. This prevents bad data from dragging the robot off course.
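The "volume knob" above is inverse-variance weighting combined with a robust kernel. The sketch below is a toy illustration under assumed heuristics, not the paper's actual uncertainty model: it inflates the noise level for short lines and applies a Huber-style down-weighting to large residuals.

```python
def adaptive_line_weight(length_px, residual_px, min_len=20.0, sigma0=1.0):
    """Toy adaptive weight for one line measurement (illustrative only).

    length_px:   segment length in pixels (longer = more reliable)
    residual_px: current reprojection residual in pixels
    """
    # Short lines get an inflated noise sigma, hence a smaller weight
    sigma = sigma0 * max(1.0, min_len / max(length_px, 1e-6))
    w = 1.0 / (sigma ** 2)                 # inverse-variance base weight
    # Huber-style robust kernel: large residuals are down-weighted
    k = 1.345 * sigma
    if abs(residual_px) > k:
        w *= k / abs(residual_px)
    return w
```

In an optimizer, this weight would scale the line's residual term, so a short or poorly fitting line pulls on the pose estimate far less than a long, clean one.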

The Result: A Robot That Never Gets Lost (Even in the Dark)

The authors tested this system in:

  1. Standard labs: Where it beat all other robots in accuracy.
  2. Harsh environments: Where lights flickered wildly and walls were blank.
  3. Real-world scenarios: Like walking through a dimly lit warehouse with sudden bright lights.

The Verdict:
While other robots stumbled, tripped, or got lost in these confusing environments, OTPL-VIO kept walking straight. It did this by:

  1. Giving lines their own unique "fingerprint" so they don't need dots to be recognized.
  2. Using a "global seating plan" to match lines perfectly, even when the view is messy.
  3. Ignoring bad clues and listening only to the reliable ones.

It's like upgrading from a robot that needs a map with every single streetlight to a robot that can navigate by the shape of the buildings themselves, even if the streetlights go out.