FLIGHT: Fibonacci Lattice-based Inference for Geometric Heading in real-Time

Imagine you are watching a movie where the camera is gliding smoothly through a bustling marketplace. Even though people are walking around, cars are driving by, and the scene is chaotic, your brain instantly knows which way the camera is moving. You don't need to calculate complex math; you just feel the motion.

Computer scientists have been trying to teach computers to do this same "feeling" for decades. The paper you're asking about, FLIGHT, is a new, super-fast way for computers to figure out exactly which direction a camera is moving, even when the video is messy, noisy, or full of moving objects.

Here is the breakdown of how it works, using some everyday analogies.

The Problem: The "Noisy Crowd"

Imagine you are in a crowded room trying to figure out which way the wind is blowing.

The Good News: Most people in the room are standing still (in a video, these are the walls, the floor, the trees). If you ask them which way the wind is blowing, they will all point in the same direction.
The Bad News: Some people are running around (cars, pedestrians, birds). If you ask them, they will point in random directions, because what they feel is mixed up with their own motion.
The Challenge: Traditional computer methods try to ask every single person in the room, one by one, to find the answer. If the room is huge and chaotic, this takes forever. Or, they try to guess by picking a few random people, but if they pick the runners, they get the wrong answer.

The Solution: FLIGHT (The "Voting Booth" on a Globe)

The authors propose a method called FLIGHT (Fibonacci Lattice-based Inference for Geometric Heading in real-Time). Instead of asking people one by one, they set up a giant voting booth on a sphere (like a globe).

Here is how the magic happens:

1. The "Great Circle" Clue

When the camera moves, every point that can be tracked from one frame to the next gives a clue about the direction.

Analogy: Imagine you are on a boat. If you see a lighthouse moving to your left, you know you are moving to the right. But you don't know exactly how far right. You only know that your direction of travel lies somewhere on a specific line of possibilities.
In the paper: For every pair of matching points in the video, the math draws a "Great Circle" on the globe. This circle represents all possible directions the camera could be moving to make those two points look the way they do.

2. The Fibonacci Lattice (The Perfect Grid)

To count the votes, you need a grid on the globe.

The Old Way: Imagine drawing a grid on a globe like a standard map (latitude and longitude). The squares near the poles get squished and tiny, while the squares near the equator are huge. This is unfair and messy for counting votes.
The FLIGHT Way: They use a Fibonacci Lattice.
- Analogy: Think of the seeds inside a sunflower. They are arranged in a spiral pattern that is evenly spaced out, with no clumps and no gaps. This is the Fibonacci pattern.
- By using this pattern, they create a grid of "voting bins" on the globe where every bin is nearly the same size and evenly spaced. No matter where you look on the globe, the grid is fair.

3. The Voting Process

Now, the computer takes every "Great Circle" clue from the video and casts votes into these bins.

If a clue (a Great Circle) passes through a bin, that bin gets a vote.
If the camera is moving North, the "North" bin will get hit by thousands of clues from the stationary objects (the walls, the trees).
Other bins may pick up votes from the moving objects (the runners), but those votes point in random directions and get scattered across many bins, so they never pile up in one place the way the votes from the stationary objects do.
The Winner: The bin with the most votes is the direction the camera is moving.

4. Speeding It Up (The "Hierarchical" Trick)

Counting votes for every single bin for every single clue would still be slow. So, FLIGHT uses a two-step strategy:

Step 1 (The Wide Net): First, it uses a very sparse grid (few bins) to quickly find the general neighborhood where the winner is. It's like looking at a map of the whole country to find the right state.
Step 2 (The Zoom In): Once it knows the winner is in "California," it zooms in and uses a super-dense grid just for that area to find the exact city.
Early Stopping: It also has a "stop button." It starts with a small handful of clues and keeps adding more; as soon as the answer stops changing, it stops counting. It doesn't waste time checking the rest of the data if the answer is already obvious.

Why is this a Big Deal?

It's Fast: Because of the Fibonacci grid and the "zoom-in" strategy, it runs in real-time. It's like having a GPS that calculates your route instantly, even in heavy traffic.
It's Tough: It doesn't get confused by moving objects (outliers). Even if 80% of the video is chaotic, the stationary 20% is enough to win the vote.
It Helps Robots: The paper shows that if you give this direction to a robot (like a drone or a self-driving car) right at the start, the robot doesn't get lost as easily. It improves the whole "SLAM" (Simultaneous Localization and Mapping) system, which is how robots build a map of the world while moving through it.

The Bottom Line

FLIGHT is like a super-smart, super-fast referee in a crowded stadium. Instead of listening to every single shout, it sets up a perfect grid, listens to the crowd's general direction, and instantly picks the winner, ignoring the noise. It allows computers to "feel" motion just like humans do, but with the speed and precision of a machine.

1. Problem Definition

The paper addresses the challenge of estimating camera translation direction (heading) from monocular video, assuming the camera rotation is either known (e.g., from an IMU) or has been estimated independently.

Core Challenge: Existing methods for recovering camera heading often struggle with noise and outliers (caused by moving objects, dynamic scenes, or feature matching errors).
- Methods relying on minimal noise assumptions fail in dynamic environments.
- Robust methods using sampling strategies (like RANSAC) often suffer from high computational complexity ($O(n^3)$ or $O(n^2)$), making them unsuitable for real-time applications.
Goal: Develop a method that is both robust to high levels of outliers/noise and computationally efficient enough for real-time use (e.g., robotics, SLAM, drone navigation).

2. Methodology: The FLIGHT Algorithm

The authors propose FLIGHT, a voting-based technique that generalizes the Hough transform to the unit sphere ($S^2$). The method operates under the assumption of pure translational motion (or rotation-compensated motion).

A. Geometric Foundation

Epipolar Constraint: For a rotation-compensated correspondence $(\tilde{p}, q)$, the translation vector $t$ must satisfy $q \cdot (t \times \tilde{p}) = 0$.
Great Circles: Because $q \cdot (t \times \tilde{p}) = t \cdot (\tilde{p} \times q)$, this constraint implies that the valid translation directions lie on a great circle of the unit sphere $S^2$. The normal to this great circle is the cross product of the two points of the correspondence: $n = \tilde{p} \times q$.
Consensus: In a noise-free scenario, the intersection of two great circles yields the translation direction, unique up to sign (two distinct great circles meet at a pair of antipodal points, $\pm\, n_1 \times n_2$). In the presence of outliers, the correct direction is the one supported by the largest consensus of great circles.
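As a quick numerical check of this geometry, here is a minimal Python/NumPy sketch on a hypothetical noise-free scene (the helper names `unit` and `great_circle_normal` and the 3D points are illustrative, not from the paper). It builds $n = \tilde{p} \times q$ for two correspondences, confirms that $t$ lies on both great circles, and recovers $t$ as $\pm\, n_1 \times n_2$, keeping the forward-facing sign.

```python
import numpy as np

def unit(v):
    """Scale a 3D vector to unit length."""
    return v / np.linalg.norm(v)

def great_circle_normal(p_tilde, q):
    """Unit normal n = p_tilde x q of the great circle of headings consistent with (p_tilde, q)."""
    return unit(np.cross(p_tilde, q))

# Hypothetical scene: the camera translates by t_true with no rotation.
t_true = unit(np.array([0.2, -0.1, 1.0]))
X1 = np.array([1.0, 0.5, 6.0])   # static 3D point, first-frame coordinates
X2 = np.array([-2.0, 1.0, 9.0])  # another static 3D point

# Bearing in frame 1 is unit(X); after translating by t_true it is unit(X - t_true).
n1 = great_circle_normal(unit(X1), unit(X1 - t_true))
n2 = great_circle_normal(unit(X2), unit(X2 - t_true))

# Each great circle contains t_true, i.e. t_true . n = 0 (up to rounding).
print(abs(t_true @ n1), abs(t_true @ n2))

# Two great circles meet at +/- unit(n1 x n2); keep the solution with z > 0 (moving forward).
t_est = unit(np.cross(n1, n2))
t_est = t_est if t_est[2] > 0 else -t_est
print(np.allclose(t_est, t_true))  # True
```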

B. Voting Scheme on the Unit Sphere

Instead of sampling pairs of correspondences (which is computationally expensive), FLIGHT treats each correspondence as a "voter" casting votes for a range of directions.

Discretization (Fibonacci Lattice): The unit sphere is discretized into bins using a Fibonacci lattice. This ensures a near-uniform distribution of points with approximately equal area, avoiding the distortion issues of standard spherical coordinate grids.
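A Fibonacci lattice is simple to generate. The sketch below uses the common construction (heights spaced evenly in $z$, which gives equal-area bands, and longitudes advanced by the golden angle). The function names, the half-step offset in $z$, and the idea of sizing each bin as a spherical cap with area $4\pi/N$ are assumptions for illustration, not necessarily the paper's exact parameterization.

```python
import numpy as np

def fibonacci_lattice(num_points):
    """Return (num_points, 3) unit vectors spread near-uniformly over the sphere S^2."""
    i = np.arange(num_points)
    z = 1.0 - (2.0 * i + 1.0) / num_points   # equal-area bands (Archimedes' hat-box theorem)
    r = np.sqrt(1.0 - z * z)                 # radius of the circle of latitude at height z
    theta = np.pi * (3.0 - np.sqrt(5.0)) * i # golden angle, about 137.5 degrees, per step
    return np.stack([r * np.cos(theta), r * np.sin(theta), z], axis=1)

def equal_area_cap_radius(num_points):
    """Angular radius of a spherical cap with area 4*pi / num_points (one bin's share)."""
    return np.arccos(1.0 - 2.0 / num_points)

centers = fibonacci_lattice(1000)
# Nearest-neighbor angle for every lattice point: near-uniform spacing keeps these close.
cos_sim = centers @ centers.T
np.fill_diagonal(cos_sim, -1.0)
nn_deg = np.degrees(np.arccos(np.clip(cos_sim.max(axis=1), -1.0, 1.0)))
print(nn_deg.min(), nn_deg.max(), np.degrees(equal_area_cap_radius(1000)))
```

Unlike a latitude/longitude grid, the smallest and largest nearest-neighbor angles printed here stay within a modest factor of each other, including near the poles.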
Voting Mechanism:
- For each great circle (derived from a feature pair), the algorithm calculates the angular distance to every bin center.
- If the great circle intersects a bin, it casts a vote.
- Weighted Voting: The vote weight is proportional to the arc length of the great circle inside the bin, so a great circle that passes through the middle of a bin contributes more than one that only grazes its edge.
Winning Bin: The bin with the highest accumulated vote weight is selected as the estimated heading.
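The weighted voting step can be sketched as follows. This is an illustrative NumPy version, not the authors' code: it assumes each bin is a spherical cap of angular radius $r$ around its lattice point, in which case a great circle at angular distance $d < r$ from the center crosses the cap along an arc of length $2\arccos(\cos r / \cos d)$. The function names and the synthetic data (140 static points, 60 random outlier circles) are hypothetical.

```python
import numpy as np

def fibonacci_lattice(num_points):
    """Near-uniform unit vectors on S^2, used here as bin centers."""
    i = np.arange(num_points)
    z = 1.0 - (2.0 * i + 1.0) / num_points
    r = np.sqrt(1.0 - z * z)
    theta = np.pi * (3.0 - np.sqrt(5.0)) * i
    return np.stack([r * np.cos(theta), r * np.sin(theta), z], axis=1)

def weighted_votes(normals, centers, bin_radius):
    """Arc-length-weighted votes of great circles into cap-shaped bins.

    normals: (n, 3) unit great-circle normals; centers: (m, 3) unit bin centers.
    Assumption: each bin is a spherical cap of angular radius bin_radius.
    """
    # Angular distance from each bin center to each great circle is asin(|n . c|).
    dist = np.arcsin(np.clip(np.abs(normals @ centers.T), 0.0, 1.0))  # (n, m)
    hit = dist < bin_radius
    # A circle at distance d < r from a cap's center crosses it along an arc of
    # length 2 * acos(cos(r) / cos(d)); bins that are not hit get zero weight.
    ratio = np.cos(bin_radius) / np.cos(np.where(hit, dist, 0.0))
    arc = np.where(hit, 2.0 * np.arccos(np.clip(ratio, -1.0, 1.0)), 0.0)
    return arc.sum(axis=0)  # (m,) accumulated vote weight per bin

# Hypothetical synthetic data: 140 static points (inliers) plus 60 random outlier circles.
rng = np.random.default_rng(0)
t_true = np.array([0.2, -0.1, 1.0]) / np.linalg.norm([0.2, -0.1, 1.0])
X = rng.uniform([-5.0, -5.0, 4.0], [5.0, 5.0, 20.0], size=(140, 3))
p = X / np.linalg.norm(X, axis=1, keepdims=True)
q = (X - t_true) / np.linalg.norm(X - t_true, axis=1, keepdims=True)
normals = np.vstack([np.cross(p, q), rng.normal(size=(60, 3))])
normals /= np.linalg.norm(normals, axis=1, keepdims=True)

centers = fibonacci_lattice(2000)
votes = weighted_votes(normals, centers, np.arccos(1.0 - 2.0 / 2000))
winner = centers[np.argmax(votes)]
# Great circles cannot distinguish t from -t, so the error is measured up to sign.
err_deg = np.degrees(np.arccos(min(1.0, abs(float(winner @ t_true)))))
print(err_deg)
```

The 30% outlier circles each add only a few scattered votes, while every inlier circle passes through $\pm t$, so the winning bin lands next to the true heading (or its antipode).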

C. Optimization Strategies

To ensure real-time performance, FLIGHT employs three key optimizations:

Hierarchical Discretization:
- Stage 1: A coarse search using a sparse Fibonacci lattice (1,000 bins) to identify a candidate region.
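- Stage 2: A fine-grained search using a dense lattice (64,000 bins) only within the winning region of Stage 1. Rather than testing every correspondence against all $M$ dense bins, which costs $O(nM)$, each correspondence is tested against the coarse bins plus only the dense bins inside the winning region.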
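A minimal sketch of the sparse-to-dense search, reusing the 1,000 and 64,000 bin counts quoted above. For brevity it uses unweighted counts, and `region_scale` (how far around the coarse winner the dense search extends, in coarse-bin radii) is a hypothetical parameter; the paper's actual region definition may differ. A real implementation would also precompute both lattices once instead of rebuilding them per call.

```python
import numpy as np

def fibonacci_lattice(num_points):
    """Near-uniform unit vectors on S^2."""
    i = np.arange(num_points)
    z = 1.0 - (2.0 * i + 1.0) / num_points
    r = np.sqrt(1.0 - z * z)
    theta = np.pi * (3.0 - np.sqrt(5.0)) * i
    return np.stack([r * np.cos(theta), r * np.sin(theta), z], axis=1)

def count_votes(normals, centers, bin_radius):
    """Unweighted votes: each great circle votes for every bin center within bin_radius of it."""
    dist = np.arcsin(np.clip(np.abs(normals @ centers.T), 0.0, 1.0))
    return (dist < bin_radius).sum(axis=0)

def coarse_to_fine(normals, n_coarse=1000, n_fine=64000, region_scale=3.0):
    """Two-stage heading search; returns (heading, number of dense bins searched)."""
    coarse = fibonacci_lattice(n_coarse)
    r_coarse = np.arccos(1.0 - 2.0 / n_coarse)
    best = coarse[np.argmax(count_votes(normals, coarse, r_coarse))]

    fine = fibonacci_lattice(n_fine)
    r_fine = np.arccos(1.0 - 2.0 / n_fine)
    # One dot product per dense bin (independent of the number of correspondences)
    # selects the dense bins near the coarse winner; only those receive votes.
    region = fine[fine @ best > np.cos(region_scale * r_coarse)]
    return region[np.argmax(count_votes(normals, region, r_fine))], len(region)

# Hypothetical synthetic data: 140 static points (inliers) plus 60 random outlier circles.
rng = np.random.default_rng(0)
t_true = np.array([0.2, -0.1, 1.0]) / np.linalg.norm([0.2, -0.1, 1.0])
X = rng.uniform([-5.0, -5.0, 4.0], [5.0, 5.0, 20.0], size=(140, 3))
p = X / np.linalg.norm(X, axis=1, keepdims=True)
q = (X - t_true) / np.linalg.norm(X - t_true, axis=1, keepdims=True)
normals = np.vstack([np.cross(p, q), rng.normal(size=(60, 3))])
normals /= np.linalg.norm(normals, axis=1, keepdims=True)

heading, n_searched = coarse_to_fine(normals)
err_deg = np.degrees(np.arccos(min(1.0, abs(float(heading @ t_true)))))
print(n_searched, err_deg)
```

Only a few hundred of the 64,000 dense bins are scored against the correspondences, which is where the savings over flat dense voting come from.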
Non-Linear Refinement (NLR):
- After identifying the winning bin, the algorithm refines the estimate by finding a vector $p$ that is "most orthogonal" to the normals of all great circles intersecting that bin.
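- This amounts to minimizing $\sum_i (n_i \cdot p)^2$ subject to $\|p\| = 1$. Since $\sum_i (n_i \cdot p)^2 = p^T A p$ with $A = \sum_i n_i n_i^T$, the minimizer is the eigenvector of $A$ associated with its smallest eigenvalue, obtained via an eigenvalue decomposition.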
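The refinement reduces to a 3×3 symmetric eigenproblem, which `numpy.linalg.eigh` solves directly (it returns eigenvalues in ascending order, so the first eigenvector is the minimizer). The sketch assumes the normals passed in are those of the great circles that voted for the winning bin, and uses the winning bin's center only to fix the sign; the function name and test data are hypothetical.

```python
import numpy as np

def refine_heading(normals, initial):
    """Unit p minimizing sum_i (n_i . p)^2, with its sign matched to `initial`.

    normals: (k, 3) unit normals of the great circles that voted for the winning bin.
    initial: (3,) winning bin center, used only to choose between p and -p.
    """
    A = normals.T @ normals          # A = sum_i n_i n_i^T (3x3, symmetric)
    _, eigvecs = np.linalg.eigh(A)   # eigenvalues come back in ascending order
    p = eigvecs[:, 0]                # eigenvector of the smallest eigenvalue
    return p if p @ initial >= 0 else -p

# Hypothetical test data: normals orthogonal to t_true, perturbed by small noise.
rng = np.random.default_rng(1)
t_true = np.array([0.2, -0.1, 1.0]) / np.linalg.norm([0.2, -0.1, 1.0])
normals = np.cross(t_true, rng.normal(size=(100, 3)))
normals /= np.linalg.norm(normals, axis=1, keepdims=True)
normals += 1e-3 * rng.normal(size=normals.shape)
normals /= np.linalg.norm(normals, axis=1, keepdims=True)

bin_center = np.array([0.22, -0.1, 1.0])
bin_center /= np.linalg.norm(bin_center)   # a coarse estimate about 1 degree off
refined = refine_heading(normals, bin_center)
print(np.degrees(np.arccos(np.clip(refined @ t_true, -1.0, 1.0))))
```

The refined direction is no longer tied to a lattice point, so it can be more precise than the bin spacing allows.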
Early Stopping (ES):
- The algorithm samples a small subset of features (e.g., 64 pairs) to generate an initial estimate.
- It iteratively adds more features. If the estimate stabilizes (converges) before processing all features, the algorithm stops, significantly reducing runtime.
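The early-stopping loop can be sketched generically. The doubling schedule, the 0.1° stability tolerance, and the function names are assumptions (the text only says the algorithm starts from a small subset such as 64 pairs and adds more until the estimate stabilizes). To keep the example short, the estimator passed in is a plain least-squares stand-in rather than the full voting pipeline, so the demo uses outlier-free data.

```python
import numpy as np

def smallest_eigvec(normals):
    """Stand-in estimator: unit p minimizing sum_i (n_i . p)^2 (not robust to outliers)."""
    return np.linalg.eigh(normals.T @ normals)[1][:, 0]

def estimate_with_early_stopping(normals, estimator, start=64, tol_deg=0.1, rng=None):
    """Run `estimator` on a growing random subset until its heading stops changing.

    Returns (estimate, number of correspondences used).
    """
    rng = np.random.default_rng() if rng is None else rng
    order = rng.permutation(len(normals))  # random order, so each prefix is a random subset
    k = min(start, len(normals))
    prev = estimator(normals[order[:k]])
    while k < len(normals):
        k = min(2 * k, len(normals))       # hypothetical schedule: double the subset
        cur = estimator(normals[order[:k]])
        # Headings are defined up to sign, hence abs() before measuring the change.
        change_deg = np.degrees(np.arccos(min(1.0, abs(float(cur @ prev)))))
        if change_deg < tol_deg:
            return cur, k                  # estimate has stabilized: stop early
        prev = cur
    return prev, k

# Hypothetical outlier-free data: 5,000 slightly noisy normals orthogonal to t_true.
rng = np.random.default_rng(2)
t_true = np.array([0.2, -0.1, 1.0]) / np.linalg.norm([0.2, -0.1, 1.0])
normals = np.cross(t_true, rng.normal(size=(5000, 3)))
normals /= np.linalg.norm(normals, axis=1, keepdims=True)
normals += 1e-3 * rng.normal(size=normals.shape)
normals /= np.linalg.norm(normals, axis=1, keepdims=True)

estimate, used = estimate_with_early_stopping(normals, smallest_eigvec, rng=rng)
err_deg = np.degrees(np.arccos(min(1.0, abs(float(estimate @ t_true)))))
print(used, err_deg)
```

On clean data the estimate settles after a couple of rounds, so only a small fraction of the 5,000 correspondences is processed.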

3. Key Contributions

Novel Generalization of Hough Transform: A new approach to heading estimation that casts votes from great circles on $S^2$ using a Fibonacci lattice, avoiding the $O(n^2)$ or $O(n^3)$ complexity of traditional sampling methods.
Hierarchical & Efficient Architecture: The combination of sparse-to-dense lattice sampling, non-linear refinement, and early stopping achieves a time complexity of $O(nm)$ (where $n$ is the number of correspondences and $m$ the number of bins), making it highly scalable.
Robustness: The method demonstrates superior performance in dynamic scenes with high outlier ratios (up to 80%) and noise, outperforming state-of-the-art baselines in both accuracy and speed.
SLAM Integration: The method successfully integrates into SLAM pipelines (specifically PySLAM), improving trajectory accuracy by correcting heading during pose initialization.

4. Experimental Results

The authors evaluated FLIGHT's heading estimation on three datasets: KITTI (outdoor driving), TUM RGB-D (indoor, low translation), and Sintel (synthetic, highly dynamic).

Accuracy vs. Efficiency (Pareto Frontier):
- On KITTI, FLIGHT achieved the highest Mean Average Accuracy (mAA) for translation error (e.g., 0.6193 mAA at 2° with optical flow) while being 75% to 92% faster than the next best robust methods (FOE, 2-Point).
- On Sintel (dynamic scenes), FLIGHT improved accuracy by 3% over the best baseline (FOE) while running **95 times faster**.
Robustness to Outliers:
- Synthetic tests showed that as outlier probability increased (up to 80%), FLIGHT's runtime remained stable, whereas other methods degraded significantly in speed or accuracy.
- FLIGHT maintained high accuracy (>0.9 mAA at 5°) even with rotation noise up to 0.15°.
SLAM Performance:
- Integrating FLIGHT into PySLAM reduced the Root Mean Square Error (RMSE) of the trajectory by 8% on KITTI and 2% on EuRoC stereo sequences, with minimal computational overhead.
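The accuracy figures above are reported as mAA at an angular threshold. The paper's exact definition is not reproduced here; the sketch below assumes one common convention, in which mAA at $\tau$ degrees is the accuracy (fraction of frames with angular error below a threshold) averaged over thresholds of 1, 2, ..., $\tau$ degrees, and it treats $t$ and $-t$ as equivalent. Both choices, and the function names, are assumptions.

```python
import numpy as np

def angular_error_deg(t_est, t_true):
    """Angle in degrees between translation directions, ignoring sign (an assumption)."""
    t_est = t_est / np.linalg.norm(t_est, axis=-1, keepdims=True)
    t_true = t_true / np.linalg.norm(t_true, axis=-1, keepdims=True)
    cos_angle = np.abs(np.sum(t_est * t_true, axis=-1))
    return np.degrees(np.arccos(np.clip(cos_angle, 0.0, 1.0)))

def mean_average_accuracy(errors_deg, max_threshold_deg):
    """Mean, over thresholds of 1, 2, ..., max_threshold_deg degrees, of the fraction of errors below each."""
    thresholds = np.arange(1, max_threshold_deg + 1)
    return float(np.mean([(errors_deg < th).mean() for th in thresholds]))

errors = np.array([0.5, 1.5, 3.0, 10.0])
print(mean_average_accuracy(errors, 2))  # (0.25 + 0.50) / 2 = 0.375
```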

5. Significance

The paper presents a significant advancement in geometric computer vision by addressing the trade-offs between speed, accuracy, and robustness in camera heading estimation.

Real-Time Viability: By moving away from random sampling (RANSAC) toward a deterministic, lattice-based voting scheme, FLIGHT makes robust heading estimation feasible for real-time applications like autonomous driving and drone navigation.
Dynamic Scene Handling: The method's ability to handle high outlier ratios without computational explosion makes it ideal for complex, unstructured environments where moving objects are common.
Practical Application: The demonstrated improvement in SLAM pipelines suggests that FLIGHT can serve as a critical pre-processing or initialization module for broader 3D reconstruction and localization systems.