Hierarchical Multi-Modal Planning for Fixed-Altitude Sparse Target Search and Sampling

Imagine you are a treasure hunter diving in a murky, foggy ocean. Your mission is to find and photograph rare, scattered coral colonies. You have a limited battery, and the water is so cloudy that you can't see very far with your eyes.

This paper introduces HIMoS, a new "brain" for an underwater robot (an Autonomous Underwater Vehicle, or AUV). Think of HIMoS as a smart, two-layered navigation system that helps the robot find these hidden treasures efficiently without wasting energy.

Here is how it works, broken down into simple concepts:

1. The Problem: The "Lawnmower" vs. The "Smart Hunter"

Traditionally, robots search the ocean floor like a lawnmower: they go back and forth in straight lines, covering every inch of sand.

The Flaw: This wastes huge amounts of battery on empty sand where no coral exists.
The Old "Smart" Way: Some newer robots try to fly high to see the big picture, then dive low to check details. But in murky water, flying high is useless (you can't see anything), and diving up and down constantly drains the battery like a leaky faucet.

HIMoS's Solution: The robot stays at a fixed height (like a drone hovering steadily) but uses a mix of "superpowers" to see through the fog.

2. The Superpowers: A Three-Sensor Toolkit

Instead of just one camera, the robot carries three different tools, like a detective with a radar, a pair of binoculars, and a magnifying glass:

The Sonar (The Radar): A "Forward-Looking Sonar" that sees through the murky water to map the type of ground (is it hard rock or soft sand?). It sees far but doesn't see details.
The Front Camera (The Scout): A "Front-Looking Camera" that sees medium distances to spot potential coral shapes.
The Down Camera (The Inspector): A "Down-Looking Camera" that looks straight down at the seabed right under the robot to take the final, close-up photo of the coral.

3. The Two-Layer Brain: The General and The Scout

HIMoS splits the thinking process into two parts, working together like a General and a Scout on a battlefield.

Layer 1: The Strategic General (Global Planner)

The Job: The General looks at the big map. It doesn't worry about the next turn; it worries about the whole mission.
The Analogy: Imagine you are playing a game of "Battleship" on a huge grid. The General uses the Sonar data to guess where the "hard rock" islands are (because coral only lives on rocks, not sand). It draws a rough route connecting the most promising islands.
The Trick: It uses a "confidence meter." If an area is largely unknown, it wants to explore it. If an area looks like it has lots of coral, it wants to harvest it. It weighs these two goals against each other when choosing where to go next.

Layer 2: The Tactical Scout (Local Planner)

The Job: The Scout takes the General's rough route and figures out exactly how to drive the robot to get there without crashing.
The Analogy: The General says, "Go to that rocky island over there." The Scout says, "Okay, but I need to drive slightly left to scan a patch of sand first, then curve right to line up my down-camera perfectly with a coral patch."
The Magic: The paper introduces a clever math trick called "Differentiable Belief Dynamics."
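- Normal Math: "If I turn left, I might see a coral. If I turn right, I might see sand." Every possible random outcome splits the calculation into more branches, and a yes/no result cannot be nudged a little at a time, so the computer cannot use its fastest optimization tools.
- HIMoS Math: It replaces "might" with the expected amount of evidence each patch of seabed will collect, which grows smoothly, like water filling a bucket. Because a small change to the path causes only a small change in that expected evidence, the robot can use fast gradient-based optimization to find a good path in about half a second, steering toward the places where it expects to learn the most.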
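To make the "water filling a bucket" idea concrete, here is a minimal Python sketch, assuming a hypothetical sigmoid-shaped sensor footprint. Instead of flipping a coin for "seen" or "not seen," it adds up the expected evidence a patch of seabed collects as the robot passes, and that total changes smoothly when the path moves. The footprint shape and the names `sensing_range`, `sharpness`, and `evidence_per_step` are illustrative assumptions, not the paper's model. A real planner would use automatic differentiation; a finite difference keeps this sketch dependency-free.

```python
import math

def soft_visibility(robot_x, robot_y, cell_x, cell_y, sensing_range=5.0, sharpness=2.0):
    """Smooth stand-in for 'is this cell inside the sensor footprint?' (0 to 1)."""
    dist = math.hypot(robot_x - cell_x, robot_y - cell_y)
    # Sigmoid: close to 1 well inside the range, close to 0 well outside it.
    return 1.0 / (1.0 + math.exp(sharpness * (dist - sensing_range)))

def expected_evidence(path, cell, evidence_per_step=0.8):
    """Expected log-odds evidence a cell collects along a path of (x, y) points."""
    cx, cy = cell
    return sum(evidence_per_step * soft_visibility(x, y, cx, cy) for x, y in path)

def evidence_slope_y(path, cell, eps=1e-5):
    """How fast the evidence changes if the whole path is shifted sideways (in y)."""
    up = [(x, y + eps) for x, y in path]
    down = [(x, y - eps) for x, y in path]
    return (expected_evidence(up, cell) - expected_evidence(down, cell)) / (2 * eps)

straight_path = [(float(t), 0.0) for t in range(10)]
coral_patch = (5.0, 6.0)  # just beyond the footprint of the straight path
print(expected_evidence(straight_path, coral_patch))  # small but non-zero, not a yes/no
print(evidence_slope_y(straight_path, coral_patch))   # positive: veering toward it helps
```

The positive slope is exactly the signal a gradient-based planner follows: it tells the robot which way to bend its path to learn more, without enumerating every possible outcome.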

4. The Loop: Sense, Think, Move, Repeat

The system works in a continuous loop:

Sense: The robot drives a few steps, using its sonar and cameras to update its map.
Think: The "Scout" recalculates the best path for the next few seconds based on new info.
Move: The robot executes that path.
Re-evaluate: Once the robot reaches a major "checkpoint" (a promising rock patch), the "General" wakes up, looks at the new map, and draws a new long-distance route to the next best spot.
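The loop and its two triggers (the Scout on every cycle, the General only at checkpoints) can be sketched as a small control loop. This is a hypothetical skeleton: `global_plan` and `local_plan` are stubs standing in for the General and the Scout, the one-dimensional "seabed" is a toy, and the horizon and step counts are made up for illustration.

```python
def global_plan(position, checkpoints):
    """Stub 'General': pick the nearest remaining checkpoint (hypothetical rule)."""
    return min(checkpoints, key=lambda c: abs(c - position))

def local_plan(position, goal, horizon=5):
    """Stub 'Scout': plan `horizon` unit steps toward the goal on a 1-D line."""
    step = 1 if goal > position else -1
    plan, p = [], position
    for _ in range(horizon):
        if p != goal:
            p += step
        plan.append(p)
    return plan

def run_mission(start, checkpoints, n_exec=2, max_steps=100):
    position, remaining = start, list(checkpoints)
    goal = global_plan(position, remaining)          # the General plans once at the start
    visited, steps = [], 0
    while remaining and steps < max_steps:
        plan = local_plan(position, goal)            # Think: the Scout replans
        for p in plan[:n_exec]:                      # Move: execute only the first few steps
            position = p                             # Sense: a real system updates its map here
            steps += 1
            if position == goal:
                break
        if position == goal:                         # Re-evaluate at a checkpoint
            visited.append(goal)
            remaining.remove(goal)
            if remaining:
                goal = global_plan(position, remaining)  # the General wakes up
    return visited, steps

print(run_mission(0, [7, 3, 12]))  # ([3, 7, 12], 12)
```

The Scout replans every `n_exec` steps no matter what, while the more expensive General runs only when a checkpoint is reached, which is the split that keeps the overall system fast.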

Why is this a big deal?

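In simulations built from real coral reef maps, this system outperformed both the standard "lawnmower" pattern and a state-of-the-art planner based on Monte Carlo Tree Search (MCTS).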

The "With Prior" Test: One comparison robot was given a cheat sheet (a perfect map of where the hard, coral-friendly ground was) and planned its whole route in advance. HIMoS, with no cheat sheet, actually confirmed more coral targets (91%). This is huge because it shows the value of learning and adapting in real time instead of following a pre-written script.
Efficiency: It found more coral in less time because it stopped wasting energy on empty sand and avoided getting stuck in one area. The MCTS planner, by contrast, stopped finding new targets after about 1500 seconds because it kept circling the same neighborhood.

The Bottom Line

HIMoS is like a highly efficient, fixed-altitude treasure hunter that never gets tired of the fog. It uses a mix of radar and cameras to build a mental map, splits its brain into a "big picture" planner and a "fine motor" driver, and uses advanced math to turn uncertainty into a smooth, winning path. It's a major step toward robots that can autonomously explore our oceans to protect fragile ecosystems.

Here is a detailed technical summary of the paper "Hierarchical Multi-Modal Planning for Fixed-Altitude Sparse Target Search and Sampling".

1. Problem Statement

The paper addresses the challenge of efficiently monitoring sparse benthic phenomena (e.g., coral colonies, invasive species) using Autonomous Underwater Vehicles (AUVs) in turbid marine environments.

Core Challenges:
- Sparsity: Biological targets are discrete and sparsely distributed, making traditional exhaustive "lawnmower" coverage strategies highly energy-inefficient.
- Environmental Constraints: Turbid water limits optical range, rendering high-altitude visual scouting unreliable.
- Energy Limitations: Existing adaptive sampling methods (like SASS) rely on frequent vertical maneuvers (oscillating between high and low altitudes) to switch between scouting and sampling, which is energy-intensive and physically demanding.
- Algorithmic Scalability: State-of-the-art planners often use grid-based discrete methods (e.g., Monte Carlo Tree Search) that struggle with high-resolution maps and fail to generate kinematically feasible trajectories.
Goal: Develop a framework that operates at a fixed altitude, fusing heterogeneous sensors (acoustic and visual) to maximize the number of sampled targets within a strict time budget while minimizing energy consumption.

2. Methodology: HIMoS Framework

The authors propose HIMoS (Hierarchical Informative Multi-Modal Search), a two-layer planning architecture that decouples strategic routing from tactical execution.

A. Sensor Suite & Observation Models

The AUV operates at a fixed altitude using three distinct sensors:

Forward-Looking Sonar (FLS): Provides broad-area acoustic mapping of the substrate (hard vs. sand) at long range.
Front-Looking Camera (FLC): Enables mid-range visual scouting for target detection.
Down-Looking Camera (DLC): Executes precise, close-range sampling.
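A simple way to picture how the three sensors divide the work at a fixed altitude is to ask which seabed cells each one can observe from a given pose. The sketch below is illustrative only: the minimum and maximum ranges, the field-of-view half-angles, and the down-camera radius are hypothetical numbers, and real FLS/FLC/DLC footprints depend on the hardware, the altitude, and the turbidity.

```python
import math

# Hypothetical footprint parameters (metres, degrees); not values from the paper.
FLS_MIN, FLS_MAX, FLS_HALF_FOV = 2.0, 30.0, 60.0   # long-range acoustic wedge ahead
FLC_MIN, FLC_MAX, FLC_HALF_FOV = 1.0, 8.0, 35.0    # mid-range optical wedge ahead
DLC_RADIUS = 1.5                                    # small patch directly below

def in_wedge(dx, dy, heading_deg, min_range, max_range, half_fov_deg):
    """True if offset (dx, dy) is within [min_range, max_range] and +/- half_fov of heading."""
    dist = math.hypot(dx, dy)
    if dist < min_range or dist > max_range:
        return False
    bearing = math.degrees(math.atan2(dy, dx))
    off_axis = (bearing - heading_deg + 180.0) % 360.0 - 180.0  # wrap to [-180, 180)
    return abs(off_axis) <= half_fov_deg

def sensors_seeing(vehicle_xy, heading_deg, cell_xy):
    """List the sensors whose footprint contains the seabed cell at cell_xy."""
    dx, dy = cell_xy[0] - vehicle_xy[0], cell_xy[1] - vehicle_xy[1]
    seen = []
    if in_wedge(dx, dy, heading_deg, FLS_MIN, FLS_MAX, FLS_HALF_FOV):
        seen.append("FLS")
    if in_wedge(dx, dy, heading_deg, FLC_MIN, FLC_MAX, FLC_HALF_FOV):
        seen.append("FLC")
    if math.hypot(dx, dy) <= DLC_RADIUS:
        seen.append("DLC")
    return seen

print(sensors_seeing((0, 0), 0.0, (20, 5)))   # ['FLS']: far ahead, sonar only
print(sensors_seeing((0, 0), 0.0, (5, 1)))    # ['FLS', 'FLC']: mid-range ahead
print(sensors_seeing((0, 0), 0.0, (0.5, 0)))  # ['DLC']: directly below the vehicle
```

In this picture, the sonar builds the substrate map far ahead, the front camera screens nearer cells for candidate targets, and only cells that pass directly under the vehicle can be sampled by the down camera.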

Belief Dynamics: The system maintains probabilistic beliefs for substrate ( $B_S$ ) and coral presence ( $B_C$ ). It enforces a conditional probability constraint: corals exist only on hard substrate, so $P(c=1 \mid s=0) = 0$, where $s=0$ denotes sand. Beliefs are updated via Bayesian filtering in log-odds form.
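The log-odds bookkeeping and the "no coral on sand" constraint can be sketched for a single map cell. This is a minimal illustration, assuming binary detector outputs with hypothetical hit and false-alarm rates, a sonar reading that depends only on the substrate, and a camera reading that depends only on coral presence; the paper's actual observation models may differ. Under these assumptions the update is exact Bayesian filtering over the three feasible states (sand, hard without coral, hard with coral), and the marginal coral probability is $P(c=1) = P(c=1 \mid s=1)\,P(s=1)$.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def sigmoid(l):
    return 1.0 / (1.0 + math.exp(-l))

class CellBelief:
    """Belief for one map cell: substrate (B_S) and coral given hard substrate, in log-odds."""

    def __init__(self, prior_hard=0.5, prior_coral_given_hard=0.2):
        self.l_hard = logit(prior_hard)                    # log-odds of s = 1 (hard)
        self.l_coral_hard = logit(prior_coral_given_hard)  # log-odds of c = 1 given s = 1

    def sonar_update(self, says_hard, p_hit=0.85, p_false=0.2):
        """FLS reading; p_hit = P(says hard | hard), p_false = P(says hard | sand)."""
        like_hard = p_hit if says_hard else 1.0 - p_hit
        like_sand = p_false if says_hard else 1.0 - p_false
        self.l_hard += math.log(like_hard / like_sand)

    def camera_update(self, sees_coral, p_hit=0.9, p_false=0.05):
        """Camera reading; p_hit = P(sees | coral), p_false = P(sees | no coral)."""
        q = sigmoid(self.l_coral_hard)                     # current P(c=1 | s=1)
        like_c1 = p_hit if sees_coral else 1.0 - p_hit
        like_c0 = p_false if sees_coral else 1.0 - p_false
        # Sand never has coral, so P(z | s=0) = like_c0; hard ground mixes both cases.
        self.l_hard += math.log((q * like_c1 + (1.0 - q) * like_c0) / like_c0)
        # Given hard ground, this is the standard binary log-odds increment.
        self.l_coral_hard += math.log(like_c1 / like_c0)

    def p_coral(self):
        """P(c=1) = P(c=1 | s=1) P(s=1), because P(c=1 | s=0) = 0."""
        return sigmoid(self.l_coral_hard) * sigmoid(self.l_hard)

sandy = CellBelief()
sandy.sonar_update(says_hard=False)
sandy.sonar_update(says_hard=False)
print(sandy.p_coral())    # two "sand" returns push coral probability close to zero

spotted = CellBelief()
spotted.camera_update(sees_coral=True)
print(spotted.p_coral())  # a detection also raises the belief that the ground is hard
```

Note how a camera detection raises the substrate belief as well: because coral implies hard ground, evidence for coral is also evidence against sand.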

B. Hierarchical Planning Architecture

1. Strategic Level: Global Planner (Event-Triggered)

Objective: Determine the next high-probability habitat region to visit.
Approach: Formulated as a Budgeted Orienteering Problem (OP).
Adaptive Graph: Constructs a multi-resolution spatial graph. Regions with high uncertainty are kept as coarse "Macro" nodes; regions with sufficient acoustic data are subdivided into fine "Micro" nodes.
Reward Function: Uses an Upper Confidence Bound (UCB) utility derived from a Heteroscedastic Gaussian Process (GP). This balances exploitation (high predicted coral density) and exploration (high uncertainty), accounting for distance-dependent sensor noise.
Output: Selects a target node ( $v_{next}$ ) and allocates a local time budget ( $T_{local}$ ) for the tactical planner.
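The strategic layer's reward and routing steps can be sketched in a few dozen lines of NumPy. This is a hedged illustration, not the paper's solver: it fits a Gaussian Process whose per-observation noise grows with sensing distance (one simple form of heteroscedastic noise, assumed here), scores candidate nodes with the UCB value $\mu + \beta\sigma$, and then builds a route with a greedy reward-per-travel-time heuristic under a time budget. The kernel, noise law, $\beta$, speed, and the greedy heuristic are all hypothetical choices, and the allocation of $T_{local}$ is omitted.

```python
import numpy as np

def rbf(a, b, length=10.0):
    """Unit-variance squared-exponential kernel between point sets a (n, 2) and b (m, 2)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length**2)

def gp_ucb(x_obs, y_obs, obs_dist, x_query, beta=2.0, noise_per_m=0.01):
    """Heteroscedastic GP posterior scored as mean + beta * std at each query point."""
    noise_var = (noise_per_m * obs_dist) ** 2 + 1e-6  # assumed: farther readings are noisier
    K = rbf(x_obs, x_obs) + np.diag(noise_var)
    K_s = rbf(x_query, x_obs)
    mean = K_s @ np.linalg.solve(K, y_obs)
    v = np.linalg.solve(K, K_s.T)
    var = np.clip(1.0 - np.sum(K_s * v.T, axis=1), 1e-12, None)  # prior variance is 1
    return mean + beta * np.sqrt(var)

def greedy_orienteering(start, nodes, rewards, budget, speed=1.0):
    """Greedy heuristic: repeatedly add the best reward-per-travel-time node that still fits."""
    route, pos, time_left = [], np.asarray(start, dtype=float), budget
    unvisited = set(range(len(nodes)))
    while unvisited:
        best, best_ratio = None, 0.0
        for i in sorted(unvisited):
            t = np.linalg.norm(nodes[i] - pos) / speed
            ratio = rewards[i] / max(t, 1e-6)
            if t <= time_left and ratio > best_ratio:
                best, best_ratio = i, ratio
        if best is None:
            break
        time_left -= np.linalg.norm(nodes[best] - pos) / speed
        pos = nodes[best]
        route.append(best)
        unvisited.remove(best)
    return route

x_obs = np.array([[0.0, 0.0], [20.0, 0.0], [40.0, 0.0]])
y_obs = np.array([0.1, 0.8, 0.0])      # hypothetical coral-density estimates
obs_dist = np.array([5.0, 25.0, 5.0])  # how far away each estimate was sensed (m)
nodes = np.array([[20.0, 2.0], [40.0, 2.0], [80.0, 40.0]])

rewards = gp_ucb(x_obs, y_obs, obs_dist, nodes)
route = greedy_orienteering([0.0, 0.0], nodes, rewards, budget=60.0)
v_next = route[0] if route else None
print(rewards, route, v_next)
```

Node 2 scores highly purely on uncertainty (exploration) but lies beyond the time budget, so the route starts at node 0, which sits next to the promising estimate (exploitation).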

2. Tactical Level: Local Planner (Time-Triggered)

Objective: Generate a kinematically feasible trajectory to reach $v_{next}$ while maximizing information gain and sampling targets.
Core Innovation: Differentiable Belief Dynamics.
- Standard stochastic sensor models are non-differentiable and intractable for gradient-based optimization.
- HIMoS introduces a deterministic surrogate model that tracks the expected accumulation of observation evidence (log-odds) as a continuous process.
- This allows the formulation of the planning problem as a smooth Non-Linear Programming (NLP) problem.
Optimization: The planner minimizes a unified cost function ( $J$ ) comprising:
- $J_{scout}$ : Maximizing entropy reduction (exploration) for both substrate and target maps.
- $J_{samp}$ : Maximizing the probability of successful DLC sampling on high-confidence candidates.
- $J_{reg}$ : Enforcing kinematic constraints and trajectory smoothness.
Execution: The planner solves for a finite horizon, executes the first few steps ( $N_{exec}$ ), updates beliefs with new measurements, and replans (receding horizon).

3. Key Contributions

Fixed-Altitude Multi-Modal Framework: A novel system that eliminates energy-intensive vertical maneuvers by seamlessly fusing long-range acoustics (FLS), mid-range vision (FLC), and close-range sampling (DLC).
Hierarchical Architecture: A robust coupling of a strategic Orienteering Problem solver (handling long-horizon routing) and a tactical gradient-based optimizer (handling agile local maneuvers).
Differentiable Belief Dynamics: A mathematical formulation that converts stochastic sensor updates into a continuous, differentiable process. This enables the use of efficient gradient-based NLP solvers for informative path planning, generating smooth, non-myopic trajectories.
Adaptive Multi-Resolution Graph: A dynamic graph structure that refines resolution based on sensor data accumulation, balancing computational tractability with planning fidelity.
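The Macro/Micro refinement rule can be pictured as a quadtree that splits a region only once it has gathered enough acoustic evidence. The sketch below is a hypothetical illustration: counting sonar returns as the evidence measure, the `split_threshold`, the `min_size` floor, and the four-way split are assumptions rather than the paper's exact construction, and the graph edges an OP solver would need are omitted.

```python
from dataclasses import dataclass, field

@dataclass
class RegionNode:
    x: float                 # lower-left corner of the square region
    y: float
    size: float              # side length
    evidence: int = 0        # sonar returns inside the region (coverage proxy)
    children: list = field(default_factory=list)

    def leaves(self):
        """Current graph nodes: the finest regions in the tree."""
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]

def count_inside(points, x, y, size):
    return sum(1 for px, py in points if x <= px < x + size and y <= py < y + size)

def refine(node, sonar_hits, split_threshold=3, min_size=5.0):
    """Split well-observed regions into four finer nodes; keep poorly observed ones coarse."""
    node.evidence = count_inside(sonar_hits, node.x, node.y, node.size)
    if node.evidence < split_threshold or node.size / 2 < min_size:
        return
    half = node.size / 2
    for dx in (0.0, half):
        for dy in (0.0, half):
            child = RegionNode(node.x + dx, node.y + dy, half)
            node.children.append(child)
            refine(child, sonar_hits, split_threshold, min_size)

root = RegionNode(0.0, 0.0, 40.0)
hits = [(1, 1), (2, 3), (3, 2), (4, 4), (6, 6), (8, 2), (30, 30)]  # made-up sonar returns
refine(root, hits)
print(sorted(leaf.size for leaf in root.leaves()))
# [5.0, 5.0, 5.0, 5.0, 10.0, 10.0, 10.0, 20.0, 20.0, 20.0]
```

The densely surveyed corner ends up as small "Micro" nodes while the rest of the area stays as large "Macro" nodes, so the number of graph nodes grows only where the data justify it.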

4. Experimental Results

The system was evaluated in high-fidelity simulations built on real-world coral reef datasets derived from UAV photogrammetry.

Baselines: Compared against:
- Boustrophedon: Standard exhaustive coverage.
- MCTS: State-of-the-art non-myopic planning (adapted for fixed altitude).
- With Prior: A reference baseline, intended as an upper bound, that assumes perfect knowledge of the substrate map and plans offline.
Performance:
- Efficiency: HIMoS consistently outperformed MCTS and Boustrophedon across "easy," "medium," and "hard" environments.
- Superiority to Prior: Remarkably, HIMoS achieved a higher target confirmation rate (91%) than the "With Prior" baseline, which had the ground-truth substrate map but planned offline. This highlights the limitation of offline planning in dynamic, high-dimensional spaces and the advantage of HIMoS's online, adaptive refinement.
- Robustness: While MCTS performance plateaued after 1500s due to local entrapment, HIMoS maintained a steady discovery rate by dynamically bridging unvisited habitats.
Real-Time Feasibility:
- The Local Planner (NLP solver) averaged 0.5s computation time on an embedded Jetson AGX Orin.
- The Global Planner (OP solver) finished 95% of calls within 1.5s, triggered only when reaching a target region.

5. Significance

Operational Viability: By removing vertical maneuvers, HIMoS significantly extends AUV mission endurance, a critical factor for deep or long-duration ocean monitoring.
Algorithmic Advancement: Differentiable Belief Dynamics offers a potentially generalizable approach to gradient-based planning under sensing uncertainty, sidestepping the scalability ("curse of dimensionality") and non-differentiability problems common in discrete, grid-based planners.
Sim-to-Real Readiness: The planners ran in real time on embedded hardware, and the simulated sensor models were calibrated to real-world turbid conditions, suggesting strong potential for deployment in real oceanographic missions.

In summary, HIMoS represents a significant leap forward in autonomous underwater exploration, offering a computationally efficient, energy-saving, and highly effective strategy for finding and sampling sparse biological targets in challenging marine environments.