Perceptive Hierarchical-Task MPC for Sequential Mobile Manipulation in Unstructured Semi-Static Environments

Imagine a robot that works in a busy warehouse. Its job is to pick up boxes from one shelf, drive them across the room, and drop them off at a packing station. Then, it has to go do it all over again for the next box, and the next, for hours on end.

This is what the paper calls "Sequential Mobile Manipulation." It's not just about moving once; it's about doing a long list of tasks one after another.

The Problem: The "Ghost" in the Machine

Most robots today are like drivers who only look at a paper map they printed out before they left home. They assume the world is frozen in time.

The Issue: In the real world, things move. A forklift might drop a pallet in the middle of the aisle. A coworker might move a chair. If the robot is still looking at its old paper map, it will drive straight into the new obstacle, thinking it's safe because the map says "empty space" there.
The Consequence: The robot gets stuck, crashes, or takes a very long, inefficient detour because it's trying to avoid "ghost obstacles" (objects that used to be there but aren't anymore) or fails to see "new obstacles."

The Solution: A Robot with "Situational Awareness"

The authors of this paper built a new system called Perceptive Hierarchical-Task MPC. Let's break that scary name down into a simple story:

1. The "Smart Map" (Perceptive Mapping)

Instead of a static paper map, this robot builds a living, breathing 3D map in its head.

The Analogy: Imagine you are walking through a room. You don't just memorize where the furniture is; you notice if a chair has been moved. If you see a chair in a new spot, you update your mental map. If you see a chair is gone, you erase it from your memory.
How the robot does it: It uses cameras to constantly scan the room. It uses a special "Bayesian" logic (a fancy way of saying "probability guessing") to decide: "Is that box still there, or did someone move it?" If the robot is unsure, it keeps the box in the map but marks it as "maybe moved." If it sees the box is definitely gone, it deletes it. This prevents the robot from being haunted by ghosts.

2. The "Strict Boss" (Hierarchical-Task MPC)

The robot has two main jobs: Drive (move the base) and Manipulate (move the arm). Sometimes these jobs conflict.

The Analogy: Imagine a waiter carrying a tray of drinks (the arm) while walking through a crowded party (the base).
- If the waiter focuses only on walking fast, they might spill the drinks.
- If they focus only on holding the tray, they might walk into a wall.
How the robot does it: The system acts like a strict boss who prioritizes tasks. It says, "First, don't crash into the wall. Second, don't spill the drinks. Third, get to the table." It solves these problems in order of importance, ensuring the robot is safe first, then efficient.

3. The "Defensive Driver" (CBF Safety)

This is the most clever part. The robot doesn't just try to avoid hitting things; it changes how it moves based on how close it is to danger.

The Analogy: Think of driving a car.
- Old Method (EDF): You see a car 10 meters ahead. You keep your speed constant until you are 1 meter away, then you slam on the brakes. This is jerky and risky.
- New Method (CBF): You see a car 10 meters ahead. You gently ease off the gas. As you get closer, you slow down more. You treat the space around the obstacle like a "safety bubble." The closer you get, the more cautious you become.
Why it matters: In the real world, cameras have delays (lag) and blind spots. By slowing down before it's too late, the robot gives itself a buffer to react if a new obstacle suddenly appears.

The Result: A Robot That Actually Works

The researchers tested this robot in a simulated warehouse and with a real robot arm on a moving base.

The Test: They moved boxes around while the robot was working.
The Outcome:
- Old Robots: Got confused, tried to drive through "ghost" boxes, or crashed into new ones because they were too fast.
- This Robot: Noticed the boxes had moved, updated its map on the fly, slowed down near the new obstacles, and finished its long list of tasks without crashing.

In a Nutshell

This paper is about teaching robots to stop relying on old, static maps and start watching the world change in real-time. It gives them the ability to:

- See changes (like moved boxes).
- Forget things that are gone.
- Prioritize safety over speed when things get messy.
- Adapt their driving style to be extra cautious when they are unsure.

It's the difference between a robot that blindly follows a script and a robot that is actually aware of its environment.

Here is a detailed technical summary of the paper "Perceptive Hierarchical-Task MPC for Sequential Mobile Manipulation in Unstructured Semi-Static Environments."

1. Problem Statement

The paper addresses the challenge of sequential mobile manipulation in unstructured, semi-static environments.

The Context: Mobile robots (e.g., in warehousing or service) must execute long-horizon sequences of tasks (navigation + manipulation).
The Limitation of Existing Methods: Current state-of-the-art planners (including Hierarchical-Task Model Predictive Control, or HTMPC) typically assume a static environment and rely on precomputed, offline maps.
The Core Challenge: In real-world operations, environments undergo semi-static changes (objects are moved, removed, or introduced) between subtasks. These changes often occur outside the robot's immediate Field of View (FoV). Relying on outdated maps leads to:
- Suboptimal trajectories.
- Safety hazards (collisions with "phantom" obstacles or failure to detect new ones).
- Infeasible motion plans.
Goal: Develop a control framework that closes the loop between online perception and whole-body control to enable safe, reactive, and efficient sequential task execution without relying on prior maps or external infrastructure.

2. Methodology

The authors propose a Perceptive Hierarchical-Task Model Predictive Control (HTMPC) framework. The system integrates three main components:

A. Perception and Object-Aware Mapping

Instead of treating the environment as a static voxel grid, the system maintains a 3D object-level volumetric map using a Bayesian inference approach.

State Estimation: Combines wheel odometry with RGB-D frames (using modified ORB-SLAM3) to handle featureless regions and compensate for sensor latency.
Object Modeling: Each object $O_i$ is modeled as a submap containing:
- 3D position and heading.
- A Euclidean Distance Field (EDF) voxel grid.
- A state distribution $p(l_i, v_i)$ representing geometric change ($l_i$) and consistency ($v_i$, the likelihood the object has not moved).
Change Detection: The system explicitly matches new observations against the object library.
- If an object's consistency score $E[v_i]$ drops below a threshold (indicating it has moved or been removed), the object is updated or removed from the map.
- This prevents "phantom obstacles" (ghosts of moved objects) and ensures the map reflects reality even when changes occur outside the immediate FoV.
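The consistency check above can be sketched in a few lines of Python. This is only an illustration: the summary does not state which distribution the paper uses for $v_i$, so the sketch assumes a Beta-Bernoulli model, and the class name `ObjectConsistency`, the helper `prune_map`, and the threshold of 0.3 are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ObjectConsistency:
    """Assumed Beta(alpha, beta) belief over v_i (object is still where the map says)."""
    alpha: float = 1.0  # pseudo-count of observations that agree with the map
    beta: float = 1.0   # pseudo-count of observations that contradict the map

    def update(self, matched: bool) -> None:
        # Conjugate Beta-Bernoulli update: each observation adds one pseudo-count.
        if matched:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    @property
    def expected(self) -> float:
        # E[v_i] of a Beta distribution.
        return self.alpha / (self.alpha + self.beta)


def prune_map(objects: Dict[str, ObjectConsistency],
              threshold: float = 0.3) -> Dict[str, ObjectConsistency]:
    """Keep only objects whose expected consistency is at or above the threshold."""
    return {name: belief for name, belief in objects.items()
            if belief.expected >= threshold}


objects = {"box_a": ObjectConsistency(), "box_b": ObjectConsistency()}
for _ in range(5):
    objects["box_a"].update(matched=True)   # box_a is repeatedly seen in place
    objects["box_b"].update(matched=False)  # box_b is missing from its old spot
kept = prune_map(objects)
print(sorted(kept))
```

After five contradicting observations, the expected consistency of `box_b` falls to 1/7 and it is dropped, while `box_a` rises to 6/7 and stays in the map.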

B. Perceptive HTMPC Controller

The controller generates coordinated whole-body trajectories using a lexicographic optimization framework.
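Lexicographic optimization solves tasks in priority order, and each lower-priority task may only use the freedom that the higher-priority tasks leave over. The Python sketch below illustrates that principle on plain linear least squares with null-space projection. It is a static, unconstrained stand-in chosen for brevity, not the paper's MPC formulation; the function name `lexicographic_lstsq` and the two toy tasks are assumptions made for illustration.

```python
import numpy as np


def lexicographic_lstsq(tasks):
    """Solve min ||A_k u - b_k|| for each (A_k, b_k) in strict priority order.

    Each later task may only move u along directions that leave every
    earlier task residual unchanged (the null space of the earlier tasks).
    """
    n = tasks[0][0].shape[1]
    u = np.zeros(n)
    N = np.eye(n)  # columns span the directions that are still free
    for A, b in tasks:
        AN = A @ N
        # Best correction for this task inside the free subspace.
        z, *_ = np.linalg.lstsq(AN, b - A @ u, rcond=None)
        u = u + N @ z
        # Shrink the free subspace to the null space of A @ N.
        _, s, vt = np.linalg.svd(AN)
        rank = int(np.sum(s > 1e-10))
        N = N @ vt[rank:].T
        if N.shape[1] == 0:
            break  # no redundancy left for lower-priority tasks
    return u


tasks = [
    (np.array([[1.0, 1.0]]), np.array([1.0])),  # priority 1: u0 + u1 = 1
    (np.eye(2), np.array([3.0, 0.0])),          # priority 2: u close to (3, 0)
]
u = lexicographic_lstsq(tasks)
print(u)
```

The result is the point on the line u0 + u1 = 1 that lies closest to (3, 0): the second task is served as well as possible without giving up anything on the first.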

Task Hierarchy: It solves a sequence of Single-Task MPC problems (STMPC) from the first task to the last, enforcing strict task ordering while exploiting kinematic redundancy.
Safety Constraints (CBF): Unlike traditional methods that use static EDF constraints, this framework derives Control Barrier Function (CBF) constraints online from the updated 3D map.
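- Formulation: The safety constraint is defined as $\dot{h}_j(x, u) > -\gamma h_j(x)$, where $h_j$ is the barrier function of the $j$-th safety constraint and the gain $\gamma$ sets how quickly the robot may approach the boundary.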
- Advantage: This limits the rate at which the robot approaches a safety boundary. It forces the robot to slow down as it nears obstacles, providing a conservative safety buffer against perception delays and noise.
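To see why this constraint produces a gradual slowdown, consider a minimal 1D sketch in Python. It assumes a point robot with dynamics $\dot{x} = u$ and a barrier $h = d - x$ equal to the remaining gap to the safety boundary, so that $\dot{h} = -u$ and the constraint reduces to a speed cap of $\gamma h$. The function names, the gain of 0.5, and the 5 m starting gap are illustrative assumptions, not values from the paper.

```python
def cbf_speed_limit(h: float, u_des: float, gamma: float) -> float:
    """Largest forward speed allowed by h_dot >= -gamma * h when h_dot = -u."""
    return min(u_des, gamma * h)


def simulate(distance: float, u_des: float = 1.0, gamma: float = 0.5,
             dt: float = 0.1, steps: int = 200):
    """Drive a 1D point robot toward a safety boundary `distance` metres ahead."""
    x, speeds, gaps = 0.0, [], []
    for _ in range(steps):
        h = distance - x                      # barrier value: gap to the boundary
        u = cbf_speed_limit(h, u_des, gamma)  # nominal speed, capped by the CBF
        x += u * dt                           # forward-Euler step of x_dot = u
        speeds.append(u)
        gaps.append(distance - x)
    return speeds, gaps


speeds, gaps = simulate(distance=5.0)
print(speeds[0], speeds[-1], min(gaps))
```

Far from the boundary the cap is inactive and the robot moves at its nominal speed; once $\gamma h$ drops below that speed, the gap shrinks geometrically and never reaches zero in this discretization (which holds because $\gamma \, dt < 1$).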

C. Handling Semi-Static Changes

Soft Constraints: To handle sudden map changes that might render the optimization infeasible, the system uses slack variables for safety constraints (penalized in the cost function), allowing temporary violations only when necessary to maintain feasibility.
Reactive Behavior: The CBF formulation ensures that if the map updates to show a new obstacle, the robot immediately adjusts its speed and trajectory to maintain safety.
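The role of the slack variable can be illustrated with a 1D Python sketch. It assumes a point robot with $\dot{h} = -u$ and a speed limit $|u| \le u_{max}$, and it assumes the slack penalty is large enough that slack is used only when the hard constraint $u \le \gamma h$ cannot be met at all. The function name `soft_cbf_speed` and all numeric values are hypothetical.

```python
def soft_cbf_speed(h, u_des, gamma=0.5, u_max=1.0):
    """Return (u, slack) for a 1D robot with h_dot = -u and |u| <= u_max.

    Hard constraint: u <= gamma * h. Softened: u <= gamma * h + slack.
    """
    bound = gamma * h
    # Smallest slack that leaves at least one admissible speed in [-u_max, u_max].
    slack = max(0.0, -u_max - bound)
    upper = min(u_max, bound + slack)
    # Track the desired speed as closely as the (softened) constraint allows.
    u = max(-u_max, min(u_des, upper))
    return u, slack


print(soft_cbf_speed(4.0, 1.0))   # far from the boundary
print(soft_cbf_speed(0.4, 1.0))   # close to the boundary
print(soft_cbf_speed(-3.0, 1.0))  # map update places the robot inside the margin
```

In the third call a map update has put the robot 3 m inside the safety margin, so no speed within the actuator limit satisfies the hard constraint. The sketch then retreats at full speed and reports a slack of 0.5, which is the temporary violation needed to keep the problem feasible.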

3. Key Contributions

First Closed-Loop Perception-Action for Sequential Tasks: This is the first work to systematically close the loop between online object-level change detection and whole-body control for sequential mobile manipulation in semi-static environments.
Object-Aware Bayesian Mapping: Introduces a method to explicitly model object consistency and geometric changes, effectively removing phantom obstacles and updating the map without prior knowledge.
CBF-Based Safety in HTMPC: Adapts Control Barrier Functions for high-dimensional mobile manipulators within an HTMPC framework, deriving constraints online from volumetric maps rather than static offline maps.
Real-World Validation: Demonstrates the system on a real robot (UR10 arm on a Ridgeback base) in unstructured environments, showing that it works without external localization infrastructure.

4. Experimental Results

The framework was validated through simulations and real-robot experiments involving sequential pick-and-place and navigation tasks.

Object-Aware vs. Voxel-Level Mapping:
- Baseline (Static/Phantom): When objects moved, the baseline failed to clear old voxels, creating "phantom obstacles" that forced the robot to take longer, inefficient paths.
- Proposed Method: Successfully detected moved objects, updated the map, and removed phantom obstacles, resulting in shorter, more efficient paths.
CBF vs. EDF Safety Constraints:
- Performance: Under partial observations and sensor noise, the CBF approach significantly outperformed standard EDF constraints.
- Behavior: CBF forced the robot to slow down significantly near obstacles while maintaining nominal speed when far away. This conservative behavior reduced collision rates in incomplete maps.
- Metrics: In 45 simulated trials, CBF achieved a higher collision-free rate than EDF with the same safety margin.
Reactive Navigation:
- In scenarios with sudden obstacle appearance (dynamic boxes), the system successfully avoided collisions with a minimum Distance-to-Collision (DTC) close to the theoretical bound (0.6 m for the end-effector, 1.48 m for the base).
- The system achieved an 87% collision-free rate in simple scenarios and 60% in highly cluttered, fast-moving scenarios.
Long-Term Sequential Tasks:
- The robot successfully completed circular navigation and manipulation tasks over multiple loops while boxes were rearranged between visits, demonstrating robust adaptivity.

5. Significance and Impact

Enabling Long-Term Autonomy: This work bridges a critical gap between short-term reactive control and long-term sequential planning. It allows robots to operate reliably in changing human environments (warehouses, homes) where objects are moved, removed, or introduced unpredictably.
Safety and Efficiency: By combining object-level change detection with CBF-based safety, the system achieves a balance where it is safe enough to handle perception delays and efficient enough to complete tasks quickly.
Scalability: The framework relies solely on onboard sensing and computation, making it deployable in previously unknown environments without the need for pre-mapped infrastructure.
Future Potential: The modular nature of the framework allows for integration with language-based task planners or stack-of-tasks methods, paving the way for more complex, human-centric robotic applications.