Multi-Robot Trajectory Planning via Constrained Bayesian Optimization and Local Cost Map Learning with STL-Based Conflict Resolution

Imagine you are the director of a busy airport, but instead of planes, you are managing a fleet of autonomous boats and robots. Your job is to get them all from Point A to Point B without crashing into each other or hitting obstacles (like rocks or buildings), all while following a very strict set of rules written in a special "logic language."

This paper presents a new, smarter way to solve this traffic jam problem. Here is how it works, broken down into simple concepts:

1. The Problem: The "Too Many Rules" Nightmare

Usually, programming robots to move is like giving a child a list of 1,000 specific instructions: "Don't go left if there's a wall," "Don't go right if there's a tree," "Stop if you see a red light." This makes the robots slow, brittle, and prone to breaking when the real world gets messy.

The authors wanted to use Signal Temporal Logic (STL). Think of STL not as a list of rules, but as a story. Instead of saying "Stop at X," you tell the robot: "You must eventually reach the dock, but you must never, ever hit the fountain, and you must stay away from the other boat while you're doing it." It's a high-level story of what needs to happen, rather than a low-level list of steps.
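
As a rough illustration (not the paper's exact specification), that story could be written as a single STL formula. Here $x(t)$ is the robot's position and $x_{\text{other}}(t)$ the other boat's position; the horizon $T$, the regions $\mathcal{X}_{\text{dock}}$ and $\mathcal{X}_{\text{fountain}}$, and the separation distance $d_{\min}$ are all assumed names introduced only for this sketch:

$$
\varphi = \mathbf{F}_{[0,T]}\big(x(t) \in \mathcal{X}_{\text{dock}}\big) \;\wedge\; \mathbf{G}_{[0,T]}\big(x(t) \notin \mathcal{X}_{\text{fountain}}\big) \;\wedge\; \mathbf{G}_{[0,T]}\big(\lVert x(t) - x_{\text{other}}(t) \rVert \ge d_{\min}\big)
$$

$\mathbf{F}$ reads "eventually" and $\mathbf{G}$ reads "always", so the three clauses map one-to-one onto "reach the dock," "never hit the fountain," and "stay away from the other boat."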

2. The Solution: A Two-Stage "Brain and Bouncer" System

The authors built a two-part system to handle this.

Part A: The Smart Learner (cBOT) – The "Local GPS"

First, each robot needs to figure out how to move on its own without hitting walls.

The Old Way (RRT): Imagine a blindfolded person trying to find a door by randomly swinging their arms left and right until they bump into something. It works eventually, but it takes a long time and the path is jagged and messy.
The New Way (cBOT): This robot uses Bayesian Optimization. Imagine a smart explorer who carries a "magic map" (a Gaussian Process). Every time the robot tries a move, it updates its map. It learns where the "cost" (like bumping into things or taking a long way) is high and where it's low.
The Result: Instead of swinging randomly, the robot learns the terrain quickly. It finds a smooth, short, and safe path using very few tries. It's like switching from a blindfolded swing to a GPS that learns the road as you drive it.

Part B: The Conflict Resolver (STL-KCBS) – The "Traffic Controller"

Now, imagine 50 robots all using their "Smart Learner" at the same time. They might still crash into each other because they are all focused on their own paths.

The Old Way: A central computer tries to calculate the perfect path for all 50 robots at once. This is like trying to solve a 50-piece puzzle while the pieces are moving. Every added robot makes the joint problem harder, so the computation quickly becomes too slow to be useful.
The New Way (STL-KCBS): This acts like a smart traffic controller.
1. It lets each robot plan its own path first (using the Smart Learner).
2. It then checks the "story" (the STL rules) to see if any two robots are going to be in the same place at the same time.
3. If they are about to crash, it doesn't throw everything away. It just tells the specific robots involved, "Hey, you two need to swap your timing or take a slightly different route."
4. It keeps doing this until everyone is safe.

3. The "Magic" of Robustness

The paper introduces a concept called Robustness.
Imagine you are walking through a crowded room.

Standard Planning: "I will walk exactly 1 meter away from that person." If they take one step closer, you crash.
Robust Planning (STL): "I will walk at least 2 meters away from that person." This gives you a safety buffer. Even if the person moves unexpectedly or your GPS is slightly off, you are still safe. The system calculates this "safety margin" mathematically to ensure the robots don't just technically avoid a crash, but avoid it comfortably.
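
To make the margin concrete, here is a small worked example under assumed numbers (they are not from the paper). Take the margin at each instant to be the distance to the other person minus the required buffer $d_{\min} = 2$ m, and suppose the distances measured at three instants are 3.0 m, 2.5 m, and 2.2 m:

$$
\mu(t) = \lVert p_1(t) - p_2(t) \rVert - d_{\min}, \qquad \min_t \mu(t) = \min(3.0 - 2,\; 2.5 - 2,\; 2.2 - 2) = 0.2 \text{ m} > 0
$$

The rule holds with 0.2 m to spare. Had the closest approach been 1.6 m, the margin would be $1.6 - 2 = -0.4$ m: the negative sign flags a violation, and the magnitude says how severe it is.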

4. Real-World Proof

The team didn't just run this on a computer; they tested it in the real world:

Indoors: They used small wheeled robots in a cluttered room with obstacles.
Outdoors: They took autonomous boats to a lake with fountains (obstacles) and made them navigate complex patterns, like crossing paths or swapping places.

The Results:

Speed: Their system solved most problems, even with 50 robots, in under a second, while other methods either failed or needed more than 100 seconds.
Smoothness: The paths were much smoother and shorter (less energy wasted).
Reliability: It worked 100% of the time in their tests, even in very crowded, messy environments where other methods gave up.

Summary Analogy

Think of the old methods as a chaotic dance where everyone tries to find their own steps by tripping over their feet until they get it right.

This new method is like a choreographed jazz band.

Each musician (robot) has a smart ear (cBOT) that instantly learns the best notes to play to avoid hitting the others.
The conductor (STL-KCBS) listens to the whole band. If two musicians are about to clash, the conductor gently nudges them to adjust their timing, ensuring the whole song (the mission) is played perfectly, safely, and beautifully, even if the room is small and crowded.

This paper shows that by combining smart learning with logical storytelling, we can get fleets of robots to work together safely and efficiently in the messy, unpredictable real world.

Here is a detailed technical summary of the paper "Multi-Robot Trajectory Planning via Constrained Bayesian Optimization and Local Cost Map Learning with STL-Based Conflict Resolution."

1. Problem Statement

The paper addresses the challenge of Multi-Robot Motion Planning (MRMP) under Signal Temporal Logic (STL) specifications and kinodynamic constraints.

Core Challenge: Generating dynamically feasible trajectories for a team of $N$ robots that simultaneously reach target regions, avoid obstacles (static and dynamic), and satisfy complex temporal/logical requirements (e.g., "always stay within bounds," "eventually reach goal," or adherence to COLREGs).
Limitations of Existing Methods:
- Exact approaches (e.g., MILP, MPC): Face scalability bottlenecks and struggle with complex formulas or long horizons.
- Sampling-based methods (e.g., RRT): Often require excessive samples to find optimal trajectories and lack robustness in satisfying high-level STL specifications.
- Decentralized approaches: Often lack formal guarantees for specification satisfaction or struggle with inter-agent coordination in cluttered environments.

The goal is to develop a framework that is scalable, probabilistically complete, robust to perturbations, and capable of generating shorter, smoother trajectories than current state-of-the-art methods.

2. Methodology

The authors propose a two-stage framework that integrates sampling-based online learning with formal STL reasoning. The approach decouples single-robot planning from multi-robot coordination.

A. Single-Robot Level: Constrained Bayesian Optimization Tree Search (cBOT)

Instead of standard RRT, the authors introduce cBOT, which uses a Gaussian Process (GP) as a surrogate model to learn local cost maps and feasibility constraints.

Local Control Window: The planner operates within a local control neighborhood $V_d(x_t)$ to ensure computational tractability and fine-grained maneuvering.
Surrogate Modeling:
- Objective Function ( $J$ ): Modeled as a GP to learn costs (time, energy, smoothness).
- Constraints ( $c_k$ ): Modeled independently as GPs to learn obstacle avoidance and kinodynamic limits.
Acquisition Function: The search is guided by Constrained Expected Improvement (CEI), which balances performance improvement ( $EI$ ) with the probability of constraint satisfaction ( $P_{feas}$ ).
- $CEI(u) = EI(u) \cdot P_{feas}(u)$
Process: The algorithm iteratively selects controls, propagates states using a kinodynamic model (RK4 integration), and validates trajectories against STL specifications. This allows the planner to generate shorter, collision-free paths with fewer samples compared to RRT.
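
The CEI score described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes a minimisation objective, assumes a constraint counts as satisfied when $c_k(u) \le 0$ (the sign convention is not stated here), and takes the GP posterior means and standard deviations as given rather than fitting any GP. All function names and candidate values are hypothetical.

```python
import math

def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_improvement(mean, std, best_cost):
    """EI for minimisation: E[max(best_cost - J, 0)] with J ~ N(mean, std^2)."""
    if std <= 0.0:
        return max(best_cost - mean, 0.0)
    z = (best_cost - mean) / std
    return (best_cost - mean) * normal_cdf(z) + std * normal_pdf(z)

def probability_feasible(constraint_posteriors):
    """Product of P(c_k <= 0) over independently modelled constraints.

    Each entry is a (mean, std) pair from that constraint's GP posterior.
    Assumed convention: c_k <= 0 means the constraint is satisfied.
    """
    p = 1.0
    for mean, std in constraint_posteriors:
        if std <= 0.0:
            p *= 1.0 if mean <= 0.0 else 0.0
        else:
            p *= normal_cdf(-mean / std)
    return p

def constrained_ei(obj_mean, obj_std, best_cost, constraint_posteriors):
    """CEI(u) = EI(u) * P_feas(u)."""
    return (expected_improvement(obj_mean, obj_std, best_cost)
            * probability_feasible(constraint_posteriors))

# Hypothetical posteriors for two candidate controls:
# (objective mean, objective std, [(constraint mean, constraint std), ...])
candidates = {
    "u_a": (2.0, 0.5, [(-1.0, 0.5)]),  # modest gain, very likely feasible
    "u_b": (1.5, 0.5, [(0.5, 0.5)]),   # larger gain, probably infeasible
}
best_cost = 2.2  # best feasible cost observed so far (assumed)

best_u = max(
    candidates,
    key=lambda u: constrained_ei(candidates[u][0], candidates[u][1],
                                 best_cost, candidates[u][2]),
)
```

With these made-up numbers, `u_b` has the higher plain EI but is unlikely to satisfy its constraint, so the product favours `u_a`. That is the trade-off the acquisition function is meant to capture.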

B. Multi-Robot Level: STL-Enhanced Kinodynamic Conflict-Based Search (STL-KCBS)

This layer handles coordination and conflict resolution using a decoupled approach based on the Kinodynamic Conflict-Based Search (K-CBS) framework.

STL Monitors: Unlike standard K-CBS which uses geometric intersection tests, STL-KCBS employs STL monitors for conflict detection.
Robustness-Based Conflict Detection: Conflicts are identified not just by spatial overlap, but by evaluating the STL robustness metric ( $\mu(t)$ ). A conflict is declared whenever $\mu(t) < 0$ , meaning the safety specification (e.g., minimum separation distance) is violated; the magnitude of $\mu(t)$ quantifies by how much.
Resolution: When a conflict is detected, the algorithm generates STL-constrained trajectory refinements and propagates temporal constraints through the search tree.
Decoupling: The cBOT planner handles individual trajectory generation and static obstacle avoidance, while STL-KCBS manages inter-robot collision avoidance and temporal coordination.
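
The conflict-detection step can be illustrated with a small monitor for a minimum-separation rule. This is a sketch under stated assumptions, not the STL-KCBS implementation: trajectories are assumed to be sampled at the same time steps, the robustness is taken to be the pairwise distance minus an assumed `d_min`, and the function names and sample data are hypothetical. It covers detection only; the constraint generation and replanning that would follow are omitted.

```python
import math
from itertools import combinations

def separation_margin(p, q, d_min):
    """Signed margin mu = distance - d_min; negative means a violation."""
    return math.dist(p, q) - d_min

def first_conflict(trajectories, d_min):
    """Return (robot_i, robot_j, step, mu) for the earliest time step at
    which any pair of robots has mu < 0, or None if no pair ever does.

    `trajectories` maps a robot id to a list of (x, y) positions, all
    assumed to be sampled at the same time steps.
    """
    horizon = min(len(traj) for traj in trajectories.values())
    for k in range(horizon):
        for i, j in combinations(sorted(trajectories), 2):
            mu = separation_margin(trajectories[i][k], trajectories[j][k], d_min)
            if mu < 0.0:
                return (i, j, k, mu)
    return None

# Two robots swapping ends of a corridor, offset by 0.5 m laterally.
trajectories = {
    "r1": [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)],
    "r2": [(4.0, 0.5), (3.0, 0.5), (2.0, 0.5), (1.0, 0.5), (0.0, 0.5)],
}
conflict = first_conflict(trajectories, d_min=1.0)

# The same swap with a 3 m lateral offset keeps the margin positive throughout.
wide = dict(trajectories, r2=[(x, 3.0) for x, _ in trajectories["r2"]])
no_conflict = first_conflict(wide, d_min=1.0)
```

In a conflict-based search, a result such as `conflict` would be turned into constraints on the two robots involved, and only those robots would replan. Returning the signed margin rather than a boolean keeps the "how badly" information available to that step.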

3. Key Contributions

cBOT Algorithm: A novel constrained Bayesian optimization-based tree search that learns local cost maps and constraints via GPs. It generates shorter, smoother trajectories with fewer samples than conventional RRT-based approaches.
STL-KCBS Algorithm: An extension of K-CBS that integrates STL monitors into conflict detection and resolution. It ensures specification satisfaction while maintaining the scalability and probabilistic completeness of the K-CBS framework.
Comprehensive Benchmarking: A rigorous evaluation against nine baseline planners (including RRT*, GCS, and MPC variants) across diverse environments (open, bottleneck, forest, bugtrap) and team sizes (up to 50 robots).
Real-World Validation: Successful deployment on autonomous surface vehicles (ASVs) in a lake environment and ground robots in indoor settings, demonstrating robustness in uncertain, real-world conditions.

4. Experimental Results

The framework was tested on a desktop computer (Intel i7, 32GB RAM) using both simulation and real-world hardware (iRobot Create 3, Waveshare Rover, and ASVs).

Success Rate:
- STLcBOT achieved 100% success across all environments and team sizes (up to 50 robots).
- RRT-based methods (KRRT, STLRRT) degraded significantly in cluttered environments (Env. 3), failing beyond 12 robots.
- Convex Optimization methods (STGCS) failed completely in cluttered environments (Env. 3) and struggled with scalability beyond 4 robots.
Runtime Efficiency:
- STLcBOT maintained runtimes under 1 second for most scenarios, even with large teams.
- RRT-based planners showed exponential growth in runtime, exceeding 100 seconds for 50 robots.
Trajectory Quality:
- STLcBOT produced shorter and smoother trajectories compared to RRT.
- In the "cross-hall" environment, cBOT paths were significantly more organized and efficient.
- Path lengths for STLcBOT scaled sub-linearly, staying below 1200m even with 50 robots in complex maps, whereas RRT paths often exceeded 3000m.
Real-World Performance:
- Successfully executed complex maneuvers (X-pattern navigation, position swapping) with 2-3 ASVs in a lake, handling GPS noise and dynamic obstacles (fountains) in under one second of planning time.

5. Significance

This work bridges the gap between formal verification (STL) and efficient sampling-based planning for multi-robot systems.

Scalability: It addresses a critical bottleneck in multi-robot planning, handling teams of up to 50 robots in complex environments, whereas existing exact STL planners typically fail beyond 6 robots.
Robustness: By using STL robustness metrics for conflict resolution, the system provides quantitative safety margins, making it suitable for real-world deployment where sensor noise and dynamics are unpredictable.
Practicality: The release of the STL-cBOT Planner as an open-source package and the validation on physical hardware (ASVs and UGVs) demonstrate immediate applicability in fields such as environmental monitoring, search and rescue, and autonomous logistics.

In summary, the proposed STLcBOT framework offers a superior balance of completeness, computational efficiency, and trajectory quality, outperforming both traditional sampling methods and convex optimization approaches in complex, constrained multi-robot scenarios.