Safety-critical Control Under Partial Observability: Reach-Avoid POMDP meets Belief Space Control

This paper proposes a real-time, layered control architecture for safety-critical partially observable systems that decouples goal reaching, information gathering, and safety into modular components using learnable Belief Control Lyapunov Functions and conformal prediction-based Belief Control Barrier Functions, enabling efficient quadratic programming solutions that outperform existing solvers in both simulation and space-robotics experiments.

Matti Vahs, Joris Verhagen, Jana Tumova

Published Thu, 12 Ma

Imagine you are driving a car in a thick fog. You can't see the road ahead, you don't know exactly where you are, and you have a strict rule: you must reach a specific destination (the goal) without ever hitting a wall (the safety constraint).

This is the exact problem robots face in the real world. They have noisy sensors, imperfect maps, and they can't see everything. This paper proposes a new, smarter way to drive this "foggy car" safely and efficiently.

Here is the breakdown of their solution using simple analogies:

The Problem: The "All-in-One" Driver vs. The Fog

Traditionally, robot programmers tried to solve this by giving the robot a single "brain" that had to do three things at once:

  1. Drive to the goal.
  2. Avoid hitting walls.
  3. Figure out where it is (by moving around to get better sensor readings).

The authors argue that trying to do all three at the exact same speed is like asking a race car driver to also be a mechanic and a tour guide simultaneously. It's too much!

  • Safety needs to happen instantly (like slamming the brakes).
  • Finding the goal needs long-term planning (like plotting a route).
  • Gathering info happens at its own pace (like stopping to look at a map).

When you mix these conflicting speeds into one big calculation, the robot gets confused, moves too slowly, or makes dangerous mistakes.

The Solution: The "Layered Team"

The authors propose splitting the robot's brain into a layered team, where each member has a specific job and operates at their own speed. Think of it like a construction crew:

1. The Navigator (The Reference Controller)

  • Job: "Go to the green zone!"
  • How it works: This is the standard driver. It looks at the robot's best guess of where it is and points the car toward the goal. It doesn't worry about safety or fog; it just wants to get there.
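In control terms, the Navigator is just a feedback controller acting on the belief mean. A minimal sketch, assuming a simple proportional law (the `navigator` function, gain, and coordinates are illustrative, not the paper's actual reference controller):

```python
import numpy as np

def navigator(belief_mean, goal, gain=1.0):
    """Reference controller: point the commanded velocity at the goal.

    It acts only on the robot's best guess of its position (the belief
    mean) and ignores uncertainty and obstacles -- those are handled by
    the other layers.
    """
    return gain * (goal - belief_mean)

# Illustrative numbers: robot believes it is at (0, 0), goal at (3, 4).
u_ref = navigator(np.array([0.0, 0.0]), np.array([3.0, 4.0]))
```

Because it never looks at the fog, this layer stays cheap and can run at whatever rate the low-level control loop needs.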

2. The Detective (The BCLF - Belief Control Lyapunov Function)

  • Job: "Let's get a better look!"
  • The Analogy: Imagine the robot is in the fog. The Navigator wants to drive straight, but the Detective says, "Wait, if we drive this way, we might bump into a wall and learn exactly where we are."
  • How it works: This is the "Information Gathering" module. It uses a mathematical tool called a Lyapunov Function (think of it as an "uncertainty meter"). The robot moves in a way that lowers this meter. It learns that to reach the goal safely, it sometimes needs to take a detour to bump into a wall or look at a landmark to clear up the fog.
  • The Magic: They taught this "Detective" using Reinforcement Learning (trial and error). The robot learned that "bumping into a wall" is actually a good thing because it clears up the fog, allowing it to drive faster later.
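In the paper the BCLF is learned with reinforcement learning; as a hand-crafted stand-in, the "uncertainty meter" below is simply the trace of the belief covariance, and the Lyapunov-style check asks whether a candidate action shrinks it (the function names and the decrease rate `alpha` are illustrative assumptions):

```python
import numpy as np

def uncertainty_meter(cov):
    """Hand-crafted stand-in for the learned BCLF: total position
    uncertainty, measured as the trace of the belief covariance."""
    return np.trace(cov)

def decreases_uncertainty(cov_now, cov_next, alpha=0.1):
    """Lyapunov-style decrease condition: a candidate action counts as
    'informative' if it shrinks the meter by at least a factor alpha."""
    return uncertainty_meter(cov_next) <= (1 - alpha) * uncertainty_meter(cov_now)

# Bumping into a wall collapses uncertainty along one axis:
cov_foggy  = np.diag([1.0, 1.0])   # lost in the fog
cov_bumped = np.diag([1.0, 0.05])  # wall contact pins down one coordinate
informative = decreases_uncertainty(cov_foggy, cov_bumped)
```

The learned BCLF plays the same role: it scores beliefs, and the controller prefers actions whose predicted next belief scores lower.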

3. The Safety Guard (The BCBF - Belief Control Barrier Function)

  • Job: "STOP! That's a cliff!"
  • The Analogy: This is the ultimate safety net. Even if the Navigator wants to drive fast and the Detective wants to explore, the Safety Guard has the final say.
  • How it works: It uses a tool called Conformal Prediction. Imagine the robot has 1,000 "ghosts" (particles) representing where it might be. The Safety Guard checks all 1,000 ghosts. If too many of them are about to hit a wall, the Guard instantly tweaks the steering wheel to keep (nearly) all of them safe, with a tunable probability guarantee. It doesn't just check "right now"; it guarantees safety for the entire trip ahead.
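A rough sketch of the conformal idea over the particle "ghosts": score each particle by its distance to the wall, then take a conservative low quantile of the scores as the safety margin that covers at least a (1 − eps) fraction of the hypotheses. The wall geometry, `eps`, and function names here are assumptions for illustration, not the paper's exact construction:

```python
import numpy as np

def conformal_margin(particles, wall_x, eps=0.05):
    """Conformal-style safety margin over a particle belief.

    Each particle is one hypothesis of where the robot is. Its signed
    distance to the wall (at x = wall_x, safe side x < wall_x) is the
    'score'; the conservative eps-quantile of the scores is a margin
    valid for at least a (1 - eps) fraction of the hypotheses.
    """
    scores = wall_x - particles[:, 0]        # signed distance, > 0 means safe
    n = len(scores)
    k = int(np.ceil(eps * (n + 1)))          # conservative conformal rank
    return np.sort(scores)[max(k - 1, 0)]    # k-th smallest score

rng = np.random.default_rng(0)
ghosts = rng.normal([2.0, 0.0], 0.1, size=(1000, 2))  # 1000 position hypotheses
margin = conformal_margin(ghosts, wall_x=3.0)
is_safe = margin > 0.0
```

When this margin goes to zero, the Guard intervenes; as long as it stays positive, the other layers are free to act.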

How They Work Together

The system works like a relay race with a referee:

  1. The Navigator says, "Drive North!"
  2. The Detective says, "Actually, let's drive North-East to bump into that wall so we know where we are."
  3. The Safety Guard checks: "If we drive North-East, will any of our 1,000 ghosts hit a wall?"
    • If Yes: The Guard tweaks the steering slightly to keep everyone safe, but still lets the robot move.
    • If No: The robot drives exactly as the Detective suggested.
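The referee step above can be sketched as a minimum-norm safety filter: keep the Detective's command unless its wall-ward component would use up the conformal margin in one step, and then trim only the excess. The closed-form projection here stands in for the paper's quadratic program, and all names and numbers are illustrative:

```python
import numpy as np

def safety_filter(u_desired, direction_to_wall, margin, dt=0.1):
    """Minimal stand-in for the safety QP: stay as close as possible to
    the desired command while capping the approach speed toward the
    wall so the conformal margin cannot be consumed in one time step.

    `direction_to_wall` is a unit vector; `margin` is the worst-case
    distance to the wall over the particle cloud.
    """
    toward = float(u_desired @ direction_to_wall)  # speed toward the wall
    limit = margin / dt                            # max allowed approach speed
    if toward <= limit:
        return u_desired                           # already safe: no change
    # Minimum-norm correction: remove only the excess wall-ward component.
    return u_desired - (toward - limit) * direction_to_wall

u_detective = np.array([4.0, 3.0])   # "drive north-east to bump the wall"
wall_dir    = np.array([1.0, 0.0])   # the wall lies to the east
u_safe = safety_filter(u_detective, wall_dir, margin=0.2)
```

Note that the filter does not veto the plan; it slows the wall-ward component just enough, which is exactly the "tweaks the steering slightly, but still lets the robot move" behavior described above.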

Why This is a Big Deal

  • It's Fast: Instead of solving one giant, impossible math problem, they solve three small, easy problems. This allows the robot to make decisions in real-time, even with thousands of "ghosts" (particles) tracking its location.
  • It's Reusable: The "Detective" (the part that learns how to clear the fog) doesn't need to be retrained if the goal changes. If you move the green goal zone, you just tell the Navigator to go there; the Detective still knows how to clear the fog.
  • It Works in Real Life: They tested this on a real robot that floats on air cushions (simulating a space robot). The robot had to navigate a room by bumping into walls to find its way. In their experiments, the robot reached the goal safely even though it was essentially "blind" for most of the trip.

The Bottom Line

This paper teaches robots how to be smart about their own ignorance. Instead of panicking when they can't see, they have a structured plan:

  1. Detective: "Let's move to learn more."
  2. Navigator: "Let's move toward the goal."
  3. Safety Guard: "I'll make sure we don't crash while doing both."

By separating these jobs, the robot becomes faster, safer, and much better at navigating the unknown.