Conformalized Data-Driven Reachability Analysis with PAC Guarantees

This paper introduces Conformalized Data-Driven Reachability (CDDR), a framework that leverages the Learn Then Test (LTT) procedure to provide Probably Approximately Correct (PAC) coverage guarantees for reachable-set over-approximations of linear and nonlinear systems. It requires only independent and identically distributed calibration data, overcoming the limitations of existing deterministic methods that require known noise bounds or specific system parameters.

Yanliang Huang, Zhen Zhang, Peng Xie, Zhuoqi Zeng, Amr Alanwar

Published Fri, 13 Ma

Imagine you are driving a car through a dense fog. You can't see the road ahead, and you don't know exactly how bumpy the terrain is or how slippery the tires might get. Your goal is to draw a "safety bubble" around your car that guarantees you won't crash, no matter what the road throws at you.

This is the problem of Reachability Analysis. Engineers need to know all the possible places a system (like a robot, a self-driving car, or a power grid) could end up, so they can ensure it stays safe.

The paper introduces a new method called CDDR (Conformalized Data-Driven Reachability). Here is how it works, explained through simple analogies.

The Old Way: Guessing the Worst Case

Previously, engineers tried to draw this safety bubble using two main approaches:

  1. The "Perfect Model" Approach: They tried to build a perfect mathematical map of the car and the road. But in the real world, we rarely have perfect maps.
  2. The "Max Guess" Approach: They looked at past data and said, "The worst bump we saw was 2 inches high, so let's assume the road will never be bumpier than 2 inches."

The Problem: The "Max Guess" approach is dangerous. Just because you haven't seen a 3-inch bump yet doesn't mean one won't happen tomorrow. If a giant bump appears, your safety bubble pops, and the system crashes. Existing methods often fail when the noise (bumps) is weird, heavy-tailed, or unknown.

The New Way: CDDR (The "Confident Coach")

The authors propose CDDR, which is like hiring a very smart, statistical coach who uses a technique called Learn Then Test (LTT).

Here is the step-by-step analogy:

1. The Training Camp (Learning)

First, the coach watches thousands of practice runs (data). They don't try to memorize the exact physics of the car. Instead, they just watch how the car actually behaves compared to where they thought it would go.

  • The Score: Every time the car drifts off the predicted path, the coach measures the "drift distance."
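The "drift distance" above can be written down directly as a nonconformity score. Here is a minimal sketch, assuming a hypothetical 2-D linear model `A` as the coach's nominal prediction (the matrix, noise scale, and trajectory setup are illustrative, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical nominal model: x_{k+1} = A @ x_k + w_k, with w_k unknown.
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])

def drift_score(x_now, x_next):
    """Nonconformity score: distance between where the nominal model
    predicted the car would be and where it actually ended up."""
    return float(np.linalg.norm(x_next - A @ x_now))

# One score per calibration run, collected from i.i.d. practice data.
scores = []
for _ in range(1000):
    x = rng.normal(size=2)
    w = 0.05 * rng.standard_t(df=3, size=2)   # heavy-tailed "bumps"
    scores.append(drift_score(x, A @ x + w))

print(len(scores), min(scores) >= 0.0)  # 1000 True
```

Note that the score never asks *why* the car drifted; it only records how far, which is what lets the method stay agnostic about the system's physics.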

2. The Calibration Drill (Testing)

This is the magic part. The coach doesn't just look at the average drift. They set up a rigorous test to find a "Safety Threshold."

  • Imagine the coach says: "I need to be 95% sure that my safety bubble will catch the car 99% of the time in the future."
  • They run a battery of statistical tests (the LTT procedure) over candidate bubble sizes to find the smallest bubble that still meets this promise.
  • The Guarantee: The paper calls this a PAC Guarantee (Probably Approximately Correct). In plain English: "If we repeat this calibration 100 times with different sets of practice data, then in at least 95 of those runs, the resulting safety bubble really will catch the car at least 99% of the time."
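The calibration drill can be sketched with only the standard library, in the fixed-sequence style of Learn Then Test: walk candidate bubble radii from most to least conservative, and keep shrinking only while a binomial test still certifies the coverage promise. The candidate grid, score distribution, and ε/δ choices below are illustrative, not the paper's:

```python
import math
import random

def binom_tail_pvalue(misses, n, eps):
    """P[Bin(n, eps) <= misses]: evidence against the null hypothesis
    'the true miss rate exceeds eps', given the calibration outcome."""
    return sum(math.comb(n, i) * eps**i * (1 - eps)**(n - i)
               for i in range(misses + 1))

def ltt_threshold(scores, eps=0.01, delta=0.05):
    """Fixed-sequence LTT sketch: walk candidate radii from largest to
    smallest; stop shrinking at the first radius the test rejects."""
    n = len(scores)
    chosen = max(scores)  # fallback: most conservative radius
    for lam in sorted(set(scores), reverse=True):
        misses = sum(s > lam for s in scores)
        if binom_tail_pvalue(misses, n, eps) <= delta:
            chosen = lam        # certified at level delta; try smaller
        else:
            break               # first failure ends the sequence
    return chosen

random.seed(0)
scores = [abs(random.gauss(0.0, 1.0)) for _ in range(2000)]
lam = ltt_threshold(scores, eps=0.05, delta=0.05)
coverage = sum(s <= lam for s in scores) / len(scores)
print(coverage >= 1 - 0.05)  # True: the chosen radius covers the data
```

The real LTT machinery handles more general risks and p-values; this binomial version matches the simple "fraction of missed runs" risk described above.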

3. Drawing the Bubble (The Result)

Once the threshold is set, the coach draws the safety bubble.

  • If the car is a standard sedan (Linear System), the bubble is a nice, neat box.
  • If the car is a weird, bouncy monster truck with non-standard physics (Non-Lipschitz/Nonlinear System), the bubble still works because the method doesn't care about the car's shape; it only cares about the data of how it moved.

Why is this a Big Deal?

1. It works when you know nothing about the noise.
Imagine the road is covered in "Student-t" noise. In math-speak, this means the road is usually smooth, but occasionally, a giant, unpredictable pothole appears. Old methods would guess the pothole size based on the biggest one they saw in the past. If a new, bigger pothole appears, they fail.
CDDR says: "We don't need to know the shape of the pothole. We just need enough practice runs to statistically guarantee our bubble is big enough to catch even the giant ones."
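A quick stdlib-only experiment (sample sizes and degrees of freedom invented for illustration) shows why the "biggest pothole so far" is not a safe bound under heavy-tailed noise:

```python
import math
import random

random.seed(1)

def student_t(df):
    """Sample a Student-t variate as normal / sqrt(chi-squared / df)."""
    z = random.gauss(0.0, 1.0)
    chi2 = sum(random.gauss(0.0, 1.0) ** 2 for _ in range(df))
    return z / math.sqrt(chi2 / df)

# "Max Guess": bound future bumps by the biggest bump seen in practice.
past = [abs(student_t(3)) for _ in range(100)]
max_guess = max(past)

# Heavy tails keep producing bumps beyond any finite history.
future = [abs(student_t(3)) for _ in range(10000)]
exceed_rate = sum(f > max_guess for f in future) / len(future)
print(exceed_rate)  # typically nonzero: the historical max gets beaten
```

A statistically calibrated threshold, by contrast, comes with an explicit confidence statement about how often it may be beaten.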

2. It handles "Measurement Noise" (The Foggy Windshield).
Sometimes, you can't see the car's exact position; you only see a blurry version of it (like looking through a dirty windshield).

  • Old methods would get confused and fail.
  • CDDR has a special trick: It expands the bubble to account for the blurriness, ensuring the real car is still inside, even if the seen car looks like it's outside.
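The "expand for blurriness" trick can be pictured as a Minkowski sum: widen the bubble on every side by the worst-case measurement error. A toy sketch, with the box and the noise radius made up for illustration:

```python
def inflate_box(lo, hi, meas_radius):
    """Widen a per-axis box [lo, hi] by meas_radius on every side, so the
    true state stays inside even when we only see a blurred position."""
    return ([l - meas_radius for l in lo],
            [h + meas_radius for h in hi])

# Reachable box calibrated on noisy observations...
lo, hi = [-1.0, -2.0], [1.0, 2.0]
# ...inflated by an infinity-norm measurement-noise bound of 0.25:
lo_true, hi_true = inflate_box(lo, hi, 0.25)
print(lo_true, hi_true)  # [-1.25, -2.25] [1.25, 2.25]
```

The price of the blur is a uniformly larger bubble; the payoff is that the guarantee now applies to the real state, not just the observed one.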

3. It's efficient (The "Normalized Score").
Imagine the road is bumpy in the North-South direction but smooth in the East-West direction.

  • A simple method would make the safety bubble huge in both directions just to be safe, wasting space.
  • CDDR can use a "Normalized Score." It realizes, "Hey, the North-South bumps are huge, but East-West is tiny." It stretches the bubble in the right direction and shrinks it in the safe direction. This makes the safety zone much tighter and more useful without losing the guarantee.
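A toy version of that idea (the per-axis scales and residuals are invented for illustration): divide each axis's drift by that axis's typical spread before taking the maximum, so a single calibrated threshold stretches into an axis-aware box.

```python
def normalized_score(residual, scale):
    """Weighted infinity-norm: each axis's drift measured in units of
    that axis's typical spread."""
    return max(abs(r) / s for r, s in zip(residual, scale))

# Per-axis spreads estimated from calibration data (hypothetical values):
scale = [2.0, 0.5]       # bumpy North-South, smoother East-West

score = normalized_score([1.0, 0.25], scale)
print(score)  # 0.5 -- both axes contribute equally after scaling

# A calibrated threshold q then unrolls into per-axis half-widths:
q = 1.0
half_widths = [q * s for s in scale]
print(half_widths)  # [2.0, 0.5] -- tight where the road is smooth
```

Because the scaling is fixed before calibration, the single threshold `q` still carries the same coverage guarantee, only now spent more efficiently across axes.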

The Bottom Line

Think of CDDR as a statistical safety net.

Instead of trying to predict the future perfectly (which is impossible), it uses past data to build a net that is mathematically guaranteed to catch the system, even if the system behaves in weird, unpredictable, or "heavy-tailed" ways. It trades a little bit of "tightness" (the net might be slightly larger than the absolute minimum) for a massive gain in reliability (you can be 99% sure the net won't break).

This is a game-changer for safety-critical systems like self-driving cars and medical robots, where you can't afford to guess wrong.