Conformalized Data-Driven Reachability Analysis with PAC Guarantees

This paper introduces Conformalized Data-Driven Reachability (CDDR), a framework that leverages the Learn Then Test (LTT) procedure to provide Probably Approximately Correct (PAC) coverage guarantees for reachable-set over-approximations of linear and nonlinear systems. It requires only independent and identically distributed calibration data, overcoming the limitations of existing deterministic methods that require known noise bounds or specific system parameters.

Yanliang Huang, Zhen Zhang, Peng Xie, Zhuoqi Zeng, Amr Alanwar

Published Fri, 13 Ma

Imagine you are driving a car through a dense fog. You can't see the road ahead, and you don't know exactly how bumpy the terrain is or how slippery the tires might get. Your goal is to draw a "safety bubble" around your car that guarantees you won't crash, no matter what the road throws at you.

This is the problem of Reachability Analysis. Engineers need to know all the possible places a system (like a robot, a self-driving car, or a power grid) could end up, so they can ensure it stays safe.

The paper introduces a new method called CDDR (Conformalized Data-Driven Reachability). Here is how it works, explained through simple analogies.

The Old Way: Guessing the Worst Case

Previously, engineers tried to draw this safety bubble using two main approaches:

  1. The "Perfect Model" Approach: They tried to build a perfect mathematical map of the car and the road. But in the real world, we rarely have perfect maps.
  2. The "Max Guess" Approach: They looked at past data and said, "The worst bump we saw was 2 inches high, so let's assume the road will never be bumpier than 2 inches."

The Problem: The "Max Guess" approach is dangerous. Just because you haven't seen a 3-inch bump yet doesn't mean one won't happen tomorrow. If a giant bump appears, your safety bubble pops, and the system crashes. Existing methods often fail when the noise (bumps) is weird, heavy-tailed, or unknown.

The New Way: CDDR (The "Confident Coach")

The authors propose CDDR, which is like hiring a very smart, statistical coach who uses a technique called Learn Then Test (LTT).

Here is the step-by-step analogy:

1. The Training Camp (Learning)

First, the coach watches thousands of practice runs (data). They don't try to memorize the exact physics of the car. Instead, they just watch how the car actually behaves compared to where they thought it would go.

  • The Score: Every time the car drifts off the predicted path, the coach measures the "drift distance."
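The "drift distance" above can be written down directly as a nonconformity score. Here is a minimal sketch, assuming a hypothetical 2-D linear model `A` as the coach's nominal prediction (the matrix, noise scale, and trajectory setup are illustrative, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical nominal model: x_{k+1} = A @ x_k + w_k, with w_k unknown.
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])

def drift_score(x_now, x_next):
    """Nonconformity score: distance between where the nominal model
    predicted the car would be and where it actually ended up."""
    return float(np.linalg.norm(x_next - A @ x_now))

# One score per calibration run, collected from i.i.d. practice data.
scores = []
for _ in range(1000):
    x = rng.normal(size=2)
    w = 0.05 * rng.standard_t(df=3, size=2)   # heavy-tailed "bumps"
    scores.append(drift_score(x, A @ x + w))

print(len(scores), min(scores) >= 0.0)  # 1000 True
```

Note that the score never asks *why* the car drifted; it only records how far, which is what lets the method stay agnostic about the system's physics.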

2. The Calibration Drill (Testing)

This is the magic part. The coach doesn't just look at the average drift. They set up a rigorous test to find a "Safety Threshold."

  • Imagine the coach says: "I need to be 95% sure that my safety bubble will catch the car 99% of the time in the future."
  • They run a battery of statistical tests (the LTT procedure) over candidate bubble sizes to find the smallest bubble that still meets this promise.
  • The Guarantee: The paper calls this a PAC Guarantee (Probably Approximately Correct). In plain English: "If we repeat this calibration 100 times with different sets of practice data, then in at least 95 of those runs, the resulting safety bubble really will catch the car at least 99% of the time."
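The calibration drill can be sketched with only the standard library, in the fixed-sequence style of Learn Then Test: walk candidate bubble radii from most to least conservative, and keep shrinking only while a binomial test still certifies the coverage promise. The candidate grid, score distribution, and ε/δ choices below are illustrative, not the paper's:

```python
import math
import random

def binom_tail_pvalue(misses, n, eps):
    """P[Bin(n, eps) <= misses]: evidence against the null hypothesis
    'the true miss rate exceeds eps', given the calibration outcome."""
    return sum(math.comb(n, i) * eps**i * (1 - eps)**(n - i)
               for i in range(misses + 1))

def ltt_threshold(scores, eps=0.01, delta=0.05):
    """Fixed-sequence LTT sketch: walk candidate radii from largest to
    smallest; stop shrinking at the first radius the test rejects."""
    n = len(scores)
    chosen = max(scores)  # fallback: most conservative radius
    for lam in sorted(set(scores), reverse=True):
        misses = sum(s > lam for s in scores)
        if binom_tail_pvalue(misses, n, eps) <= delta:
            chosen = lam        # certified at level delta; try smaller
        else:
            break               # first failure ends the sequence
    return chosen

random.seed(0)
scores = [abs(random.gauss(0.0, 1.0)) for _ in range(2000)]
lam = ltt_threshold(scores, eps=0.05, delta=0.05)
coverage = sum(s <= lam for s in scores) / len(scores)
print(coverage >= 1 - 0.05)  # True: the chosen radius covers the data
```

The real LTT machinery handles more general risks and p-values; this binomial version matches the simple "fraction of missed runs" risk described above.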

3. Drawing the Bubble (The Result)

Once the threshold is set, the coach draws the safety bubble.

  • If the car is a standard sedan (Linear System), the bubble is a nice, neat box.
  • If the car is a weird, bouncy monster truck with non-standard physics (Non-Lipschitz/Nonlinear System), the bubble still works because the method doesn't care about the car's shape; it only cares about the data of how it moved.

Why is this a Big Deal?

1. It works when you know nothing about the noise.
Imagine the road is covered in "Student-t" noise. In math-speak, this means the road is usually smooth, but occasionally, a giant, unpredictable pothole appears. Old methods would guess the pothole size based on the biggest one they saw in the past. If a new, bigger pothole appears, they fail.
CDDR says: "We don't need to know the shape of the pothole. We just need enough practice runs to statistically guarantee our bubble is big enough to catch even the giant ones."
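A quick stdlib-only experiment (sample sizes and degrees of freedom invented for illustration) shows why the "biggest pothole so far" is not a safe bound under heavy-tailed noise:

```python
import math
import random

random.seed(1)

def student_t(df):
    """Sample a Student-t variate as normal / sqrt(chi-squared / df)."""
    z = random.gauss(0.0, 1.0)
    chi2 = sum(random.gauss(0.0, 1.0) ** 2 for _ in range(df))
    return z / math.sqrt(chi2 / df)

# "Max Guess": bound future bumps by the biggest bump seen in practice.
past = [abs(student_t(3)) for _ in range(100)]
max_guess = max(past)

# Heavy tails keep producing bumps beyond any finite history.
future = [abs(student_t(3)) for _ in range(10000)]
exceed_rate = sum(f > max_guess for f in future) / len(future)
print(exceed_rate)  # typically nonzero: the historical max gets beaten
```

A statistically calibrated threshold, by contrast, comes with an explicit confidence statement about how often it may be beaten.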

2. It handles "Measurement Noise" (The Foggy Windshield).
Sometimes, you can't see the car's exact position; you only see a blurry version of it (like looking through a dirty windshield).

  • Old methods would get confused and fail.
  • CDDR has a special trick: It expands the bubble to account for the blurriness, ensuring the real car is still inside, even if the seen car looks like it's outside.
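The "expand for blurriness" trick can be pictured as a Minkowski sum: widen the bubble on every side by the worst-case measurement error. A toy sketch, with the box and the noise radius made up for illustration:

```python
def inflate_box(lo, hi, meas_radius):
    """Widen a per-axis box [lo, hi] by meas_radius on every side, so the
    true state stays inside even when we only see a blurred position."""
    return ([l - meas_radius for l in lo],
            [h + meas_radius for h in hi])

# Reachable box calibrated on noisy observations...
lo, hi = [-1.0, -2.0], [1.0, 2.0]
# ...inflated by an infinity-norm measurement-noise bound of 0.25:
lo_true, hi_true = inflate_box(lo, hi, 0.25)
print(lo_true, hi_true)  # [-1.25, -2.25] [1.25, 2.25]
```

The price of the blur is a uniformly larger bubble; the payoff is that the guarantee now applies to the real state, not just the observed one.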

3. It's efficient (The "Normalized Score").
Imagine the road is bumpy in the North-South direction but smooth in the East-West direction.

  • A simple method would make the safety bubble huge in both directions just to be safe, wasting space.
  • CDDR can use a "Normalized Score." It realizes, "Hey, the North-South bumps are huge, but East-West is tiny." It stretches the bubble in the right direction and shrinks it in the safe direction. This makes the safety zone much tighter and more useful without losing the guarantee.
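A toy version of that idea (the per-axis scales and residuals are invented for illustration): divide each axis's drift by that axis's typical spread before taking the maximum, so a single calibrated threshold stretches into an axis-aware box.

```python
def normalized_score(residual, scale):
    """Weighted infinity-norm: each axis's drift measured in units of
    that axis's typical spread."""
    return max(abs(r) / s for r, s in zip(residual, scale))

# Per-axis spreads estimated from calibration data (hypothetical values):
scale = [2.0, 0.5]       # bumpy North-South, smoother East-West

score = normalized_score([1.0, 0.25], scale)
print(score)  # 0.5 -- both axes contribute equally after scaling

# A calibrated threshold q then unrolls into per-axis half-widths:
q = 1.0
half_widths = [q * s for s in scale]
print(half_widths)  # [2.0, 0.5] -- tight where the road is smooth
```

Because the scaling is fixed before calibration, the single threshold `q` still carries the same coverage guarantee, only now spent more efficiently across axes.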

The Bottom Line

Think of CDDR as a statistical safety net.

Instead of trying to predict the future perfectly (which is impossible), it uses past data to build a net that is mathematically guaranteed to catch the system, even if the system behaves in weird, unpredictable, or "heavy-tailed" ways. It trades a little bit of "tightness" (the net might be slightly larger than the absolute minimum) for a massive gain in reliability (you can be 99% sure the net won't break).

This is a game-changer for safety-critical systems like self-driving cars and medical robots, where you can't afford to guess wrong.