Temporal Logic Control of Nonlinear Stochastic Systems with Online Performance Optimization

Imagine you are teaching a robot to drive a car through a chaotic, foggy city. You have two main goals:

Safety: The robot must reach a specific destination without hitting any obstacles or driving off a cliff. This is non-negotiable.
Efficiency: You also want the robot to get there using the least amount of fuel and time possible.

The problem is that the city is unpredictable (it's "stochastic"). The wind might push the car, the tires might slip, and the robot can't see perfectly.

The Old Way: The Rigid Script

Traditionally, engineers would write a single, rigid "script" for the robot before it even starts moving. They would say, "At this exact spot, turn the wheel exactly 5 degrees."

The Good: This script is mathematically proven to be safe. If the robot follows it, it will reach the destination safely.
The Bad: This script is inflexible. It ignores fuel use, and it doesn't adapt if the wind changes slightly. It's like following a GPS that forces you onto a specific, winding backroad because it's "safe," even if a slightly different route would save you 10 minutes and some gas. Once the script is written, you can't change it without risking the safety guarantee.

The New Way: The "Safe Zone" Strategy

This paper proposes a clever new method that combines the safety of the rigid script with the flexibility of a smart driver.

1. The Map Maker (Offline Abstraction)

First, the engineers create a simplified, "blocky" map of the city. Instead of every single street corner, they divide the city into large neighborhoods (like a grid).

The Innovation: In the old method, for each neighborhood, the map would say, "Turn the wheel exactly 5 degrees."
The New Method: The map says, "In this neighborhood, you can turn the wheel anywhere between 4 and 6 degrees."

This creates a "Safe Zone" of allowed actions for every location. The engineers mathematically prove that no matter which angle you pick within that 4-to-6-degree range, the robot will still reach the destination safely with at least the certified probability.

2. The Smart Driver (Online MPC)

Now, the robot is on the road. It has this "Safe Zone" map in its pocket.

Instead of blindly following a single pre-written angle, the robot uses a smart algorithm (called Model Predictive Control, or MPC) to look ahead.
It asks: "I am in Neighborhood A. My map says I can turn between 4 and 6 degrees. Which specific angle in that range will get me to the goal fastest and use the least fuel right now?"
It picks the best angle (say, 4.2 degrees) and drives.
A second later, it checks again, picks a new best angle within the allowed range, and continues.

The Analogy: The Hiking Trail

Think of it like hiking a mountain trail in thick fog.

The Old Way: You are given a single, narrow rope to follow. You must hold the rope exactly. If you let go, you might fall. But the rope is so narrow that you can't move left or right to avoid a puddle or a rock; you just have to step over it awkwardly.
The New Way: You are given a wide, fenced-in path. The fence represents the "Safe Zone." The engineers have proven that as long as you stay inside the fence, you will never fall off the cliff.
- Inside the fence, you are free to run, walk, or skip.
- You can choose the smoothest part of the path to save energy.
- You can dodge puddles.
- You keep the fence's proven safety guarantee, but you are much more efficient because you aren't forced to walk a single, rigid line.

Why This Matters

The paper shows that by using this "Safe Zone" approach:

Safety is Guaranteed: The robot never leaves the "fenced-in" path, so it still reaches the goal with a mathematically proven high probability (e.g., 99% chance).
Performance Improves: Because the robot can choose the best move within the safe zone, it saves a lot of energy (fuel) and time compared to the old rigid method.
It's Flexible: If the robot encounters a sudden gust of wind, it can adjust its steering within the safe zone to compensate, whereas the old method would just keep following the broken script.

In short, the authors figured out how to give robots a rulebook of "safe options" instead of a single "safe command," allowing them to be both safe and smart at the same time.

1. Problem Statement

The paper addresses the challenge of controlling nonlinear discrete-time stochastic systems in safety-critical environments. The goal is to synthesize a control policy that satisfies two competing objectives:

Formal Safety/Correctness: The system must satisfy a complex logical specification (specifically, infinite-horizon reach-avoid specifications) with a guaranteed minimum probability $\lambda$.
Performance Optimization: The system must minimize a user-defined cost function $J$ (e.g., energy consumption, control effort, or time) online.

The Gap: Existing methods typically handle these objectives separately.

Abstraction-based methods (e.g., Interval Markov Decision Processes or IMDPs) provide formal guarantees for logical specifications but compute a single, static policy offline, leaving no room for online cost optimization.
Model Predictive Control (MPC) excels at online cost minimization but generally lacks formal guarantees for satisfying complex temporal logic specifications under stochastic, nonlinear dynamics.
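To make the "single, static policy offline" point concrete, here is a minimal sketch of robust (interval) value iteration, a standard way to compute a worst-case reach-avoid probability and one maximizing action per state on an IMDP. It is an illustration under assumptions, not the authors' implementation: the function names, the array layout `(n_states, n_actions, n_states)`, and the fixed iteration count are all hypothetical.

```python
import numpy as np

def worst_case_distribution(lower, upper, values):
    """Pick the distribution inside [lower, upper] that minimizes expected value."""
    p = lower.copy()
    budget = 1.0 - p.sum()            # probability mass not yet assigned
    for s in np.argsort(values):      # spare mass goes to low-value successors first
        add = min(upper[s] - p[s], budget)
        p[s] += add
        budget -= add
    return p

def robust_reach_avoid(P_low, P_up, goal, avoid, iters=200):
    """Worst-case reach-avoid probability, maximized over abstract actions.

    P_low, P_up: transition bounds, shape (n_states, n_actions, n_states) (assumed layout).
    goal, avoid: boolean masks of shape (n_states,).
    """
    n_states, n_actions, _ = P_low.shape
    V = goal.astype(float)            # goal states start (and stay) at 1
    policy = np.zeros(n_states, dtype=int)
    for _ in range(iters):
        V_new = V.copy()
        for s in range(n_states):
            if goal[s] or avoid[s]:
                continue              # absorbing: goal keeps 1, avoid keeps 0
            q = [worst_case_distribution(P_low[s, a], P_up[s, a], V) @ V
                 for a in range(n_actions)]
            policy[s] = int(np.argmax(q))
            V_new[s] = q[policy[s]]
        V = V_new
    return V, policy
```

The returned `policy` holds exactly one action per abstract state, which is why a standard abstraction of this kind leaves nothing for an online optimizer to choose.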

The paper seeks to bridge this gap by computing a policy that guarantees a satisfaction probability of at least $\lambda$ while minimizing $J$ online.
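As a rough formalization, the combined objective could be written as the constrained problem below. The notation is assumed for illustration and does not come from the summary: $G$ is a goal set, $O$ an unsafe set, $x_k$ the state at step $k$, and $\mathbb{P}^{\pi}$ the probability measure induced by policy $\pi$.

$$
\min_{\pi}\; J(\pi)
\quad \text{subject to} \quad
\mathbb{P}^{\pi}\Big[\exists k \ge 0:\; x_k \in G \;\text{ and }\; x_j \notin O \;\; \forall j < k \Big] \;\ge\; \lambda .
$$

The method described next does not solve this problem directly: the probability constraint is handled offline by the abstraction, and the cost is minimized online over the inputs the abstraction has certified.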

2. Methodology

The proposed framework integrates offline formal abstraction with online Model Predictive Control (MPC). The approach is divided into two main phases:

A. Offline: Novel IMDP Abstraction with Set-Valued Actions

The authors propose a new abstraction technique that moves beyond standard IMDPs where an abstract action maps to a single control input.

Set-Valued Interface: Instead of mapping an abstract action $a$ to a single input $u$, the method maps $a$ to a set of inputs (specifically, an $L_p$-ball $B(u_i, \epsilon_i)$ in the input space).
Probabilistic Alternating Simulation Relation (PASR): They define a novel relation between the original stochastic system $S$ and the abstract IMDP $M$. This relation ensures that for any policy $\sigma$ on the IMDP, there exists a set of refined policies $\pi$ for the original system.
Policy Set Generation: The abstraction yields a set of verified policies ($\tilde{\Pi}$). Any policy $\pi$ selected from this set (where $\pi(x) \in F_{set}(x, \sigma(R(x)))$) is guaranteed to satisfy the specification with a probability at least as high as the IMDP's worst-case satisfaction probability.
Result: This creates a "permissive" policy structure where the abstract policy defines a region of valid control inputs for every state, rather than a single point.
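The lookup $\pi(x) \in F_{set}(x, \sigma(R(x)))$ can be sketched in a few lines. This is only an illustration under assumptions: the uniform grid used for $R$, the dictionary tables `sigma` and `balls`, and all function names are hypothetical and are not taken from the paper.

```python
import numpy as np

def region_index(x, lower, cell_width, n_cells):
    """R(x): index of the grid cell containing x (a uniform grid is assumed)."""
    idx = np.floor((np.asarray(x, dtype=float) - lower) / cell_width).astype(int)
    idx = np.clip(idx, 0, n_cells - 1)          # keep boundary points inside the grid
    return tuple(int(i) for i in idx)

def allowed_ball(x, sigma, balls, lower, cell_width, n_cells):
    """F_set(x, sigma(R(x))): (center, radius) of the input ball certified for x."""
    region = region_index(x, lower, cell_width, n_cells)
    return balls[(region, sigma[region])]

def in_input_set(u, center, eps, p=2):
    """True if u lies in the L_p ball B(center, eps)."""
    diff = np.atleast_1d(u).astype(float) - np.atleast_1d(center)
    return bool(np.linalg.norm(diff, ord=p) <= eps)

# Hypothetical tables: abstract policy and (region, action) -> (u_i, eps_i).
lower = np.array([0.0, 0.0])
cell_width = np.array([1.0, 1.0])
n_cells = np.array([4, 4])
sigma = {(1, 2): "a0"}
balls = {((1, 2), "a0"): (np.array([0.5]), 0.1)}

center, eps = allowed_ball([1.5, 2.2], sigma, balls, lower, cell_width, n_cells)
```

Any online controller that only returns inputs for which `in_input_set` holds corresponds to a refined policy in the verified set, which is the property the offline proof relies on.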

B. Online: Abstraction-Driven MPC

Once the set of valid inputs is determined offline, an online MPC controller operates within these constraints.

Constraint Enforcement: At each time step $k$, the MPC solver identifies the current abstract state $s_k$ and retrieves the associated set of valid inputs $F_{set}(x_k, a_k)$ defined by the abstract policy.
Optimization: The MPC solves a finite-horizon optimization problem to minimize the cost function $J$ (e.g., tracking error + control effort) subject to:
1. The system dynamics (approximated via Piecewise Affine (PWA) models).
2. The constraint that the control input $u_k$ must lie within the set $F_{set}(x_k, a_k)$.
Formulation: The logical constraints linking the state to the specific input set are encoded using binary variables, transforming the problem into a Mixed-Integer Quadratic Program (MIQP).
Guarantee Preservation: Crucially, because the MPC only selects inputs from the pre-verified set, the formal guarantee on the satisfaction probability ($\ge \lambda$) is preserved, even though the specific input trajectory is optimized online.
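The core idea of the online step, optimizing a cost while staying inside the certified input set, can be illustrated with a heavily simplified stand-in. The sketch below is not the paper's MIQP: it assumes a one-step horizon, a single linear model $x^+ = Ax + Bu$ in place of the PWA dynamics, an $L_\infty$ ball (a box) as the input set, and projected gradient descent in place of a mixed-integer solver. All names are hypothetical.

```python
import numpy as np

def one_step_input(x, x_ref, A, B, Q, R, center, eps, iters=200):
    """Minimize (x+ - x_ref)' Q (x+ - x_ref) + u' R u over the box |u - center|_inf <= eps.

    Assumes x+ = A x + B u and symmetric positive semidefinite Q, R.
    """
    H = B.T @ Q @ B + R                               # half the Hessian of the cost in u
    step = 1.0 / (2.0 * max(np.linalg.eigvalsh(H).max(), 1e-12))
    lo, hi = center - eps, center + eps
    u = np.array(center, dtype=float)                 # start from the ball's center
    for _ in range(iters):
        grad = 2.0 * (B.T @ Q @ (A @ x + B @ u - x_ref) + R @ u)
        u = np.clip(u - step * grad, lo, hi)          # gradient step, then project onto the box
    return u

# Scalar example with made-up numbers: the unconstrained optimum is u = 1.0,
# so the optimizer is pushed to the upper edge of the certified box [0.4, 0.6].
A = np.array([[1.0]]); B = np.array([[1.0]])
Q = np.array([[1.0]]); R = np.array([[0.0]])
u_star = one_step_input(np.array([0.0]), np.array([1.0]), A, B, Q, R,
                        center=np.array([0.5]), eps=0.1)
```

Because the projection keeps every iterate inside the box, the returned input is always one the abstraction certified, regardless of how well the optimization converges. That separation between feasibility and optimality is what lets the guarantee survive online optimization.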

3. Key Contributions

Theoretical Extension: The authors extend simulation relations for IMDPs to associate abstract actions with sets of control inputs (set-valued interfaces). This makes abstractions compatible with online control: the satisfaction probability of any refined policy is proven to be bounded below by the abstract policy's worst-case probability.
Algorithmic Innovation: They develop a tailored MPC scheme that optimizes a cost function while strictly adhering to the constraints imposed by the set-valued abstraction. This is formulated as an MIQP that handles the non-convexity introduced by the nonlinear dynamics and logical constraints.
Empirical Validation: The framework is tested on three benchmarks: a double integrator, a mountain car, and a Dubins car. The results demonstrate that the approach significantly improves control performance (lower cost) compared to standard single-policy abstractions, with only a marginal reduction in the guaranteed satisfaction probability.

4. Experimental Results

The experiments addressed two main questions: the effect of the input set size ($\epsilon$) on the probability bound ($\lambda$), and the ability to reduce the cost $J$.

Trade-off Analysis: Increasing the size of the input sets ($\epsilon$) allows for greater online optimization freedom but decreases the certified lower bound $\lambda$. The authors identified an "elbow point" in the trade-off curve where a small reduction in $\lambda$ yields a significant improvement in cost performance.
Performance Gains:
- Mountain Car: Achieved a 52.8% reduction in total cost with a negligible loss in $\lambda$ (0.45%). The control effort alone improved by 61.4%.
- Dubins Car: Achieved a 1.73% improvement in state error and a 9.7% improvement in control effort with a 0.47% reduction in $\lambda$.
- Double Integrator: Showed consistent cost reductions (up to 11.6%) with tunable $\lambda$ loss.
Computation: The offline abstraction time was manageable (seconds to minutes), and the online MPC step time was low (sub-second), making the approach feasible for real-time implementation.

5. Significance

This work represents a significant step forward in formal methods for control.

Bridging the Gap: It successfully unites the rigorous safety guarantees of formal verification (via IMDPs) with the flexibility and optimality of modern control techniques (MPC).
Practical Applicability: By allowing online optimization within a certified safety envelope, the method enables the deployment of autonomous systems in complex, stochastic environments where both safety and efficiency are paramount.
Scalability: The use of set-valued abstractions avoids the "curse of dimensionality" often associated with trying to optimize over the entire state space while maintaining formal guarantees, offering a practical compromise between verification and performance.

In summary, the paper provides a framework in which a system is guaranteed to meet complex logical requirements with at least a certified probability, while simultaneously optimizing for energy, time, or other performance metrics in real time.