Learning interpretable and stable dynamical models via mixed-integer Lyapunov-constrained optimization

Imagine you are trying to teach a robot how to drive a car. You have a video of a human driving, and you want the robot to learn the rules of the road just by watching.

Most modern AI methods are like a "black box" wizard. They look at the video, guess the rules, and say, "I think the car should turn left here." They might get the answer right 99% of the time, but they can't explain why, and sometimes, they make a dangerous mistake that looks fine in the video but causes a crash in real life.

This paper proposes a different way to teach the robot. Instead of a black box, they want a transparent, rule-following student who also learns a "safety manual" at the same time.

Here is the breakdown of their approach using simple analogies:

1. The Goal: Finding the "True Story"

The authors want to discover the mathematical "story" (the equations) that explains how a system moves. But they have two strict requirements:

It must be readable: The story shouldn't be a confusing mess of code; it should look like a clear sentence (e.g., "Speed equals acceleration times time").
It must be safe: The story must guarantee that the system eventually settles down and doesn't go crazy (like a pendulum eventually stopping, rather than spinning forever).

2. The Ingredients: LEGO Bricks

Instead of using a giant, complex neural network, the authors build their model using LEGO bricks (called "basis functions").

Imagine you have a box of bricks: some are straight lines, some are curves, some are squares, and some are sine waves.
The computer's job is to pick the right bricks and snap them together to build the equation that describes the system.
They also build a second structure at the same time: a "Safety Net" (called a Lyapunov function). This isn't part of the car's engine; it's a mathematical tool that certifies the system settles down instead of running out of control.

3. The Challenge: The "Safety Net" Constraint

Usually, when you teach a computer, you just say, "Make your guess match the video as closely as possible."

The Old Way: The computer might guess a model that fits the video perfectly but is physically impossible (like a car that accelerates infinitely). It's accurate on the video but dangerous in reality.
The New Way (This Paper): The authors say, "Make your guess match the video, AND you must prove that your 'Safety Net' holds up."

They force the computer to solve a giant puzzle where:

The model must look like the data.
The model must be built only from the selected LEGO bricks (to keep it simple and readable).
The "Safety Net" must show that the system is always losing energy and moving toward a stop (stability).

4. The "Mixed-Integer" Puzzle

This is the hardest part. The computer has to make two types of decisions at once:

Continuous decisions: "How big should this brick be?" (e.g., 1.5 or 0.2).
Binary decisions: "Do we use this brick at all?" (Yes/No).

This turns the problem into a Mixed-Integer Quadratically Constrained Programming (MIQCP) problem.

Analogy: Imagine you are a chef trying to create a new recipe. You have to decide which ingredients to use (Yes/No) and how much of each to add (Continuous numbers), but you also have to prove mathematically that the dish won't explode in the oven.
The authors use a super-smart solver (Gurobi) to find the perfect combination of ingredients that satisfies all these rules.

5. The Results: Why It Matters

The authors tested this on two systems: a swinging pendulum (like a clock) and a coupled oscillator (two things vibrating together).

Without Noise: The method recovered the exact correct equations and the correct safety manual.
With Noise (The Real World): Real data is messy (like a shaky camera).
- Old methods (Baselines): When the data was noisy, they got confused. Their models became wildly inaccurate, and their "safety nets" failed.
- The New Method: Because it was forced to respect the "Safety Net" rules during training, it ignored the noise better. It stayed accurate and kept the system stable, even when the data was messy.

The Bottom Line

This paper is like teaching a student to drive not just by showing them videos, but by forcing them to write down the traffic laws and prove that following those laws will keep them safe.

Even if the video is blurry (noisy data), the student who understands the rules of safety will drive much better than the student who just memorized the video frame-by-frame. The result is a model that is not only accurate but also trustworthy, simple to read, and mathematically guaranteed to be stable.

1. Problem Statement

The paper addresses the challenge of data-driven discovery of dynamical systems (governed by ordinary differential equations, $\dot{x} = f(x)$) that are both interpretable and guaranteed to be stable.

The Gap: Traditional data-driven methods (e.g., neural networks or unconstrained sparse regression) often minimize prediction error but fail to guarantee physical properties like stability. A model might fit training data perfectly but produce unstable trajectories outside the training set.
The Limitation of Existing Methods:
- Post-hoc analysis: Verifying stability after training is computationally expensive and does not prevent the learning of unstable models.
- Neural Networks: While they can approximate stable dynamics, they are "black boxes," making verification and interpretation difficult.
- Constrained Learning: Previous constrained approaches often rely on convex relaxations or specific functional forms that limit the search space or computational tractability.
Goal: To discover a dynamical model $f(x)$ and a corresponding Lyapunov function $V(x)$ simultaneously, ensuring the equilibrium point is asymptotically stable, while maintaining an interpretable symbolic form.

2. Methodology

The authors propose a Mixed-Integer Quadratically Constrained Programming (MIQCP) framework. The core idea is to parameterize both the system dynamics and the Lyapunov function using basis functions and enforce Lyapunov stability conditions as hard constraints during the optimization process.

A. Model Parameterization

Both the differential equations and the Lyapunov function are represented as linear combinations of a library of basis functions:

Dynamics: $\dot{x}_i \approx \sum_{k \in K_f} c_{ik} \phi_k(x)$
Lyapunov Function: $V(x) = \sum_{k \in K_v} v_k \phi^V_k(x)$
Sparsity Control: Binary variables ($z_{ik}$ for the dynamics terms, $z_k$ for the Lyapunov terms) are introduced to select which basis functions are active. This allows the optimization to control model complexity (e.g., limiting the number of terms) and ensures the resulting model is interpretable (symbolic).
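
To make the parameterization concrete, here is a minimal sketch of a basis library with binary selectors and coefficients. The library contents, the names `LIBRARY`, `evaluate`, and `to_text`, and the test point are illustrative assumptions, not taken from the paper; the coefficient values reproduce the damped pendulum of Case Study 1 below.

```python
import math

# Hypothetical candidate library for a 2-state system; names are for display only.
LIBRARY = [
    ("x1", lambda x: x[0]),
    ("x2", lambda x: x[1]),
    ("sin(x1)", lambda x: math.sin(x[0])),
    ("x1*x2", lambda x: x[0] * x[1]),
]

# One row per state equation: z[i][k] switches basis k on or off,
# c[i][k] is its coefficient (assumed values matching the damped pendulum).
z = [[0, 1, 0, 0],
     [0, 1, 1, 0]]
c = [[0.0, 1.0, 0.0, 0.0],
     [0.0, -1.0, -1.0, 0.0]]

def evaluate(x):
    """Return the modeled derivatives: sum_k z_ik * c_ik * phi_k(x) for each i."""
    phi = [fn(x) for _, fn in LIBRARY]
    return [sum(zi * ci * p for zi, ci, p in zip(z_row, c_row, phi))
            for z_row, c_row in zip(z, c)]

def to_text(i):
    """Render equation i using only the active (z = 1) terms."""
    terms = [f"({c[i][k]:+g})*{name}"
             for k, (name, _) in enumerate(LIBRARY) if z[i][k]]
    return f"dx{i + 1}/dt = " + " + ".join(terms)

print(to_text(0))
print(to_text(1))
print(evaluate([0.5, 0.2]))
```

Because only the terms with a selector equal to 1 are printed, the number of active selectors directly measures how long the resulting equation is, which is what the complexity penalties act on.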

B. Lyapunov Constraints

Stability of the equilibrium ($x=0$) is enforced by imposing the conditions of Lyapunov's direct method at the points of the training trajectories:

Positive Definiteness: $V(x) > 0$ for all $x \neq 0$ and $V(0) = 0$.
Negative Semi-Definite Derivative: $\dot{V}(x) = \nabla V(x)^\top f(x) \leq 0$ for all $x \neq 0$.

These conditions are translated into algebraic constraints within the optimization problem. Crucially, the term $\dot{V}(x)$ involves the product of the Lyapunov coefficients ($v_k$) and the dynamics coefficients ($c_{ik}$), resulting in bilinear (non-convex) terms.
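
Written out with the parameterizations from Section A, the bilinear structure becomes visible. This is a worked expansion that assumes $V$ and $f$ take exactly the linear-in-parameters forms given above; the index $j$ is introduced here only to keep the two sums apart.

$\dot{V}(x) = \sum_{i} \frac{\partial V}{\partial x_i}(x)\, \dot{x}_i = \sum_{i} \Big( \sum_{k \in K_v} v_k \frac{\partial \phi^V_k}{\partial x_i}(x) \Big) \Big( \sum_{j \in K_f} c_{ij}\, \phi_j(x) \Big) = \sum_{i} \sum_{k \in K_v} \sum_{j \in K_f} v_k\, c_{ij}\, \frac{\partial \phi^V_k}{\partial x_i}(x)\, \phi_j(x)$

At a fixed data point the basis values and their partial derivatives are known numbers, so each constraint $\dot{V}(x) \leq 0$ is a quadratic inequality whose only nonlinear terms are the products $v_k c_{ij}$.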

C. Optimization Formulation

The learning task is formulated as a global optimization problem:
$\min_{c, v, z} \underbrace{L_a}_{\text{Prediction Error}} + \omega_1 \underbrace{L^f_c}_{\text{Dynamics Complexity}} + \omega_2 \underbrace{L^V_c}_{\text{Lyapunov Complexity}}$

Objective: Minimize the prediction error ($\ell_1$ norm of residuals) while penalizing the number of active basis functions (sparsity).
Constraints: The Lyapunov conditions (Eq. 12–15) and binary variable logic (Eq. 6, 9, 16–17).
Solver: The resulting non-convex MIQCP is solved to global optimality using state-of-the-art solvers (specifically Gurobi).
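
The trade-off between fit and complexity can be illustrated on a toy problem. The sketch below is not the paper's MIQCP: it swaps the solver for exhaustive enumeration of the binary selections, uses squared error in place of the $\ell_1$ norm, and omits the Lyapunov constraints entirely. The library, sampling grid, `omega`, and `max_terms` are hypothetical choices, and only the standard library is used to keep it dependency-free.

```python
import itertools
import math

def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for j in range(col, n + 1):
                M[r][j] -= factor * M[col][j]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][j] * x[j] for j in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def fit(columns, y):
    """Least-squares coefficients for the chosen columns via the normal equations."""
    gram = [[sum(a * b for a, b in zip(ci, cj)) for cj in columns] for ci in columns]
    rhs = [sum(a * t for a, t in zip(ci, y)) for ci in columns]
    return solve(gram, rhs)

names = ["x1", "x2", "sin(x1)", "x1*x2"]
basis = [lambda p: p[0], lambda p: p[1],
         lambda p: math.sin(p[0]), lambda p: p[0] * p[1]]
grid = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
points = [(a, b) for a in grid for b in grid]
# Noise-free samples of the second pendulum equation, x2' = -sin(x1) - x2.
y = [-math.sin(a) - b for a, b in points]
theta = [[phi(p) for p in points] for phi in basis]  # one column per basis function

omega, max_terms = 1e-3, 2
best = None
for size in range(1, max_terms + 1):
    for support in itertools.combinations(range(len(basis)), size):
        cols = [theta[k] for k in support]
        coef = fit(cols, y)
        resid = [t - sum(w * col[n] for w, col in zip(coef, cols))
                 for n, t in enumerate(y)]
        cost = sum(r * r for r in resid) + omega * size  # error + complexity
        if best is None or cost < best[0]:
            best = (cost, support, coef)

cost, support, coef = best
print({names[k]: round(w, 6) for k, w in zip(support, coef)})
```

Enumeration is only viable for tiny libraries, since the number of candidate supports grows combinatorially; a mixed-integer solver explores the same binary choices through branch-and-bound and can additionally carry the bilinear stability constraints.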

3. Key Contributions

Joint Discovery: A unified framework that simultaneously learns the dynamical model and a valid Lyapunov function, rather than learning the model first and verifying stability later.
Interpretability: By using basis functions and binary selection variables, the output is a symbolic, sparse equation (unlike black-box neural networks).
Global Optimality: The approach formulates the problem as an MIQCP, allowing solvers to find the global optimum (or a solution within a guaranteed gap), avoiding local minima common in gradient-based deep learning.
Robustness to Noise: The inclusion of stability constraints acts as a regularizer, preventing the model from overfitting to noise in a way that violates physical stability laws.

4. Experimental Results

The method was validated on two case studies:

Case Study 1: Damped Pendulum

Setup: A single trajectory was used to learn the system $\dot{x}_1 = x_2, \dot{x}_2 = -\sin(x_1) - x_2$ and its energy-based Lyapunov function.
Result: The algorithm successfully recovered the exact differential equations and the correct Lyapunov function from a single trajectory without noise.
Complexity Sensitivity: The study showed that if the allowed complexity for the Lyapunov function was too low (e.g., 1 basis function), the problem became infeasible, demonstrating the method's ability to detect when a valid stable model cannot be formed with the given constraints.
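
The pointwise nature of the stability check can be sketched for this system. The code below assumes the energy-based candidate is $V(x) = (1 - \cos x_1) + \tfrac{1}{2} x_2^2$, the standard mechanical energy of a pendulum; the summary does not state the exact function the method returned. The initial condition, step size, and tolerance are hypothetical.

```python
import math

def f(x):
    """Damped pendulum from the case study: x1' = x2, x2' = -sin(x1) - x2."""
    return (x[1], -math.sin(x[0]) - x[1])

def rk4_step(x, h):
    """One classical Runge-Kutta step of size h."""
    k1 = f(x)
    k2 = f((x[0] + 0.5 * h * k1[0], x[1] + 0.5 * h * k1[1]))
    k3 = f((x[0] + 0.5 * h * k2[0], x[1] + 0.5 * h * k2[1]))
    k4 = f((x[0] + h * k3[0], x[1] + h * k3[1]))
    return tuple(x[i] + h / 6.0 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i])
                 for i in range(2))

def V(x):
    """Assumed candidate: mechanical energy, equal to zero at the origin."""
    return (1.0 - math.cos(x[0])) + 0.5 * x[1] ** 2

def V_dot(x):
    """Gradient of V dotted with f; analytically this reduces to -x2**2."""
    dx1, dx2 = f(x)
    return math.sin(x[0]) * dx1 + x[1] * dx2

# A single trajectory from a hypothetical initial condition.
x, h = (1.0, 0.0), 0.01
trajectory = [x]
for _ in range(2000):
    x = rk4_step(x, h)
    trajectory.append(x)

tol = 1e-12  # slack for floating-point round-off
positive = all(V(p) > 0.0 for p in trajectory)
non_increasing = all(V_dot(p) <= tol for p in trajectory)
print(positive, non_increasing, V(trajectory[0]), V(trajectory[-1]))
```

The check only covers the sampled points, which mirrors how the constraints are imposed on training data; it says nothing about states the trajectory never visits.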

Case Study 2: Cross-Coupled Oscillator (Noisy Data)

Setup: A cubic oscillator system was tested under four levels of Gaussian noise ($\sigma \in \{0, 0.03, 0.05, 0.1\}$).
Baselines: Compared against Stepwise Sparse Regression (SSR) and Mixed-Integer Optimization-based Sparse Regression (MIOSR) without Lyapunov constraints.
Performance:
- Vector Field Error: The proposed method (LyapSR) maintained significantly lower error (orders of magnitude better) as noise increased. While baseline errors grew by a factor of roughly $10^2$, LyapSR errors grew much more slowly.
- Coefficient Accuracy: Even with noise, LyapSR correctly identified the model structure and coefficients with errors on the order of $10^{-3}$, whereas baselines drifted to $10^{-2}$ or failed to identify the correct basis functions entirely.
- Stability: The learned models were guaranteed to satisfy stability conditions on the training data, leading to better generalization.

5. Significance and Limitations

Significance:

The paper bridges the gap between data-driven modeling and control theory. It provides a rigorous way to embed physical stability guarantees directly into the learning process.
It demonstrates that interpretable models (symbolic equations) can be learned with high accuracy and stability guarantees, challenging the notion that deep learning is the only path for complex system identification.
The method is particularly valuable for safety-critical applications where stability is non-negotiable.

Limitations & Future Work:

Domain Validity: The stability is guaranteed only on the training data domain. The authors note that the returned Lyapunov function is not mathematically guaranteed to be valid over the entire continuous domain due to finite data and potential degeneracy (multiple solutions yielding the same loss).
Computational Cost: Solving non-convex MIQCPs is computationally intensive compared to convex regression, though modern solvers make it tractable for moderate-sized problems.
Mitigation: The authors suggest that if the candidate Lyapunov function is not globally valid, one can generate more data or use "integer cuts" to force the solver to explore alternative functional forms.
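
One standard way to write such an integer cut is shown here as an illustration, not necessarily the authors' exact constraint. Let $\bar{z}$ denote the binary selection of Lyapunov basis functions returned by the previous solve. Adding

$\sum_{k:\, \bar{z}_k = 1} (1 - z_k) + \sum_{k:\, \bar{z}_k = 0} z_k \geq 1$

forces at least one selection variable to flip, so the next solve must return a different set of active basis functions.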

In conclusion, this work presents a robust, mathematically grounded approach to learning dynamical systems that are not only accurate but also inherently stable and interpretable, offering a superior alternative to unconstrained regression methods, especially in noisy environments.