Symbolic Discovery of Stochastic Differential Equations with Genetic Programming

This paper introduces a genetic programming-based method for the symbolic discovery of stochastic differential equations that jointly optimizes drift and diffusion functions via maximum likelihood estimation, enabling the accurate, scalable, and interpretable modeling of noisy dynamical systems.

Sigur de Vries, Sander W. Keemink, Marcel A. J. van Gerven

Published Wed, 11 Ma

Imagine you are a detective trying to figure out how a complex machine works, but you can only see the machine's output on a screen, and the screen is covered in static (noise).

Most scientists try to ignore the static, assuming the machine follows a perfect, predictable path. They try to draw a single, smooth line to explain the movement. But in the real world, things are messy. A stock market doesn't just go up or down; it jitters. A neuron in the brain doesn't just fire; it sparks randomly.

This paper introduces a new detective tool called GP-SDE (Genetic Programming for Stochastic Differential Equations). Here is how it works, explained simply:

1. The Problem: The "Noisy" Machine

Imagine you are watching a drunk person walking home.

  • The Drunk's Intent (Drift): They want to walk straight to their front door. This is the predictable part.
  • The Stumbles (Diffusion/Noise): But they are also tripping over cracks in the sidewalk, swaying in the wind, and bumping into people. This is the random, chaotic part.

Old methods tried to guess the path by ignoring the stumbles or treating them as a mistake. This paper says: "No! The stumbles are part of the story. We need to write a rule for how they stumble, not just where they are going."
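In equation form, the drunk walk is a stochastic differential equation dX = f(X) dt + g(X) dW, where f is the drift (the intent) and g is the diffusion (the stumbles). A minimal sketch of simulating one using the standard Euler-Maruyama scheme — the toy drift and diffusion rules here are made up for illustration, not taken from the paper:

```python
import math
import random

def simulate_sde(x0, drift, diffusion, dt=0.01, steps=1000, seed=0):
    """Euler-Maruyama simulation of dX = drift(X) dt + diffusion(X) dW."""
    rng = random.Random(seed)
    x = x0
    path = [x]
    for _ in range(steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment ~ N(0, dt)
        x = x + drift(x) * dt + diffusion(x) * dw
        path.append(x)
    return path

# Toy walk: the drift pulls toward the "front door" at x = 0, while the
# stumbles (noise) grow with distance from the door.
path = simulate_sde(x0=5.0,
                    drift=lambda x: -0.5 * x,
                    diffusion=lambda x: 0.2 * abs(x) + 0.05)
```

Note that the noise term is part of the rule, not an error bar bolted on afterward: the diffusion function here depends on the state, just like the stumbling depends on where the walker is.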

2. The Tool: Genetic Programming (The "Evolutionary Chef")

The authors use a method called Genetic Programming. Think of this as a cooking competition where the chefs are computer programs.

  • The Ingredients: The computer has a library of math ingredients (plus, minus, multiply, divide, sine, cosine, etc.).
  • The Recipe: It randomly mixes these ingredients to create thousands of different "recipes" (mathematical equations) to describe the drunk person's walk.
  • The Taste Test (Fitness): It tests these recipes against the real video footage.
    • If a recipe predicts the path perfectly, it gets a high score.
    • If it fails, it gets a low score.
  • Evolution: The best recipes "mate" (swap parts of their code) and "mutate" (change a random ingredient) to create even better recipes for the next round. Over time, the computer evolves a perfect recipe that explains both the walking and the stumbling.
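The cooking competition above can be sketched in a few dozen lines. This is a deliberately crude toy — real genetic programming (and the paper's method) uses subtree crossover and subtree mutation on expression trees, whereas this sketch just replaces losers with fresh random trees; all names are mine:

```python
import operator
import random

OPS = {'+': operator.add, '-': operator.sub, '*': operator.mul}

def random_expr(rng, depth=2):
    """A random 'recipe': a leaf ('x' or a constant) or an operator node."""
    if depth == 0 or rng.random() < 0.3:
        return 'x' if rng.random() < 0.5 else round(rng.uniform(-2, 2), 2)
    op = rng.choice(list(OPS))
    return (op, random_expr(rng, depth - 1), random_expr(rng, depth - 1))

def evaluate(expr, x):
    if expr == 'x':
        return x
    if isinstance(expr, tuple):
        op, a, b = expr
        return OPS[op](evaluate(a, x), evaluate(b, x))
    return expr  # a numeric constant

def fitness(expr, data):
    """The taste test: squared error against observations (lower is better)."""
    return sum((evaluate(expr, x) - y) ** 2 for x, y in data)

def mutate(expr, rng):
    # Crude stand-in: grow a fresh random tree. Real GP swaps subtrees
    # between parents (crossover) and perturbs single nodes (mutation).
    return random_expr(rng)

def evolve(data, pop_size=50, generations=30, seed=0):
    rng = random.Random(seed)
    pop = [random_expr(rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda e: fitness(e, data))
        survivors = pop[: pop_size // 2]          # keep the best recipes
        children = [mutate(rng.choice(survivors), rng) for _ in survivors]
        pop = survivors + children                # next round
    return min(pop, key=lambda e: fitness(e, data))

# Target "recipe": y = 2 * x. The evolved expression should fit it closely.
data = [(x, 2.0 * x) for x in range(-3, 4)]
best = evolve(data)
```

The evolved `best` is a readable expression tree, not a black-box weight matrix — that is what makes the result interpretable.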

3. The Big Breakthrough: Cooking Two Dishes at Once

Usually, these computer chefs only try to write a recipe for the walking (the drift). They assume the stumbling is just random error they can't explain.

This paper's innovation is teaching the chef to cook two dishes simultaneously:

  1. The Drift Dish: The rule for where the system wants to go.
  2. The Diffusion Dish: The rule for how it stumbles and sways.

By learning both at the same time, the computer gets a much clearer picture of reality. It's like realizing that the drunk person isn't just "bad at walking," but that their stumbling follows a specific pattern based on how fast they are moving or how tired they are.
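The "taste test" that scores both dishes at once is a likelihood, as the paper's maximum likelihood framing suggests: under a small-step (Euler) discretization, each observed step is approximately Gaussian, with its mean set by the drift and its variance set by the diffusion. A minimal sketch under that assumption — the toy mean-reverting system and all names are mine, not the paper's implementation:

```python
import math
import random

def simulate(x0, drift, diffusion, dt, steps, seed=1):
    """Generate synthetic data from a known SDE (Euler-Maruyama)."""
    rng = random.Random(seed)
    x, path = x0, [x0]
    for _ in range(steps):
        x += drift(x) * dt + diffusion(x) * rng.gauss(0.0, math.sqrt(dt))
        path.append(x)
    return path

def neg_log_likelihood(path, dt, drift, diffusion):
    """Score a (drift, diffusion) pair jointly:
    x_next ~ Normal(x + drift(x) * dt,  diffusion(x)**2 * dt)."""
    nll = 0.0
    for x, x_next in zip(path, path[1:]):
        mean = x + drift(x) * dt
        var = diffusion(x) ** 2 * dt
        nll += 0.5 * (math.log(2 * math.pi * var) + (x_next - mean) ** 2 / var)
    return nll

# Data from dX = -X dt + 0.3 dW; the true pair should beat a candidate
# with the right drift but the wrong stumbling rule.
path = simulate(0.0, lambda x: -x, lambda x: 0.3, dt=0.01, steps=2000)
true_score = neg_log_likelihood(path, 0.01, lambda x: -x, lambda x: 0.3)
wrong_score = neg_log_likelihood(path, 0.01, lambda x: -x, lambda x: 1.0)
```

Because the variance term sits inside the score, a recipe that nails the drift but botches the diffusion still loses — which is exactly why fitting both at once gives the clearer picture.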

4. Why This is Better Than the Old Way

The old way of doing this (called Kramers-Moyal expansion) is like trying to sort a massive pile of mixed-up Lego bricks by dumping them into buckets based on size.

  • The Bucket Problem: If you have a simple 1D problem, the buckets work fine. But in a complex 20-dimensional system (like a weather model with 20 interacting variables), the number of buckets grows exponentially with the number of dimensions, so most buckets end up empty and the estimates fall apart. It's slow and messy.
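In one dimension the "buckets" are literally histogram bins: you estimate the drift at each bin by averaging the observed step sizes of all samples that fall into it. A minimal sketch of that binned Kramers-Moyal-style estimate (a simplified illustration, not the full expansion), checked on noiseless data where the true step rate is exactly -x:

```python
import math

def km_drift_estimate(path, dt, n_bins=10):
    """Binned ('bucket') drift estimate in 1D: average dx/dt per bin of x.
    In d dimensions this needs n_bins**d buckets, which is the scaling wall."""
    lo, hi = min(path), max(path)
    width = (hi - lo) / n_bins or 1.0
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for x, x_next in zip(path, path[1:]):
        b = min(int((x - lo) / width), n_bins - 1)   # which bucket x falls in
        sums[b] += (x_next - x) / dt                 # observed step rate
        counts[b] += 1
    centers = [lo + (b + 0.5) * width for b in range(n_bins)]
    drifts = [s / c if c else float('nan') for s, c in zip(sums, counts)]
    return centers, drifts

# Sanity check on noiseless exponential decay, where dx/dt = -x exactly:
# each bucket's average should recover the drift up to half a bin width.
dt = 0.01
x, path = 1.0, [1.0]
for _ in range(300):
    x += -x * dt
    path.append(x)
centers, drifts = km_drift_estimate(path, dt)
```

The comment in `km_drift_estimate` is the whole story: with 10 bins per axis, 20 dimensions means 10^20 buckets, almost all of them empty.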

The new GP-SDE method doesn't use buckets. It builds the recipe directly.

  • Scalability: It handles complex, high-dimensional systems (like the 20-variable weather model) without getting overwhelmed.
  • Sparse Data: Even if you only have a few blurry snapshots of the drunk person (sparse data), this method can "fill in the gaps" by simulating the steps between the photos, making it very robust.

5. The Superpower: Generative Sampling

Because the new method learns the "stumbling rules" (the noise), it can do something the old methods can't: It can generate new, realistic scenarios.

  • Old Method: "Here is the average path the drunk person took." (One line).
  • New Method: "Here are 50 different possible paths the drunk person could take, all looking realistic with their own unique stumbles and sways."
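Once both rules are learned, drawing those 50 realistic paths is just a matter of re-running the simulation with fresh noise each time. A minimal sketch (the drift and diffusion here stand in for whatever equations were discovered; the names are mine):

```python
import math
import random

def sample_paths(x0, drift, diffusion, dt, steps, n_paths, seed=42):
    """Draw many plausible futures from a learned SDE, not just one mean path."""
    rng = random.Random(seed)
    paths = []
    for _ in range(n_paths):
        x, path = x0, [x0]
        for _ in range(steps):
            x += drift(x) * dt + diffusion(x) * rng.gauss(0.0, math.sqrt(dt))
            path.append(x)
        paths.append(path)
    return paths

# 50 distinct "walks home" under the same learned rules. The spread of the
# endpoints is exactly the uncertainty a drift-only model throws away.
paths = sample_paths(5.0, lambda x: -0.5 * x, lambda x: 0.4,
                     dt=0.01, steps=500, n_paths=50)
endpoints = [p[-1] for p in paths]
```

A drift-only model can only ever return the single deterministic trajectory; the ensemble above is what lets you ask "how bad could it plausibly get?"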

This is crucial for scientists. If you are modeling a virus spread or a financial crash, you don't just want the average outcome; you want to see the range of possible disasters to prepare for them.

Summary

This paper gives scientists a smarter, more flexible way to decode the laws of nature in a noisy world. Instead of ignoring the chaos, they teach computers to evolve mathematical formulas that explain both the order and the chaos, allowing them to predict the future with much greater accuracy and creativity.