Discovering Symbolic Differential Equations with Symmetry Invariants

Imagine you are a detective trying to solve a mystery. The mystery is how the universe works. You have a pile of clues (data) showing how things move, change, or react—like water flowing through sand, chemicals mixing, or waves crashing. Your goal is to write down the "rulebook" (a mathematical equation) that explains these behaviors perfectly.

This is the job of Symbolic Regression: using computers to find the secret math formulas hidden inside data.

The Problem: The "Infinite Library"

The problem is that the library of possible math formulas is infinite. It's like trying to find the one correct sentence in a library containing every possible combination of words in the English language.

If you just let the computer guess randomly, it might find a formula that fits your data perfectly but is actually nonsense (like x + y = z when the real rule is x² - y = z).
It might also find a formula that is so incredibly complicated and messy that no human can understand it.
Worse, it might find a formula that breaks the laws of physics (like creating energy out of nothing).

The Solution: The "Symmetry" Superpower

This paper introduces a clever trick called Symmetry Invariants.

Think of Symmetry like a rule that says, "If I turn this system around, or move it to a different spot, the rules shouldn't change."

Example: If you rotate a circle, it still looks like a circle. The rule describing a circle doesn't care about the angle; it only cares about the distance from the center.

The authors realized that instead of asking the computer to guess formulas using all the raw variables (like x, y, u, v), we should force it to guess formulas using Symmetry Invariants.

The Analogy: The Shape-Shifting Detective
Imagine you are trying to describe a spinning top.

The Old Way (Raw Variables): You try to describe the top by listing the exact position of every atom at every second. If the top spins, the positions change wildly. Your formula has to be incredibly complex to track every tiny shift.
The New Way (Symmetry Invariants): You realize the top is spinning. So, you stop looking at the spinning atoms and start looking at the distance from the center. No matter how much the top spins, the distance from the center stays the same. This "distance" is the Invariant.

By forcing the computer to only use these "unchanging distances" (invariants) to build its formula, you instantly:

Shrink the Library: You throw away millions of impossible formulas that don't respect the symmetry.
Respect the Symmetry: The resulting formula is guaranteed to obey the symmetry you chose (for example, "the rules don't change when you rotate the system"), because it's built only from pieces that never change under that symmetry.
Find Simpler Answers: The formulas become shorter, cleaner, and easier for humans to read.
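To make "Shrink the Library" concrete, here is a toy Python sketch. It is not from the paper and assumes NumPy. It treats every monomial in x and y (x, y, x², xy, y², ...) up to a given degree as a raw building block. It then counts how many independent combinations survive when we require them to be unchanged by rotations of the (x, y) plane. The function names and the random-sampling check are illustrative choices.

```python
import numpy as np

def monomial_exponents(max_degree):
    """Exponent pairs (i, j) for every monomial x**i * y**j with 1 <= i + j <= max_degree."""
    return [(i, d - i) for d in range(1, max_degree + 1) for i in range(d + 1)]

def evaluate_library(points, exponents):
    """One column per monomial, evaluated at each (x, y) row of `points`."""
    x, y = points[:, 0], points[:, 1]
    return np.column_stack([x**i * y**j for i, j in exponents])

def count_invariant_blocks(max_degree, n_samples=200, seed=0, rel_tol=1e-8):
    """Return (raw library size, number of independent rotation-invariant combinations)."""
    rng = np.random.default_rng(seed)
    exps = monomial_exponents(max_degree)
    pts = rng.uniform(-1.0, 1.0, size=(n_samples, 2))
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n_samples)
    c, s = np.cos(theta), np.sin(theta)
    rotated = np.column_stack([c * pts[:, 0] - s * pts[:, 1],
                               s * pts[:, 0] + c * pts[:, 1]])
    # Coefficients w describe a rotation-invariant polynomial only if its value is unchanged
    # at every sampled (point, angle) pair, i.e. w lies in the null space of `diff`.
    diff = evaluate_library(pts, exps) - evaluate_library(rotated, exps)
    sing = np.linalg.svd(diff, compute_uv=False)
    rank = int(np.sum(sing > rel_tol * sing[0]))
    return len(exps), len(exps) - rank

print(count_invariant_blocks(2))  # (5, 1): only x**2 + y**2 survives
print(count_invariant_blocks(4))  # (14, 2): x**2 + y**2 and (x**2 + y**2)**2
```

Out of 14 raw monomials up to degree 4, only two rotation-invariant combinations remain. That is the library-shrinking effect in miniature.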

How It Works in Practice

The researchers took existing AI tools (like Sparse Regression and Genetic Programming) and gave them a new set of "building blocks."

Instead of giving the AI blocks labeled x, y, u, and v, they gave it blocks labeled "Distance from Center," "Rotation Speed," or "Total Energy."
The AI then builds the equation using only these special blocks.

The Results: A Supercharged Detective

They tested this on three different "mysteries":

Water Waves (Boussinesq Equation): The old AI failed to find the rule. The new AI found it in every trial.
Water Flowing Through Sand (Darcy Flow): The old AI got lost in the complexity. The new AI found the simple, elegant rule.
Chemical Reactions (Reaction-Diffusion): Even when the data was noisy (dirty clues) or the symmetry wasn't perfect (the top was slightly wobbly), the new method was much more robust and accurate than the old ones.

Why This Matters

This isn't just about math; it's about trust.
In the past, AI might give you a "black box" answer that works but makes no sense. With this method, the AI gives you a white box answer. It gives you a formula that:

Is short and simple.
Respects the symmetries you know the system obeys.
Is easy for a human scientist to read and understand.

In a nutshell: This paper teaches AI to stop guessing randomly in the dark and start looking for the "unchanging truths" (invariants) that nature uses. It's like giving the detective a map that only shows the roads that actually exist, rather than letting them drive off a cliff.

Here is a detailed technical summary of the paper "Discovering Symbolic Differential Equations with Symmetry Invariants".

1. Problem Statement

The discovery of governing differential equations in symbolic form, particularly partial differential equations (PDEs), from observational data is a critical task for understanding complex physical systems. While Symbolic Regression (SR) methods (e.g., Sparse Regression/SINDy, Genetic Programming) have automated this process, they face two major challenges:

Vast Search Space: The combinatorial explosion of possible equation structures makes finding the correct equation computationally expensive and prone to overfitting.
Physical Invalidity: Discovered equations often violate fundamental physical laws, such as conservation laws or symmetries (e.g., rotational or translational invariance), leading to non-physical predictions.

Existing methods that incorporate physical constraints often lack generality, being restricted to specific equation types (e.g., ODEs only) or specific algorithms (e.g., only sparse regression).

2. Methodology

The authors propose a general framework that enforces symmetry invariance as an inductive bias in equation discovery. The core idea is to replace the original variables (independent variables, dependent variables, and their derivatives) with differential invariants of the symmetry group.

Core Theoretical Foundation

Differential Invariants: Based on Lie group theory, if a PDE admits a symmetry group $G$, the equation can be expressed entirely in terms of the differential invariants of $G$.
Theorem 4.2: Any differential equation admitting a symmetry group $G$ is equivalent to an equation of the form $\tilde{F}(\eta_1, \dots, \eta_k) = 0$, where $\eta_i$ are the differential invariants.
Strategy: Instead of searching for equations in the space of raw variables $(x, u^{(n)})$, the algorithm searches in the space of invariants $(\eta_1, \dots, \eta_k)$. This guarantees that any discovered equation inherently satisfies the specified symmetry.
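As a textbook illustration of Theorem 4.2 (not one of the paper's benchmarks), consider the heat equation $u_t = u_{xx}$. It admits the scaling symmetry $(x, t, u) \mapsto (\lambda x, \lambda^2 t, u)$ for $\lambda > 0$. Both $u_t$ and $u_{xx}$ pick up a factor $\lambda^{-2}$, so multiplying each by $t$ yields two differential invariants. The equation then becomes a relation between them (for $t \neq 0$):

$$
\tilde{x} = \lambda x,\quad \tilde{t} = \lambda^{2} t,\quad \tilde{u} = u
\quad\Longrightarrow\quad
\tilde{u}_{\tilde{t}} = \lambda^{-2} u_t,\qquad \tilde{u}_{\tilde{x}\tilde{x}} = \lambda^{-2} u_{xx},
$$

$$
\eta_1 = t\,u_t,\qquad \eta_2 = t\,u_{xx},\qquad
u_t = u_{xx} \;\Longleftrightarrow\; \tilde{F}(\eta_1, \eta_2) = \eta_1 - \eta_2 = 0 .
$$

By contrast, $u_t = u_{xx} + u_x$ is not invariant under this scaling, so it cannot be written as $\tilde{F}(\eta) = 0$ in these invariants. Candidates of this kind are excluded from the invariant search space.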

Algorithmic Implementation

The framework is designed to be agnostic to the underlying SR algorithm:

Constructing Invariants:
- Compute infinitesimal generators of the symmetry group.
- Solve the determining equations $v^{(n)}(\eta) = 0$, where $v^{(n)}$ is the $n$-th prolongation of a generator $v$, to find lower-order invariants.
- Use recursive formulas (Proposition 4.3) to generate higher-order invariants from lower-order ones.
- Practical Note: The authors filter and simplify these invariants to ensure numerical stability and interpretability (e.g., converting complex algebraic expressions into Laplacians for rotational symmetry).
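For order-zero candidates the prolongation reduces to the generator itself, so the determining equation can be checked numerically. Below is a minimal pure-Python sketch that assumes the 2-D rotation generator $v = -y\,\partial_x + x\,\partial_y$ (the symmetry listed for Darcy flow below). It uses finite differences rather than symbolic solving, so it verifies candidate invariants instead of deriving them. It also checks that the Laplacian, the form the authors favor for rotational symmetry, commutes with rotations. All function names are hypothetical.

```python
import math

def apply_generator(eta, field, point, h=1e-6):
    """Approximate v(eta) = sum_k field_k(p) * d(eta)/dp_k with a central difference."""
    direction = field(point)
    plus = [p + h * d for p, d in zip(point, direction)]
    minus = [p - h * d for p, d in zip(point, direction)]
    return (eta(plus) - eta(minus)) / (2.0 * h)

def rotation_field(p):
    """Components of the rotation generator v = -y d/dx + x d/dy."""
    x, y = p
    return (-y, x)

def radius_sq(p):
    """Candidate invariant: x**2 + y**2."""
    return p[0] ** 2 + p[1] ** 2

def first_coord(p):
    """Candidate that should fail: x on its own."""
    return p[0]

def rotate(theta, x, y):
    """Rotate the point (x, y) counter-clockwise by angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return c * x - s * y, s * x + c * y

def laplacian(f, x, y, h=1e-3):
    """Five-point finite-difference approximation of f_xx + f_yy at (x, y)."""
    return (f(x + h, y) + f(x - h, y) + f(x, y + h) + f(x, y - h) - 4.0 * f(x, y)) / h**2

def sample_field(x, y):
    """An arbitrary test function with no rotational symmetry of its own."""
    return x**3 + 2.0 * x * y + math.sin(y)

theta, px, py = 0.7, 0.4, -0.9

def rotated_field(x, y):
    """sample_field composed with a rotation of its input."""
    return sample_field(*rotate(theta, x, y))

print(apply_generator(radius_sq, rotation_field, (0.3, -1.2)))    # approximately 0: invariant
print(apply_generator(first_coord, rotation_field, (0.3, -1.2)))  # approximately 1.2: not invariant
# Laplacian of the rotated field at p equals the Laplacian of the original at the rotated point.
print(laplacian(rotated_field, px, py), laplacian(sample_field, *rotate(theta, px, py)))
```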
Integration with SR Algorithms:
- General Explicit SR (e.g., Genetic Programming, Transformers): The set of input features is redefined to be the set of invariants. Since the Left-Hand Side (LHS) is unknown, the algorithm iterates through each invariant as a potential LHS, fits the model, and selects the one with the lowest error.
- Sparse Regression (SINDy):
  - Direct Approach: Use invariants as features and fit a linear combination.
  - Linear Constraint Approach (Proposition 4.4): To maintain the specific structure of SINDy (linear in coefficients), the authors derive a linear subspace constraint on the coefficient matrix $W$. They compute a basis $Q$ for the symmetry-admissible subspace and parameterize $W = Q\beta$. This reduces the parameter space dimension and allows for the use of sequential thresholding and Weak SINDy (for noisy data).
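The "try each invariant as the LHS" loop for general explicit SR can be sketched as follows. This is a minimal illustration assuming NumPy. Ordinary least squares stands in for the actual SR algorithm (GP or a transformer), and the invariant values are synthetic. Normalizing each error by its target's variance is our assumption, since the summary only says the lowest-error choice is kept.

```python
import numpy as np

def fit_with_best_lhs(invariants, names):
    """Regress each invariant on all the others and keep the best-fitting choice of LHS.

    invariants: array of shape (n_samples, k); column j holds eta_j evaluated on the data.
    Returns (lhs_name, coefficients, relative_error). Least squares is only a stand-in
    for a real symbolic-regression backend.
    """
    best = None
    for j, name in enumerate(names):
        target = invariants[:, j]
        features = np.delete(invariants, j, axis=1)
        coef, *_ = np.linalg.lstsq(features, target, rcond=None)
        residual = features @ coef - target
        # Normalize by the target's variance so invariants on different scales compare fairly.
        rel_err = float(np.mean(residual**2) / np.var(target))
        if best is None or rel_err < best[2]:
            best = (name, coef, rel_err)
    return best

rng = np.random.default_rng(0)
eta2, eta3, eta4 = rng.normal(size=(3, 500))
eta1 = 2.0 * eta2 - 0.5 * eta3  # hidden relation among eta1, eta2, eta3; eta4 is unrelated
data = np.column_stack([eta1, eta2, eta3, eta4])
name, coef, err = fit_with_best_lhs(data, ["eta1", "eta2", "eta3", "eta4"])
print(name, coef, err)
```

Because the stand-in is linear, rearrangements of the same relation (with eta1, eta2 or eta3 on the left) fit equally well. Only the unrelated eta4 is clearly rejected as an LHS.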
Handling Imperfect Symmetry:
- For real-world systems where symmetry is broken (e.g., by external forces), the authors introduce a Relaxed Symmetry Constraint.
- They decompose the parameter space into a symmetry-preserving subspace ($Q$) and its orthogonal complement ($P$).
- The model is parameterized as $W = A + B$, where $A$ lies in the symmetry subspace and $B$ in the symmetry-breaking subspace. A stronger regularization is applied to $B$, allowing the model to learn symmetry-breaking terms only if the data strongly supports them.
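The strict constraint $W = Q\beta$ and its relaxed version can be tried on a toy problem. This sketch assumes NumPy and a linear model $\dot{u} = a u + b v$, $\dot{v} = c u + d v$ for a two-component state under phase-space rotation. A linear map commutes with every 2-D rotation exactly when $a = d$ and $c = -b$. That hand-derived constraint stands in for Proposition 4.4, whose general construction is not reproduced here. Orthonormal bases $Q$ and $P$ come from an SVD. The relaxed fit writes $A = Q\alpha$, $B = P\gamma$ and uses ridge penalties, with a heavier weight on $\gamma$. Names and penalty values are illustrative.

```python
import numpy as np

# Row-major flattening w = (a, b, c, d) of W = [[a, b], [c, d]] in the model
# u_dot = a*u + b*v, v_dot = c*u + d*v. Rotation equivariance requires a = d and c = -b.
C = np.array([[1.0, 0.0, 0.0, -1.0],
              [0.0, 1.0, 1.0, 0.0]])

def null_space(M, tol=1e-10):
    """Orthonormal basis (as columns) of {w : M @ w = 0}, computed with an SVD."""
    _, s, vt = np.linalg.svd(M)
    rank = int(np.sum(s > tol))
    return vt[rank:].T

Q = null_space(C)    # symmetry-preserving directions
P = null_space(Q.T)  # orthogonal complement: symmetry-breaking directions

def design(X):
    """Library matrix Theta such that Theta @ w stacks [u_dot; v_dot] over all samples."""
    u, v = X[:, 0], X[:, 1]
    z = np.zeros_like(u)
    return np.vstack([np.column_stack([u, v, z, z]),
                      np.column_stack([z, z, u, v])])

def targets(X, w):
    """Noise-free time derivatives produced by the linear model with parameters w."""
    Y = X @ w.reshape(2, 2).T
    return np.concatenate([Y[:, 0], Y[:, 1]])

def fit_strict(Theta, y):
    """Hard constraint W = Q beta: only symmetry-preserving coordinates are fitted."""
    beta, *_ = np.linalg.lstsq(Theta @ Q, y, rcond=None)
    return Q @ beta

def fit_relaxed(Theta, y, lam_sym=1e-6, lam_break=1.0):
    """Minimize ||Theta (Q a + P g) - y||^2 + lam_sym ||a||^2 + lam_break ||g||^2."""
    kq, kp = Q.shape[1], P.shape[1]
    A = np.vstack([
        np.hstack([Theta @ Q, Theta @ P]),
        np.hstack([np.sqrt(lam_sym) * np.eye(kq), np.zeros((kq, kp))]),
        np.hstack([np.zeros((kp, kq)), np.sqrt(lam_break) * np.eye(kp)]),
    ])
    b = np.concatenate([y, np.zeros(kq + kp)])
    coef, *_ = np.linalg.lstsq(A, b, rcond=None)
    return Q @ coef[:kq] + P @ coef[kq:]

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
Theta = design(X)
w_sym = np.array([-1.0, 2.0, -2.0, -1.0])     # satisfies a = d and c = -b
w_broken = np.array([-1.0, 2.0, -2.0, -0.5])  # a != d: symmetry slightly broken
print(fit_strict(Theta, targets(X, w_sym)))      # recovers w_sym
print(fit_strict(Theta, targets(X, w_broken)))   # forced onto a = d, misses the breaking term
print(fit_relaxed(Theta, targets(X, w_broken)))  # close to w_broken
```

Raising `lam_break` pushes the relaxed fit toward the strict one. Lowering it lets the symmetry-breaking coordinates move more freely.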

3. Key Contributions

General Framework: A unified procedure to enforce symmetry in differential equation discovery using differential invariants, applicable to any SR algorithm (SINDy, GP, Neural Networks).
Algorithmic Adaptation: Specific implementations for Sparse Regression (including linear constraint derivation for SINDy and Weak SINDy) and Genetic Programming.
Robustness: Demonstration that the method works effectively even with noisy data and imperfect symmetry (via the relaxed constraint approach).
Efficiency: Significant reduction in the search space complexity, leading to higher success rates and faster convergence compared to baselines.

4. Experimental Results

The method was validated on three distinct physical systems: Boussinesq equation (scaling symmetry), Darcy flow (rotational symmetry), and Reaction-Diffusion systems (phase-space rotational symmetry).

Performance on Clean Data:
- Success Probability (SP): The proposed method (SI) achieved a success probability of 1.00 on the Boussinesq equation, whereas standard SINDy failed (0.00) due to library limitations.
- Efficiency: In Genetic Programming, SI required significantly fewer iterations to converge to the correct equation compared to standard PySR.
- Complexity: The effective parameter space dimension was reduced (e.g., from 38 to 28 in Reaction-Diffusion for SINDy).
Performance on Noisy Data:
- Using Weak SINDy, the SI method consistently outperformed the baseline WSINDy across various noise levels (up to 5%).
Performance on Imperfect Symmetry:
- In scenarios with broken symmetry (unequal diffusivities, external forcing), the SI-relaxed model maintained high success probabilities, whereas the strictly constrained SI model failed. Crucially, SI-relaxed still outperformed the baseline WSINDy, demonstrating that even approximate symmetry knowledge improves discovery.
Generalization:
- The method successfully extended to 3D systems with SO(3) symmetry, where standard methods struggled with the high-dimensional search space.

5. Significance

This work bridges the gap between Lie group theory and modern machine learning for scientific discovery.

Interpretability: By enforcing symmetry, the discovered equations are guaranteed to respect the specified symmetries, making them more trustworthy and interpretable.
Data Efficiency: Reducing the hypothesis space allows for accurate discovery from smaller datasets and noisy observations.
Flexibility: Unlike previous symmetry-aware methods that were algorithm-specific, this framework is modular and can be plugged into existing SR pipelines (SINDy, GP, Transformers) with minimal modification.
Future Impact: It provides a robust pathway for discovering governing laws in complex systems where data is imperfect, paving the way for more reliable AI-driven scientific modeling.
