Imagine you are trying to predict the weather, the spread of a virus, or the movement of ocean currents. These are chaotic systems. They are like a room full of thousands of bouncing balls: nudge one ball just a tiny bit differently, and the entire pattern of bouncing changes completely. This extreme sensitivity to initial conditions is what makes chaotic systems incredibly hard to predict.
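That "nudge one ball" effect has a famous textbook example: the Lorenz system, the same "Lorenz attractor" the authors later test on. Here is a minimal NumPy sketch (simple Euler integration and the standard parameter values, chosen for illustration) showing two almost identical starting points drifting completely apart:

```python
import numpy as np

def lorenz_step(state, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Advance the classic Lorenz system by one Euler step."""
    x, y, z = state
    dx = sigma * (y - x)
    dy = x * (rho - z) - y
    dz = x * y - beta * z
    return state + dt * np.array([dx, dy, dz])

# Two trajectories that start almost identically.
a = np.array([1.0, 1.0, 1.0])
b = a + np.array([1e-9, 0.0, 0.0])  # a one-billionth nudge

for _ in range(5000):  # simulate 50 time units
    a = lorenz_step(a)
    b = lorenz_step(b)

# The one-billionth difference has grown to the size of the whole attractor.
print(np.linalg.norm(a - b))
```

This is why long-range point forecasts of chaotic systems fail: any measurement error, however small, eventually dominates.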
For a long time, scientists have had two main ways to handle this:
- The "Black Box" Approach: Using powerful AI (Deep Learning) that is great at guessing the next step but can't explain why it guessed that. It's like a magic 8-ball that gets the answer right but won't tell you the logic.
- The "Equation" Approach: Trying to write down the exact math formula that governs the system. This is transparent and understandable, but it's incredibly hard to find the right formula when the system is messy and complex.
This paper brings in a new team of detectives built on Symbolic Machine Learning. The goal is to get the best of both worlds: the accuracy of the AI and the clarity of the math formula. The authors built two different "detectives" to solve the puzzle of chaotic time series.
The Two Detectives: SyNF and SyTF
The authors created two complementary tools to turn messy data into clean algebraic equations.
1. The Neural Architect (SyNF)
The Analogy: Imagine a master chef who is trying to recreate a complex dish. Instead of just guessing ingredients, this chef has a special kitchen where every tool (a knife, a whisk, a blender) is a specific mathematical operation (like "multiply," "sine," or "add").
- How it works: The chef (the neural network) tastes the data and adjusts the "recipe" (the equation) by trying different combinations of these tools. Because the kitchen is designed so the chef can taste and adjust instantly (it's "differentiable"), they can learn the perfect recipe very quickly.
- The Result: It produces a rich, complex recipe that might look like a long sentence of math, but it's fully written out. It's great for real-world data like disease outbreaks or ocean temperatures because it can handle messy, noisy information well.
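The "differentiable kitchen" idea can be sketched in a few lines: give every candidate operation a trainable weight, and let gradient descent tune the mix. The toy below (plain NumPy, a tiny illustrative operation library, nothing like the paper's actual SyNF architecture) recovers a hidden formula from data:

```python
import numpy as np

# Toy differentiable symbolic regression: each candidate operation gets
# a trainable weight; gradient descent on the mean squared error learns
# which operations belong in the formula and with what coefficients.
rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=200)
y = 1.5 * np.sin(x) + 0.5 * x * x          # the hidden "true recipe"

# The library of kitchen tools the chef may combine.
ops = {"sin(x)": np.sin(x), "x*x": x * x, "x": x}
names = list(ops)
features = np.stack([ops[n] for n in names])   # shape (3, 200)

w = np.zeros(len(names))                       # one weight per operation
lr = 0.1
for _ in range(3000):
    pred = w @ features
    grad = 2 * (pred - y) @ features.T / x.size   # analytic d(MSE)/dw
    w -= lr * grad

# The learned weights ARE the recovered equation, written out in full.
formula = " + ".join(f"{wi:.2f}*{n}" for wi, n in zip(w, names))
print("recovered:", formula)
```

Because every step of the pipeline is differentiable, the "chef" gets instant feedback on each adjustment instead of blind trial and error; that is the core trick, even though the real system searches over far richer equation structures.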
2. The Evolutionary Gardener (SyTF)
The Analogy: Imagine a garden where you want to grow the perfect plant to predict the future. You start with thousands of random seeds (random math formulas).
- How it works: You let them grow, then you cut off the weak ones (the ones that predict poorly) and keep the strongest. You take the best plants, cut them in half, and splice them together (crossover) to make new, potentially better plants. You also tweak them slightly (mutation). Over many generations, the "fittest" plant survives.
- The Result: This method naturally evolves very simple, compact formulas. It's like a gardener pruning a bush until only the essential branches remain. It works beautifully on clean, simulated chaotic systems (like the classic "Lorenz attractor" used in physics).
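The gardener's loop (grow, cull, crossover, mutate) is classic genetic programming, and it fits in a miniature sketch. Everything below (the operator set, population sizes, the crossover and mutation schemes) is illustrative, not the paper's implementation:

```python
import random

random.seed(1)
OPS = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}

def random_tree(depth=2):
    """Plant a random seed: a formula tree over x and constants."""
    if depth == 0 or random.random() < 0.3:
        return "x" if random.random() < 0.5 else random.uniform(-2.0, 2.0)
    return (random.choice(list(OPS)), random_tree(depth - 1), random_tree(depth - 1))

def evaluate(tree, x):
    if tree == "x":
        return x
    if isinstance(tree, float):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left, x), evaluate(right, x))

xs = [i / 10 for i in range(-10, 11)]
ys = [x * x + x for x in xs]               # the hidden target formula

def fitness(tree):                          # mean squared error: lower is fitter
    return sum((evaluate(tree, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def crossover(a, b):
    """Splice two parent formulas together."""
    if isinstance(a, tuple) and isinstance(b, tuple) and random.random() < 0.7:
        return (a[0], crossover(a[1], b[1]), crossover(a[2], b[2]))
    return random.choice([a, b])

pop = [random_tree(3) for _ in range(100)]
for _ in range(20):
    pop.sort(key=fitness)
    survivors = pop[:25]                    # cull the weak plants
    children = []
    while len(children) < 75:
        child = crossover(*random.sample(survivors, 2))
        if random.random() < 0.2:           # occasional mutation
            child = (random.choice(list(OPS)), child, random_tree(1))
        children.append(child)
    pop = survivors + children              # elitism: the best always survive

best = min(pop, key=fitness)
print("best formula:", best, "MSE:", fitness(best))
```

Note that nothing here is differentiable: the search works purely by selection pressure, which is why it tends to favor small, prunable trees over sprawling ones.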
The Great Experiment
The authors put these two detectives to the test in two ways:
The Simulation Gym (132 Chaotic Systems): They benchmarked the tools on a broad suite of computer-generated chaotic systems.
- The Winner: The Evolutionary Gardener (SyTF) was the champion here. It found simple, elegant equations that predicted the future with high accuracy, beating even the most advanced AI models. It proved that you don't always need a giant black box; sometimes a simple, transparent formula is enough.
The Real-World Arena (Dengue Fever & El Niño): They tested the tools on real data: weekly dengue fever cases in San Juan and ocean temperature changes (El Niño).
- The Winner: The Neural Architect (SyNF) took the crown here. Real-world data is messy and noisy. The Gardener's simple formulas sometimes got confused by the noise, but the Architect's flexible, complex recipes could adapt to the messiness.
- The Special Touch: They even created a version of the Architect that could handle "division" (ratios), which is crucial for things like population growth or fluid dynamics. This version performed the best overall.
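A practical wrinkle behind that division-capable variant: naive division explodes whenever a denominator crosses zero, so symbolic-regression systems typically use a "protected" division operator. A small sketch of that standard trick (the paper's exact mechanism may differ):

```python
import numpy as np

def protected_div(a, b, eps=1e-6):
    """Return a / b elementwise, falling back to 1.0 where |b| is tiny."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    safe_b = np.where(np.abs(b) > eps, b, 1.0)   # avoid dividing by ~0
    return np.where(np.abs(b) > eps, a / safe_b, 1.0)

# Divides normally, but stays finite at b = 0 instead of raising or
# returning inf, so gradient-based training is never poisoned.
print(protected_div([1.0, 2.0], [2.0, 0.0]))
```

Ratios like this are exactly what population growth (cases per susceptible) and fluid dynamics (flow per unit area) are built from, which is why unlocking division widened what the Architect could express.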
Why This Matters: The "Why" Behind the "What"
The biggest breakthrough isn't just that they predicted well; it's that they explained themselves.
- Old AI: "I predict 500 dengue cases tomorrow." (You have to trust it blindly).
- New Symbolic AI: "I predict 500 cases because the formula shows that when the temperature rises by X and the humidity is Y, the virus spreads at rate Z."
This is like getting a map instead of just being told the destination. In high-stakes fields like public health or climate science, knowing why a prediction is made is just as important as the prediction itself. It allows scientists to trust the model, spot errors, and understand the underlying physics of the world.
The Bottom Line
This paper shows that we don't have to choose between accuracy and understanding. By using these "Symbolic" methods, we can build AI that doesn't just guess the future, but writes down the rules of the game in plain English (or rather, plain Algebra). It turns the chaotic noise of the world into a readable story.