Merging Memory and Space: A State Space Neural Operator

The paper proposes the State Space Neural Operator (SS-NO), a parameter-efficient architecture that extends structured state space models with adaptive damping and learnable frequency modulation to achieve state-of-the-art performance in learning solution operators for diverse time-dependent partial differential equations.

Nodens Koren, Samuel Lanthaler

Published 2026-03-09

Imagine you are trying to predict the weather, the flow of blood in an artery, or the movement of smoke in a room. These are all governed by complex mathematical rules called Partial Differential Equations (PDEs). Traditionally, solving these equations is like trying to calculate the path of every single raindrop in a storm—it takes massive supercomputers and a lot of time.

In recent years, scientists have tried using AI as a shortcut. Instead of calculating every drop, the AI learns the "rules of the game" so it can predict the future instantly. This paper introduces a new, super-efficient AI architecture called SS-NO (State Space Neural Operator).

Here is the breakdown of how it works, using simple analogies.

1. The Problem: The "All-Seeing Eye" vs. The "Memory Lane"

To predict how a fluid moves, an AI needs to understand two things:

  • Space: How things interact with their neighbors (e.g., a hot spot heating up the air next to it).
  • Time: How the current state depends on the past (e.g., a wave moving forward because of how it moved a second ago).

Previous AI models had a choice:

  • The "All-Seeing Eye" (FNO): These models look at the entire map at once. They see every point simultaneously. This is great for accuracy but requires a massive amount of memory, like trying to hold a high-resolution photo of the whole world in your head. It gets slow and expensive very quickly.
  • The "Memory Lane" (SSMs): These models (like the famous "Mamba" AI) are great at remembering long sequences of text or time. They are efficient and compact, but they usually only look at one thing at a time, like reading a book page by page. They struggle to understand complex 2D or 3D spaces all at once.
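The two styles above can be caricatured in a few lines of code. This is a toy illustration, not the paper's actual layers: the FNO-style path mixes every grid point at once in Fourier space, while the SSM-style path walks the grid one point at a time, carrying a small hidden state.

```python
import numpy as np

u = np.sin(np.linspace(0, 2 * np.pi, 64))  # a 1-D field sampled on a grid

# "All-Seeing Eye" (FNO-style): transform the whole signal, reweight
# its frequencies globally, and transform back.
def spectral_mix(u, weights):
    u_hat = np.fft.rfft(u)             # global view of the entire signal
    u_hat[: len(weights)] *= weights   # learned weights on kept frequencies
    return np.fft.irfft(u_hat, n=len(u))

# "Memory Lane" (SSM-style): march along the grid, point by point,
# remembering the past through a hidden state h.
def ssm_scan(u, a=0.9, b=0.5, c=1.0):
    h, out = 0.0, []
    for x in u:               # one point at a time, like reading page by page
        h = a * h + b * x     # hidden state carries a decaying memory
        out.append(c * h)
    return np.array(out)

y_global = spectral_mix(u, np.ones(8) * 0.5)
y_local = ssm_scan(u)
```

The global version touches all 64 points in one shot; the scan only ever holds one number of state, which is why SSMs stay cheap as the grid grows.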

2. The Solution: The "Smart Librarian" (SS-NO)

The authors of this paper built a hybrid: SS-NO. Think of it as a Smart Librarian who has two superpowers:

  • Adaptive Damping (The "Focus Filter"):
    Imagine you are listening to a noisy room. Sometimes you need to hear the whole room (global view), and sometimes you just need to focus on the person talking right next to you (local view).

    • Old models were stuck with a fixed volume.
    • SS-NO has a "volume knob" it can turn on the fly. If the situation is chaotic (like a storm), it dampens the noise to focus on stability. If it's smooth, it opens up the view. This keeps the model from getting confused and crashing.
  • Learnable Frequency Modulation (The "Tuning Fork"):
    Imagine a radio. Old models (like the Fourier Neural Operator) are like radios with fixed stations. They can only tune into specific, pre-set frequencies (like 101.1, 102.3). If the signal you need is at 101.15, they miss it.

    • SS-NO is a radio that can tune itself. It learns exactly which frequencies are important for the specific problem it's solving. It doesn't just listen to the "standard" waves; it finds the hidden patterns unique to the data.
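Both superpowers live in one place mathematically: each state-space mode has a complex eigenvalue, roughly lambda = -alpha + i*omega, where alpha is the damping ("volume knob") and omega the frequency ("tuning dial"). In SS-NO both are learned; the sketch below fixes them by hand just to show the effect. Names and values here are illustrative, not from the paper.

```python
import numpy as np

def mode_response(signal, alpha, omega, dt=0.1):
    """Run one SSM mode with eigenvalue lambda = -alpha + i*omega."""
    lam = complex(-alpha, omega)   # damping and oscillation in one number
    decay = np.exp(lam * dt)       # per-step update factor, |decay| < 1
    h, out = 0.0 + 0.0j, []
    for x in signal:
        h = decay * h + dt * x     # leaky, oscillating memory of the input
        out.append(h.real)
    return np.array(out)

x = np.random.randn(200)
calm = mode_response(x, alpha=0.1, omega=2.0)    # weak damping: long memory
stormy = mode_response(x, alpha=2.0, omega=2.0)  # strong damping: local focus
```

Turning alpha up makes the memory forget faster (a local, stable view); turning omega changes which oscillation the mode "tunes into", without being locked to a preset station.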

3. How It Works: The "Scanning" Strategy

To handle 2D space (like a square grid of weather data), SS-NO uses a clever scanning trick.

  • Instead of trying to look at the whole square at once (which is heavy), it scans the grid like a lawnmower.
  • It sweeps left-to-right, then right-to-left (to catch everything), then top-to-bottom, then bottom-to-top.
  • By doing this, it builds a complete picture of the space without needing a massive memory bank. It's like reading a book: you don't need to memorize the whole book to understand the story; you just need to remember the context of the previous page.
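The lawnmower idea can be sketched directly. Below, a simple exponential moving average stands in for the learned state-space layer (an assumption for illustration; the real model is more elaborate), and the four sweep directions are combined by averaging.

```python
import numpy as np

def sweep(rows, a=0.8):
    """Scan left-to-right along axis 1, carrying a running state per row."""
    out = np.zeros_like(rows)
    h = np.zeros(rows.shape[0])
    for j in range(rows.shape[1]):
        h = a * h + (1 - a) * rows[:, j]  # memory of everything to the left
        out[:, j] = h
    return out

def four_way_scan(grid, a=0.8):
    """Left->right, right->left, top->bottom, bottom->top, then average."""
    lr = sweep(grid, a)
    rl = sweep(grid[:, ::-1], a)[:, ::-1]
    tb = sweep(grid.T, a).T
    bt = sweep(grid[::-1, :].T, a).T[::-1, :]
    return (lr + rl + tb + bt) / 4.0

grid = np.random.rand(32, 32)
mixed = four_way_scan(grid)  # every point now "hears" all four directions
```

Each sweep is linear in the number of grid points and carries only one row's worth of state, so the full picture is assembled without ever materializing a global, all-to-all interaction.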

4. The Results: Fast, Cheap, and Accurate

The paper tested this "Smart Librarian" on some very difficult physics problems:

  • Burgers' Equation: Like predicting traffic jams.
  • Kuramoto–Sivashinsky: Like predicting chaotic, swirling smoke.
  • Navier–Stokes: The complex math behind airplane wings and ocean currents.
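To make "Burgers' Equation" concrete, here is the kind of classical step-by-step solver that a neural operator learns to shortcut: a minimal finite-difference scheme for 1-D viscous Burgers, u_t + u*u_x = nu*u_xx. Grid size, time step, and viscosity are illustrative choices, not the paper's setup.

```python
import numpy as np

def burgers_step(u, dx, dt, nu=0.01):
    """One explicit time step of 1-D viscous Burgers on a periodic grid."""
    u_x = (np.roll(u, -1) - np.roll(u, 1)) / (2 * dx)        # advection term
    u_xx = (np.roll(u, -1) - 2 * u + np.roll(u, 1)) / dx**2  # diffusion term
    return u + dt * (-u * u_x + nu * u_xx)

n = 128
x = np.linspace(0, 2 * np.pi, n, endpoint=False)
u = np.sin(x)                 # smooth start that steepens, like a traffic jam
dx, dt = x[1] - x[0], 1e-3
for _ in range(500):          # 500 tiny steps just to advance the solution
    u = burgers_step(u, dx, dt)
```

The point of the comparison: a classical solver must take thousands of tiny, stability-limited steps like these, while a learned operator maps the initial state much further ahead in a single pass.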

The Verdict:

  • Accuracy: SS-NO was often more accurate than the giants (like FNO) that use 100x more computer power.
  • Efficiency: It used significantly fewer parameters (fewer "brain cells").
  • Speed: It ran faster and didn't crash the computer's memory, even on high-resolution maps.

The Big Picture

Think of previous AI models as heavy tanks: powerful but slow and expensive to fuel.
SS-NO is a nimble sports car: it has a powerful engine (the math), but it's lightweight and aerodynamic. It proves that you don't need a supercomputer to solve complex physics problems; you just need the right kind of memory and the ability to tune your focus.

This is a big step forward for engineers and scientists who want to simulate climate change, design better airplanes, or model blood flow without needing a billion-dollar supercomputer.