WaveSSM: Multiscale State-Space Models for Non-stationary Signal Attention

This paper introduces WaveSSM, a novel family of State-Space Models built on wavelet frames that leverage localized temporal support to outperform traditional polynomial-based SSMs like S4 in modeling non-stationary signals with transient dynamics, such as physiological data and raw audio.

Ruben Solozabal, Velibor Bojkovic, Hilal Alquabeh, Klea Ziu, Kentaro Inui, Martin Takac

Published 2026-02-27

Imagine you are trying to understand a long, complex story told by a friend. Sometimes, the story has a slow, steady rhythm (like a calm description of a landscape). Other times, it has sudden, loud outbursts or quick, sharp changes (like a car crash or a sudden laugh).

For a long time, AI models trying to understand these "stories" (which are just sequences of data like audio, heartbeats, or text) have been using a specific tool: State-Space Models (SSMs). Think of these models as a super-efficient note-taker who can remember a whole book without running out of paper.

However, the current version of this note-taker has a flaw: it builds its memory from a mathematical tool called polynomial bases, smooth curves that stretch across the entire timeline.

The Problem: The "Global" Note-Taker

Imagine your note-taker tries to summarize a story by writing one giant, smooth sentence that covers the entire book from start to finish.

  • The Issue: If your friend suddenly screams "Fire!" in the middle of a calm story, this note-taker smears that scream across the whole sentence. They can't easily point to exactly when the scream happened or isolate it from the rest of the story. They treat the whole timeline as one big, blurry blob.
  • The Result: They are great at smooth, slow stories, but terrible at spotting sudden, sharp events (transients) or non-stationary signals (things that change their nature over time).

The Solution: WaveSSM (The "Zoom Lens" Note-Taker)

The authors of this paper decided to give the note-taker a new tool: Wavelets. The result is WaveSSM.

Think of Wavelets as a Zoom Lens or a Flashlight.

  • Instead of writing one giant sentence for the whole book, the note-taker can now shine a flashlight on a specific paragraph, then zoom in on a specific sentence, then zoom in on a specific word.
  • They can capture the "big picture" (the slow rhythm) and the "tiny details" (the sudden scream) without mixing them up.

How It Works (The Analogy)

  1. Old Way (Polynomials): Imagine trying to describe a jagged mountain range using only smooth, rolling hills. You can get close, but you'll never capture the sharp peaks and deep valleys accurately. You need a lot of "smooth hills" to fake a sharp peak, and it's inefficient.
  2. New Way (Wavelets): Imagine using a set of different-sized building blocks. You use big blocks for the flat plains and tiny, sharp blocks for the mountain peaks. You can build the exact shape of the landscape with far fewer blocks, and you know exactly where the peak is.
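The building-block analogy can be made concrete with a small numerical sketch (my own illustration, not code from the paper): approximate a flat signal containing one sharp spike using a fixed budget of coefficients, first with a global Legendre polynomial basis, then with localized Haar wavelets.

```python
import numpy as np

# Toy signal: flat everywhere except one sharp spike (the "sudden scream").
n = 64
t = np.linspace(-1, 1, n)
signal = np.zeros(n)
signal[30:34] = 1.0

k = 8  # coefficient budget for each representation

# Global "smooth hills": least-squares fit with k Legendre polynomials.
poly_coeffs = np.polynomial.legendre.legfit(t, signal, deg=k - 1)
poly_recon = np.polynomial.legendre.legval(t, poly_coeffs)

# Local "building blocks": orthonormal Haar transform and its inverse.
def haar_fwd(x):
    coeffs = []
    while len(x) > 1:
        coeffs.append((x[0::2] - x[1::2]) / np.sqrt(2))  # details at this scale
        x = (x[0::2] + x[1::2]) / np.sqrt(2)             # averages carried up
    coeffs.append(x)                                     # coarsest average
    return coeffs

def haar_inv(coeffs):
    x = coeffs[-1]
    for diff in reversed(coeffs[:-1]):
        out = np.empty(2 * len(x))
        out[0::2] = (x + diff) / np.sqrt(2)
        out[1::2] = (x - diff) / np.sqrt(2)
        x = out
    return x

coeffs = haar_fwd(signal)
flat = np.concatenate(coeffs)
threshold = np.sort(np.abs(flat))[-k]      # k-th largest magnitude
flat[np.abs(flat) < threshold] = 0.0       # discard everything smaller
kept, i = [], 0                            # unflatten back into scales
for c in coeffs:
    kept.append(flat[i:i + len(c)])
    i += len(c)
wave_recon = haar_inv(kept)

poly_err = np.linalg.norm(signal - poly_recon)
wave_err = np.linalg.norm(signal - wave_recon)
print(f"polynomial error: {poly_err:.3f}, wavelet error: {wave_err:.3f}")
```

With the same budget, the polynomial fit smears the spike into ripples across the whole interval, while the handful of Haar blocks sitting directly on the spike reconstruct it almost exactly.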

In technical terms, WaveSSM breaks the signal down into localized atoms.

  • Global Support (Old): One piece of data affects the whole memory.
  • Local Support (New): One piece of data only affects a small, specific part of the memory. This allows the AI to "pay attention" to exactly where the interesting stuff is happening.
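A quick way to see the difference between global and local support (again a toy sketch under my own assumptions, not the paper's code): push a single impulse through each representation and count how many coefficients it touches.

```python
import numpy as np

# One piece of data arriving at time step 30.
n = 64
impulse = np.zeros(n)
impulse[30] = 1.0
t = np.linspace(-1, 1, n)

# Global support: the lone sample leaks into essentially every
# Legendre coefficient of a least-squares fit.
poly = np.polynomial.legendre.legfit(t, impulse, deg=31)
poly_touched = int(np.sum(np.abs(poly) > 1e-10))

# Local support: in an orthonormal Haar transform the sample touches
# only one detail coefficient per scale, i.e. O(log n) slots.
def haar_fwd(x):
    out = []
    while len(x) > 1:
        out.append((x[0::2] - x[1::2]) / np.sqrt(2))  # details at this scale
        x = (x[0::2] + x[1::2]) / np.sqrt(2)          # averages carried up
    out.append(x)                                     # coarsest average
    return np.concatenate(out)

wave_touched = int(np.sum(np.abs(haar_fwd(impulse)) > 1e-10))
print(f"polynomial coefficients touched: {poly_touched} of 32")
print(f"Haar coefficients touched: {wave_touched} of {n}")
```

For n = 64 the impulse touches only log2(64) + 1 = 7 Haar coefficients, while it bleeds into nearly all of the polynomial coefficients; that sparsity is what lets a local model "point at" where something happened.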

Why Does This Matter?

The paper tested this new "Zoom Lens" note-taker on real-world problems where things change quickly and unpredictably:

  1. Heartbeats (ECG): A heart rhythm is usually steady, but abnormal beats show up as sudden, sharp spikes. WaveSSM spotted these spikes better than the old models, leading to more accurate medical diagnoses.
  2. Voice Commands: When you say "Hey Siri," there's a sudden burst of sound. WaveSSM understood these sharp audio bursts better, even when the voice was recorded at different speeds.
  3. Weather & Energy: Predicting the weather involves sudden storms. WaveSSM handled these sudden changes better than the previous "smooth hill" models.

The "Magic" Trick: Addressable Memory

The coolest part of WaveSSM is how it stores information.

  • Old Model: If you ask the model to remember two different events that happened at different times, it mashes them together into one big, confusing soup.
  • WaveSSM: Because it uses "local" blocks, it can store Event A in one corner of its memory and Event B in a completely different corner. It's like having a filing cabinet where every file has its own specific drawer. You can pull out just the "Fire" file without accidentally grabbing the "Calm Morning" file.
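The filing-cabinet picture can be checked directly with the same toy Haar transform (my illustration, not the paper's implementation): place two events far apart in time and count how many coefficient "drawers" they share.

```python
import numpy as np

# Orthonormal Haar transform, returning all coefficients as one flat vector.
def haar_fwd(x):
    out = []
    while len(x) > 1:
        out.append((x[0::2] - x[1::2]) / np.sqrt(2))  # details at this scale
        x = (x[0::2] + x[1::2]) / np.sqrt(2)          # averages carried up
    out.append(x)                                     # coarsest average
    return np.concatenate(out)

n = 64
event_a = np.zeros(n); event_a[5] = 1.0    # the "Fire!" file, early on
event_b = np.zeros(n); event_b[55] = 1.0   # a separate event much later

slots_a = np.abs(haar_fwd(event_a)) > 1e-12  # which drawers each event uses
slots_b = np.abs(haar_fwd(event_b)) > 1e-12
shared = int((slots_a & slots_b).sum())

print(f"drawers used by A: {int(slots_a.sum())} of {n}")
print(f"drawers used by B: {int(slots_b.sum())} of {n}")
print(f"shared drawers: {shared}")
```

Each event occupies 7 drawers, and they collide only in the 2 coarsest-scale ones (the "whole book" summary that both events inevitably contribute to). A global polynomial basis would instead mix both events into nearly every coefficient, which is the "confusing soup" the old models suffer from.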

Summary

WaveSSM is like upgrading from a blurry, wide-angle lens to a high-definition camera with a zoom function. It allows AI to understand long sequences of data (like time series) much better, especially when that data has sudden, sharp, or changing moments. It's faster, more accurate, and much better at spotting the "needle in the haystack" than the models that came before it.
