Data-Driven Forecasting of three-Component Seismograms… — Plain-Language Explanation

Original authors: Waleed Esmail, Stuart Russell, Jana Klinge, Alexander Kappes, Christine Thomas

Published 2026-06-03

Original paper licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). This is an AI-generated explanation of the paper below. It is not written or endorsed by the authors. For technical accuracy, refer to the original paper.

Imagine you are listening to a complex piece of music, like a symphony, but you only get to hear the first few minutes. Your goal is to guess exactly how the rest of the song will sound, note for note, without ever hearing the actual recording.

This is essentially what the paper "Data-Driven Forecasting of Three-Component Seismograms Using Transformer Architectures" attempts to do, but with earthquake waves instead of music. The researchers have built an AI model named SeismoGPT that acts like a musical improviser who has studied millions of symphonies and can now predict the next few minutes of a piece just by hearing the beginning.

Here is a breakdown of how it works and what they found, using simple analogies:

The Problem: The Earth is a Chaotic Orchestra

Predicting how earthquake waves travel through the Earth is incredibly hard. The Earth isn't a smooth, uniform ball; it's a messy, jumbled mix of rocks, layers, and cracks. When an earthquake happens, the waves bounce, scatter, and change speed, much like light shining through a kaleidoscope.

Traditionally, scientists try to predict these waves using supercomputers that run complex physics equations. But this is like trying to calculate the path of every single raindrop in a storm—it takes too much time and computing power to be useful for real-time warnings.

The Solution: SeismoGPT (The "Ear" that Learns Patterns)

Instead of trying to solve the physics equations from scratch every time, the researchers taught an AI to learn the patterns directly from data.

The Training: They didn't use real earthquake data (which is messy and noisy). Instead, they created a library of about 3.9 million synthetic ("fake") seismograms using a computer simulation. Because every recording came from a known Earth model, they knew exactly how each wave should behave.
The Task: They showed the AI the beginning of a fake earthquake wave (starting when the first "P-wave" arrives and continuing past the "S-wave"). Then, they asked the AI to predict what the next 2 to 4 minutes of the wave would look like.
The Architecture: The AI is built on a "Transformer" architecture (the same type of model behind modern AI language models and chatbots). Instead of reading words, it reads short chunks of seismic waves, each about 8 seconds long. It looks at the past to guess the future, one chunk at a time.

How Well Did It Work?

The results were surprisingly good, but with some specific rules:

The "Sweet Spot": When the earthquake was strong and not too far away, the AI was a very good predictor. Its forecasts matched the timing and shape of the true waves with a similarity score of roughly 0.93 to 0.97, where 1.0 would be a perfect match. It could also follow the "coda" (the long, fading tail of the seismogram).
The "Blurry" Zone: The AI struggled when the earthquake was weak (small magnitude) or very far away.
- Analogy: Imagine trying to hear a whisper from across a crowded, noisy stadium. The signal is too weak and gets distorted by the distance. In these cases, the AI's prediction started to "drift." It didn't make up crazy, impossible sounds; it just got the timing slightly wrong, like a musician who knows the melody but is a few beats off.
The "Context" Rule: The AI needs to hear a certain amount of the wave before it can predict the rest. The researchers found that the AI needed to hear at least one full "S-P interval" (the time gap between the first shake and the second, stronger shake) plus a little bit of the shaking that follows. If they cut the input short, the AI couldn't guess the future. If they gave it a bit more history, the predictions became much more stable.

The "Failure" Mode

When the AI failed, it didn't explode or create nonsense. It didn't predict a giant wave where there should be silence. Instead, it produced a wave that looked and sounded realistic but was out of sync with the real thing. It was like a singer who knows the song perfectly but starts singing a few seconds too late.

Why This Matters (According to the Paper)

The paper suggests this is a "proof of concept." It shows that AI can learn the "rules" of how earthquake waves move without needing to solve complex physics equations every time.

The authors specifically mention two potential uses for this technology:

Earthquake Early Warning: Since the AI can predict the damaging part of the wave (the surface waves) based on the early arrivals, it could help warn people faster.
Gravitational Wave Observatories: They mention the Einstein Telescope, a future observatory that listens for ripples in space-time. Such observatories are sensitive to tiny ground vibrations, which disturb the detector through small changes in local gravity (Newtonian noise). If the AI can forecast how the ground will move over the next moments, that forecast could help the observatory cancel this noise and hear the faint signals from space.

The Bottom Line

The researchers built a digital "seismologist" that learned to predict earthquake waves by studying millions of computer-generated examples. It works very well for strong, nearby quakes and gets a bit "out of tune" for weak, distant ones. It's a promising new tool that uses pattern recognition to do what supercomputers usually do with heavy math, potentially helping us predict seismic waves faster and more efficiently in the future.

Technical Summary: Data-Driven Forecasting of Three-Component Seismograms Using Transformer Architectures

Problem Statement
Accurate, real-time forecasting of seismic wavefields beyond observed data remains a significant challenge due to the non-linear, dispersive, and multi-scale nature of seismic wave propagation in heterogeneous media. Conventional numerical forward modeling (e.g., Finite-Difference or Spectral Element Methods) is computationally prohibitive for high-fidelity simulations at realistic frequencies, particularly when short periods, long propagation times, or large spatial domains are involved. While machine learning has advanced seismic event detection and phase picking, its application to continuous, autoregressive waveform forecasting—predicting the future evolution of a seismogram based on past observations—has been limited. This study addresses the feasibility of using data-driven sequence models to learn an implicit evolution operator for seismic waveforms without explicitly integrating the governing elastodynamic equations.

Methodology
The authors introduce SeismoGPT, a causal, autoregressive transformer architecture designed to forecast three-component (ZNE) seismic waveforms directly in the time domain. The approach treats forecasting as a physically constrained continuation problem.

Data Generation: To establish a controlled proof-of-concept, the study utilizes a synthetic dataset of approximately 3.9 million three-component seismograms. These are generated using Instaseis with the ak135f_2s Earth model and AxiSEM Green's functions. The dataset covers source depths of 5–100 km, epicentral distances of 10–90°, and moment magnitudes ($M_w$) from 3 to 7. Source mechanisms are drawn from distributions fitted to the Global CMT catalogue.
Tokenization: To manage the computational complexity of long sequences, raw waveforms are partitioned into fixed-length "tokens" (patches) of 16 samples (approx. 8.4 seconds at the 1.9 Hz sampling rate). This reduces the sequence length while preserving local temporal structure.
Architecture: SeismoGPT employs an encoder-only transformer stack.
- Token Embedding: A 1×1 convolution mixes the three components, followed by mean and last-sample pooling to create fixed-dimensional embeddings.
- Backbone: A stack of 8 causal transformer layers with multi-head self-attention and Rotary Positional Embeddings (RoPE) models long-range temporal dependencies.
- Prediction Head: A two-layer feed-forward network maps token representations back to the waveform space.
Training Strategy: The model is trained using teacher forcing with a composite loss function comprising:
- Log-cosh loss: For time-domain fidelity and robustness to outliers.
- Multi-resolution STFT loss: To preserve spectral content across different frequency scales.
- Temporal delta loss: To ensure smooth transitions between token boundaries.
- Cross-horizon coherence loss: To maintain spectral consistency across multiple prediction horizons.
The model is optimized with AdamW and trained on synthetic data with physics-preserving augmentations (polarity flips and channel swaps).
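The patch tokenization and the log-cosh term described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`patchify`, `log_cosh_loss`), the choice to drop trailing samples that do not fill a patch, and the plain averaging over samples are assumptions made for illustration. Only the patch length (16 samples) and the sampling rate (1.9 Hz) come from the summary.

```python
import math

PATCH_SAMPLES = 16      # samples per token, as stated in the summary
SAMPLING_RATE_HZ = 1.9  # sampling rate, as stated in the summary


def patchify(waveform, patch=PATCH_SAMPLES):
    """Split a three-component waveform into non-overlapping patches.

    `waveform` is a list of three equal-length channels (Z, N, E).
    Returns a list of tokens; each token holds one `patch`-sample slice
    per channel. Trailing samples that do not fill a patch are dropped
    (an assumption of this sketch).
    """
    n_samples = len(waveform[0])
    if any(len(channel) != n_samples for channel in waveform):
        raise ValueError("all components must have the same length")
    n_tokens = n_samples // patch
    return [
        [channel[i * patch:(i + 1) * patch] for channel in waveform]
        for i in range(n_tokens)
    ]


def log_cosh(residual):
    """Numerically stable log(cosh(x)) = |x| + log(1 + exp(-2|x|)) - log(2)."""
    a = abs(residual)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)


def log_cosh_loss(predicted, target):
    """Mean log-cosh of the sample-wise residual between two traces."""
    if len(predicted) != len(target):
        raise ValueError("traces must have the same length")
    return sum(log_cosh(p - t) for p, t in zip(predicted, target)) / len(target)


# Duration of one token and the number of tokens spanning a 240 s horizon.
token_seconds = PATCH_SAMPLES / SAMPLING_RATE_HZ
tokens_for_240s = math.ceil(240.0 / token_seconds)
```

Log-cosh behaves like a squared error for small residuals and like an absolute error for large ones, which is why it is often chosen when robustness to outliers matters. The stable form avoids the overflow that a direct `math.cosh` call would hit for large residuals.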

Evaluation Protocol
Forecasting performance is evaluated on a hold-out test set using three configurations defined by the context ratio ($r$) and prediction horizon ($\Delta t_{\mathrm{fut}}$):

Context: The input window begins at the P-wave arrival time $t_P$ and extends to $t_S + r \times (t_S - t_P)$, where $t_S$ is the S-wave arrival time.
Horizon: The model predicts the subsequent waveform for 120 s or 240 s in fully autoregressive mode (no ground truth access).
Metrics: Performance is measured using Normalized Cross-Correlation (NCC) for phase/shape, Signal-to-Residual Ratio (SRR) for amplitude fidelity, and PSD log-L2 error for spectral accuracy.
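As a rough illustration of the protocol above, the following sketch computes the end of the context window and two of the metrics. The window formula is taken from the summary. The exact metric definitions are not given there, so the zero-lag, mean-removed form of NCC and the decibel form of SRR used here are assumptions, as are the function names and the example arrival times.

```python
import math


def context_end_time(t_p, t_s, r):
    """End of the input window: t_S + r * (t_S - t_P), as stated in the summary."""
    return t_s + r * (t_s - t_p)


def ncc(predicted, target):
    """Zero-lag normalized cross-correlation of two equal-length traces.

    Assumed definition: means are removed and the result lies in [-1, 1].
    """
    mean_p = sum(predicted) / len(predicted)
    mean_t = sum(target) / len(target)
    dp = [p - mean_p for p in predicted]
    dt = [t - mean_t for t in target]
    numerator = sum(a * b for a, b in zip(dp, dt))
    denominator = math.sqrt(sum(a * a for a in dp) * sum(b * b for b in dt))
    return numerator / denominator if denominator > 0 else 0.0


def srr_db(predicted, target):
    """Signal-to-residual ratio in dB (assumed definition):
    10 * log10(sum(target^2) / sum((target - predicted)^2))."""
    signal = sum(t * t for t in target)
    residual = sum((t - p) ** 2 for p, t in zip(predicted, target))
    if residual == 0:
        return math.inf
    if signal == 0:
        return -math.inf
    return 10.0 * math.log10(signal / residual)


# Hypothetical arrival times in seconds: P at 100 s, S at 180 s.
window_r1 = context_end_time(100.0, 180.0, r=1.0)  # context ends at 260 s
window_r2 = context_end_time(100.0, 180.0, r=2.0)  # context ends at 340 s
```

With these example times the S–P interval is 80 s, so moving from $r = 1$ to $r = 2$ gives the model 80 s more post-S waveform before the forecast begins.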

Key Results

Overall Performance: SeismoGPT achieves a median NCC above 0.93 across all evaluation configurations. The horizontal components (N, E) generally perform slightly better than the vertical component (Z).
Context vs. Horizon:
- Doubling the prediction horizon from 120 s to 240 s (Configuration A to B) results in a modest degradation in performance (approx. 2% drop in median NCC), attributed to error accumulation in autoregressive rollouts.
- Doubling the context ratio from $r = 1$ to $r = 2$, so that the input covers $2\times(t_S - t_P)$ of waveform after the S arrival instead of $1\times(t_S - t_P)$ (Configuration B to C), recovers much of this lost performance. This suggests that observing at least one S–P interval of post-S waveform is necessary and approximately sufficient for stable forecasting.
Failure Modes: Performance degrades primarily in regimes characterized by large epicentral distances ($\Delta \gtrsim 50^\circ$), low magnitudes ($M_w \lesssim 4.5$), and shallow source depths. In these cases, wavefields are weakly coherent and highly dispersive. When failures occur, the model typically produces physically plausible waveforms that suffer from gradual phase drift rather than unphysical signal generation or amplitude divergence.
Representative Success: For median events, the model successfully predicts future arrivals for up to 600 seconds, preserving phase coherence and spectral energy distribution.
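The error accumulation mentioned above follows from how a fully autoregressive rollout works: every predicted token is fed back as input, so an early mistake influences all later steps. A minimal sketch of that loop is shown below. The `toy_predictor` is a hypothetical stand-in for the trained network and has nothing to do with SeismoGPT's actual behavior; tokens are reduced to short lists of numbers for brevity.

```python
def rollout(predict_next, context, n_future):
    """Autoregressive forecast without access to ground truth.

    `predict_next` maps the token history to the next token. Each
    prediction is appended to the history and used for the next step.
    """
    history = [list(token) for token in context]
    forecast = []
    for _ in range(n_future):
        next_token = predict_next(history)
        forecast.append(next_token)
        history.append(next_token)  # the model now conditions on its own output
    return forecast


def toy_predictor(history):
    """Hypothetical stand-in model: a damped copy of the most recent token."""
    return [0.5 * x for x in history[-1]]


forecast = rollout(toy_predictor, context=[[1.0, 1.0]], n_future=3)
# forecast is [[0.5, 0.5], [0.25, 0.25], [0.125, 0.125]]
```

During training with teacher forcing the history would instead contain the true tokens, which is why performance in rollout mode can be lower than the training loss alone would suggest.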

Significance and Claims
The paper claims that SeismoGPT demonstrates the potential of foundation-model approaches for physics-driven time-series forecasting. Key contributions include:

Demonstrating Feasibility: Showing that transformer-based sequence models can learn stable dynamical continuation of seismic wavefields directly from data, without explicit numerical integration of elastodynamic equations.
Controlled Baseline: Providing a rigorous, controlled proof-of-concept using synthetic data to isolate the effects of context length and prediction horizon, establishing a baseline before extending to real-world data.
Application Potential: Highlighting the method's potential utility in seismic early warning and hazard mitigation. Specifically, the authors note its relevance for next-generation gravitational-wave observatories like the Einstein Telescope (ET), where forecasting the short-term evolution of the ambient seismic wavefield could inform active mitigation of Newtonian noise.

The authors remain modest regarding immediate real-world deployment, noting that the current implementation uses a relatively small model (26M parameters) and synthetic data. They identify future work as necessary to address real-world complexities, including 3D Earth heterogeneity, high-frequency sampling rates, and pervasive environmental noise.
