Probing forced responses and causality in data-driven… — Plain-Language Explanation

Imagine the Earth's climate as a giant, chaotic orchestra. It has thousands of instruments playing at once, from the deep, slow rumble of ocean currents to the rapid, high-pitched chirping of daily weather. For decades, scientists have tried to build a "digital twin" of this orchestra using artificial intelligence (AI) to predict how it will sound in the future.

This paper, written by Fabrizio Falasca, asks a critical question: Just because an AI can perfectly mimic the orchestra's current sound, does it actually understand how the music will change if we suddenly change the conductor's tempo?

Here is a breakdown of the paper's findings using simple analogies.

1. The Problem: The "Perfect Mimic" vs. The "True Understanding"

Current AI climate models are like incredibly talented parrots. If you play them a recording of the climate, they can repeat the sounds (the statistics) almost perfectly. They can tell you what the average temperature is or how much rain usually falls.

However, the paper argues that these "parrots" often fail when you ask them a "what if" question. If you ask the AI, "What happens if the ocean gets warmer in a specific pattern?" it might give the wrong answer. It mimics the past but doesn't understand the causes. In scientific terms, it captures "stationary statistics" (the system's typical long-run behavior, such as averages and variability) but fails at "forced responses" (how the system reacts to change).

2. The Test: The Three-String Instrument

To prove this, the authors didn't start with the massive, complex Earth. Instead, they built a tiny, simplified "instrument" with just three strings (variables) that mimics the physics of the real climate.

The Setup: They let this instrument play for a very long time so the AI could learn its song.
The Test: They then gave the instrument a tiny "tap" (a perturbation) and asked the AI to predict how the sound would change.

The Results:

The Linear Model (The Simple AI): This model was like a basic metronome. It could predict the average rhythm well, but if you tapped the instrument, it couldn't predict how the loudness (variance) would change. It was too rigid.
The Neural Model (The Smart AI): This model was much better. It could predict both the rhythm and the changes in loudness. It learned the "rules" of the instrument well enough to handle the tap.

The Catch: This success only happened because the AI had access to all three strings. It saw the whole instrument.

3. The Real-World Problem: The "Blind" Musician

In the real world, we are like blind musicians. We cannot see the entire climate system. We only see a few "strings" (like surface temperature) while the rest of the orchestra (deep ocean currents, tiny atmospheric swirls) is hidden from us.

The paper shows that when the AI only sees one string:

It can still learn to mimic the sound of that one string.
But, it often fails to predict how that string will react to a tap.

Why? Because the hidden strings are pushing and pulling the one we can see. If the AI doesn't know those hidden strings exist, it tries to explain the movement using only the visible string, leading to wrong predictions about cause and effect.

To fix this, the authors suggest two things:

Choose the right string: You must pick the "slow" string (the one that matters most) rather than a fast, noisy one.
Add "Ghost Noise": Since the AI can't see the hidden strings, it needs to be told that "invisible forces" are pushing the system. The authors found that adding a specific type of "noise" (randomness that changes based on the current state) helped the AI understand the hidden forces much better.

4. The Real-World Application: The "Pattern Effect"

The authors took these lessons and applied them to a real climate mystery called the "Pattern Effect."

The Mystery: The Earth's energy balance doesn't just depend on how much the ocean warms, but where it warms. Warming the Eastern Pacific might make the Earth hotter, while warming the Western Pacific might cool it down.
The Experiment: They built a specialized, simplified AI model that only looked at the "main patterns" of ocean temperature and the energy leaving the Earth (radiative flux).
The Success: By focusing on the big picture (coarse-graining) and adding the right "ghost noise," their AI successfully recreated the complex physics. It could predict how the Earth's energy balance would change if the ocean warmed in specific patterns. It even produced a map showing exactly where warming causes heating and where it causes cooling, matching what complex physics models say.

5. The Big Takeaway

The paper concludes that we shouldn't just build "general-purpose" AI that tries to learn everything about the climate at once. That approach is like trying to learn a symphony by listening to every single instrument simultaneously without a conductor's score—it's too messy.

Instead, we should build specialized, simplified models (Reduced-Order Models) that:

Focus on the specific question we want to answer.
Use "coarse-graining" to ignore the tiny, fast details and focus on the big, slow patterns.
Use "stochastic" (random) elements to account for the invisible parts of the system we can't see.

By doing this, and by testing these models not just on how well they mimic the past, but on how well they predict the future when "tapped," we can build climate tools that truly understand cause and effect.

Technical Summary: Probing Forced Responses and Causality in Data-Driven Climate Emulators

Problem Statement
A central challenge in climate science and applied mathematics is developing data-driven models for multiscale systems that capture not only stationary statistics but also accurate responses to external perturbations. While recent neural climate emulators aim to resolve the full complexity of atmosphere–ocean systems, they often struggle to reproduce forced responses, limiting their utility for causal studies such as Green's function experiments. High skill in reproducing stationary statistics does not guarantee accurate responses in perturbation experiments, leading to failures in generalizing to out-of-distribution climate change scenarios and capturing correct causal relations across variables. This paper investigates the conceptual limitations of purely data-driven modeling, particularly when dealing with partial observations of high-dimensional systems, and explores the role of reduced-order models (ROMs) and stochastic parameterizations in overcoming these hurdles.

Methodology
The study proceeds in two distinct parts: a theoretical analysis using a simplified dynamical system and a real-world application using a coupled climate model.

Theoretical Framework (Triad Model):
- The authors utilize a simplified stochastic "triad model" (Majda et al.) that mimics the linear and energy-conserving quadratic nonlinearities of geophysical flows, featuring a clear separation between slow and fast time scales.
- Scenarios: Two modeling scenarios are tested: (i) unknown equations with a fully observed state vector, and (ii) unknown equations with a partially observed state vector (a single scalar variable).
- Models: The authors train neural stochastic emulators (using Multilayer Perceptrons for nonlinear drift) and compare them against linear inverse models (LIMs). They test both additive and multiplicative (state-dependent) noise formulations.
- Evaluation Metric: Instead of relying solely on stationary statistics (PDFs, autocorrelations), the models are evaluated using Linear Response Theory. Specifically, the ability to reproduce the impulse response operator is assessed. This involves measuring the response of the ensemble mean and variance to small impulse perturbations and to specific time-dependent forcings.
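
The evaluation idea above (measuring how the ensemble mean and variance respond to a small impulse) can be sketched as follows. This is only an illustrative sketch, not the paper's code: it uses a hypothetical scalar toy equation rather than the triad model, a simple Euler-Maruyama integrator, and made-up parameter values. All function names (`em_step`, `impulse_response`, and so on) are assumptions introduced here.

```python
import math
import random


def em_step(x, xi, drift, amp, dt):
    """One Euler-Maruyama step of dx = drift(x) dt + amp(x) dW."""
    return x + drift(x) * dt + amp(x) * math.sqrt(dt) * xi


def mean_var(values):
    """Ensemble mean and (population) variance of a list of floats."""
    m = sum(values) / len(values)
    v = sum((u - m) ** 2 for u in values) / len(values)
    return m, v


def impulse_response(drift, amp, eps=0.2, n_members=500, n_steps=100,
                     dt=0.01, spinup=300, seed=0):
    """Return (d_mean, d_var): perturbed minus unperturbed ensemble mean
    and variance at each step after an impulse eps applied at t = 0."""
    rng = random.Random(seed)
    base = [0.0] * n_members
    for _ in range(spinup):  # let the ensemble relax toward stationarity
        base = [em_step(x, rng.gauss(0.0, 1.0), drift, amp, dt) for x in base]
    pert = [x + eps for x in base]  # the same impulse on every member
    d_mean, d_var = [], []
    for _ in range(n_steps + 1):
        mb, vb = mean_var(base)
        mp, vp = mean_var(pert)
        d_mean.append(mp - mb)
        d_var.append(vp - vb)
        # Each perturbed member shares its noise with its unperturbed twin.
        xis = [rng.gauss(0.0, 1.0) for _ in range(n_members)]
        base = [em_step(x, xi, drift, amp, dt) for x, xi in zip(base, xis)]
        pert = [em_step(x, xi, drift, amp, dt) for x, xi in zip(pert, xis)]
    return d_mean, d_var


def linear_drift(x):
    # Hypothetical toy drift: linear damping only.
    return -x


def additive_amp(x):
    # Additive noise: amplitude does not depend on the state.
    return 0.7


def multiplicative_amp(x):
    # Multiplicative noise: amplitude grows with |x| (illustrative values).
    return math.sqrt(0.3 + 0.4 * x * x)


dm_add, dv_add = impulse_response(linear_drift, additive_amp)
dm_mul, dv_mul = impulse_response(linear_drift, multiplicative_amp)
print("largest |variance response|, additive noise:      ",
      max(abs(v) for v in dv_add))
print("largest |variance response|, multiplicative noise:",
      max(abs(v) for v in dv_mul))
```

In this toy setting, the linear model with additive noise returns a variance response that is zero up to rounding error, because every perturbed member is just a shifted copy of its twin. The state-dependent noise gives a nonzero variance response. Sharing the noise between twins is a variance-reduction choice made for this sketch.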

Real-World Application (Pattern Effect):
- The authors develop a reduced-order stochastic neural model to investigate the "pattern effect"—the causal link between Sea Surface Temperature (SST) warming patterns and Top-of-Atmosphere (TOA) radiative fluxes.
- Data & Coarse-Graining: Using 600 years of pre-industrial control run data from the GFDL-CM4 coupled climate model, the authors define a coarse-grained state vector. This involves:
  - Monthly averaging and removal of the seasonal cycle.
  - High-pass filtering to remove multidecadal oscillations.
  - Projection of the tropical SST field onto the first 20 Empirical Orthogonal Functions (EOFs).
  - Inclusion of the global mean net TOA radiative flux as a scalar variable.
- Model Formulation: The emulator follows a Langevin equation structure, $dx = (Lx + n(x))\,dt + \Sigma(x)\,dW$, where the drift is learned via a neural network and the noise term $\Sigma(x)$ is multiplicative (state-dependent) and also learned via a neural network.
- Validation: The model is tested on its ability to reconstruct TOA fluxes from forced SST trajectories (1pctCO2 and 4xCO2 experiments) and to perform Green's function-like experiments by applying impulse perturbations to SST modes to infer causal sensitivity maps.
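
For readers who want to see how a Langevin equation of this form can be stepped forward in time, one common choice is the Euler-Maruyama scheme. The summary does not state which integration scheme or stochastic calculus the paper uses, so the Itô interpretation, the time step $\Delta t$, and the standard normal vector $\xi_k$ below are assumptions made for illustration:

$$
x_{k+1} = x_k + \big(L x_k + n(x_k)\big)\,\Delta t + \Sigma(x_k)\,\sqrt{\Delta t}\,\xi_k, \qquad \xi_k \sim \mathcal{N}(0, I).
$$

The noise is scaled by $\Sigma(x_k)$, evaluated at the current state, which is what makes it multiplicative. With a constant $\Sigma$ the same formula describes additive noise.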

Key Contributions and Results

Full Observation: In the idealized case where the full state vector is observed, a neural stochastic emulator can successfully reproduce both stationary statistics and the impulse response operator (including variance responses). Linear models, by contrast, fail to capture variance responses.
The Critical Role of Partial Observation: When only a subset of variables is observed (the realistic scenario), the success of the emulator depends critically on two factors:
1. Choice of Variables: Identifying the "proper" slow variables is a non-trivial, non-unique task. Modeling the fast modes directly leads to failure.
2. Stochastic Parameterization: The cumulative effect of unobserved fast variables must be parameterized. The study demonstrates that multiplicative noise is essential for capturing the correct stationary distribution and, crucially, the response of the variance to perturbations. Additive noise models fail to reproduce significant variance responses.
Coarse-Graining and Markovianization: The paper highlights that in the absence of clear scale separation, appropriate coarse-graining (e.g., temporal averaging) is necessary to suppress memory effects and render the reduced-order system effectively Markovian.
Real-World Success: The reduced-order neural emulator successfully reproduces the stationary statistics of the dominant SST mode (ENSO) and the global mean TOA flux. It accurately reconstructs TOA flux changes in forced scenarios (1pctCO2 and 4xCO2).
Causal Inference: By applying impulse perturbations, the model infers a sensitivity map showing a dipole in the tropical Pacific (negative sensitivity in the west, positive in the east), consistent with established physical mechanisms (cloud feedbacks). This confirms the emulator captures physically consistent causal mechanisms.
Limitations Identified: The model does not fully capture the decay of autocorrelations (indicating residual non-Markovian behavior), and the variance response is sensitive to the training of the multiplicative noise term, which requires more data than is currently available for high accuracy.
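
The statement above that linear models miss variance responses can be checked with a short standard argument for a linear model with additive noise. The notation ($A$, $B$, $\delta$) is introduced here for illustration and is not taken from the paper. For constant matrices $A$ and $B$,

$$
dx = A x\,dt + B\,dW \quad\Longrightarrow\quad x(t) = e^{At}x_0 + \int_0^t e^{A(t-s)}B\,dW_s .
$$

Replacing the initial condition $x_0$ by $x_0 + \delta$ while keeping the same noise gives

$$
x^{\delta}(t) = x(t) + e^{At}\delta, \qquad \delta\langle x(t)\rangle = e^{At}\delta, \qquad \mathrm{Cov}\big[x^{\delta}(t)\big] = \mathrm{Cov}\big[x(t)\big].
$$

The impulse shifts every trajectory by the same deterministic amount, so the mean responds but the covariance does not. A state-dependent noise amplitude breaks this argument, because the stochastic integral then depends on the perturbed state.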

Significance and Claims
The paper argues that the field of neural emulation must move beyond general-purpose emulators that attempt to resolve all scales. Instead, it advocates for task-specific, reduced-order models that leverage prior physical knowledge to guide variable selection and coarse-graining.

Response Theory as a Framework: The authors propose Linear Response Theory and the impulse response operator as a rigorous framework for evaluating data-driven models. This approach moves beyond stationary metrics (like $R^2$) to probe causal mechanisms and the system's response to external interventions.
Stochasticity is Essential: For multiscale systems with unobserved degrees of freedom, deterministic models are insufficient. The inclusion of state-dependent (multiplicative) noise is not merely an ad-hoc addition but a theoretically justified necessity for capturing the correct probability distribution and its responses to perturbations.
Causal Understanding: The study demonstrates that carefully constructed reduced-order neural emulators can perform thousands of perturbation experiments (Green's function studies) that are computationally prohibitive for full climate models, thereby advancing causal inference and attribution analysis in climate science.

The paper concludes that while data-driven modeling faces fundamental constraints in partially observed settings, combining established reduced-order modeling strategies with flexible neural architectures and response theory offers a principled path forward for understanding and predicting climate dynamics.

Probing forced responses and causality in data-driven climate emulators: conceptual limitations and the role of reduced-order models