Interpretable and physics-informed emulator for the linear matter power spectrum from machine learning

Imagine the universe as a giant, cosmic ocean. In the very beginning, this ocean was calm, but tiny ripples (density fluctuations) started to form. Over billions of years, gravity acted like a sculptor, turning those tiny ripples into the massive waves, islands, and continents we see today: galaxies, clusters, and the vast empty spaces between them.

Cosmologists study this "ocean" using a map called the Matter Power Spectrum. Think of this map as a musical score. It tells us how loud the "notes" (how strongly matter clumps together) are at different "frequencies" (length scales). Some notes are very loud (strong clumping at that scale), and some are quiet (weak clumping). A specific pattern in this music, called Baryon Acoustic Oscillations (BAO), is like a cosmic drumbeat left over from sound waves in the early universe. It serves as a "standard ruler" to measure how fast the universe is expanding.

The Problem: The "Black Box" and the "Slow Calculator"

To create this musical score, scientists usually use super-complex computer programs (like CLASS or CAMB) that solve thousands of difficult physics equations.

The Issue: These programs are incredibly accurate, but they are also slow. If you want to test a new theory of the universe, you might need to run the simulation millions of times. Doing this with the slow programs would take years.
The Current Fix: Scientists have built "emulators" (fast approximations) using Artificial Intelligence. However, many of these are like black boxes. You put numbers in, and a number comes out, but you have no idea how the AI got there. They are fast, but they are opaque and hard to trust if you want to understand the underlying physics.

The Solution: The "Physics-Informed" Detective

This paper introduces a new kind of emulator. Instead of a black box, the authors built a transparent, interpretable formula using a technique called Symbolic Regression powered by Genetic Algorithms.

Here is how they did it, using a simple analogy:

1. The Genetic Algorithm (The Evolutionary Artist)

Imagine you want to find the perfect recipe for a cake, but you don't know the ingredients.

Standard AI: Tastes a million random mixtures of flour, sugar, and sand until it finds one that tastes okay. It doesn't know why it works.
This Paper's Approach: They give the AI a "rulebook" based on physics. They say, "You can only use ingredients that make sense for a cake (flour, eggs, sugar), and you know the cake must rise."
The AI acts like natural selection. It creates thousands of random mathematical formulas (the "recipes"). It tests them against the real data (the "taste test"). The formulas that taste best (match the data) survive and "mate" (combine parts of their equations) to create new, better formulas. The bad ones die out.
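To make the "natural selection of formulas" idea concrete, here is a minimal toy sketch in Python. It is not the authors' code: the tuple-based formula encoding, the three operators, the population size, the mutation rate, and the size cap are all assumptions chosen for illustration, and the target is a simple polynomial rather than a power spectrum.

```python
import random

# Hypothetical toy setup: formulas are nested tuples such as ("+", "x", 1.5).
OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}
MAX_NODES = 31  # assumed size cap that stops formulas from bloating

def random_tree(rng, depth=3):
    """Grow a random formula; a leaf is either the variable x or a constant."""
    if depth == 0 or rng.random() < 0.3:
        return "x" if rng.random() < 0.5 else round(rng.uniform(-2.0, 2.0), 2)
    op = rng.choice(sorted(OPS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))

def evaluate(tree, x):
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, x), evaluate(right, x))
    return x if tree == "x" else tree

def fitness(tree, data):
    """Mean squared error against the data (the 'taste test'); lower is better."""
    total = 0.0
    for x, y in data:
        err = evaluate(tree, x) - y
        total += err * err
    return total / len(data)

def subtrees(tree, path=()):
    """Yield (path, subtree) for every node of the formula."""
    yield path, tree
    if isinstance(tree, tuple):
        yield from subtrees(tree[1], path + (1,))
        yield from subtrees(tree[2], path + (2,))

def replace_at(tree, path, new):
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace_at(tree[path[0]], path[1:], new)
    return tuple(parts)

def crossover(rng, a, b):
    """'Mating': paste a random piece of formula b into a random spot of formula a."""
    path, _ = rng.choice(list(subtrees(a)))
    _, donor = rng.choice(list(subtrees(b)))
    return replace_at(a, path, donor)

def mutate(rng, tree):
    path, _ = rng.choice(list(subtrees(tree)))
    return replace_at(tree, path, random_tree(rng, depth=2))

def evolve(data, generations=40, pop_size=200, seed=0):
    rng = random.Random(seed)
    pop = [random_tree(rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda t: fitness(t, data))
        survivors = pop[: pop_size // 4]  # the best quarter survives unchanged
        children = []
        while len(survivors) + len(children) < pop_size:
            child = crossover(rng, rng.choice(survivors), rng.choice(survivors))
            if rng.random() < 0.2:
                child = mutate(rng, child)
            if sum(1 for _ in subtrees(child)) > MAX_NODES:
                child = random_tree(rng)  # discard oversized formulas
            children.append(child)
        pop = survivors + children
    return min(pop, key=lambda t: fitness(t, data))

# Toy target: y = x^2 + x sampled on [-1, 1].
data = [(i / 10.0, (i / 10.0) ** 2 + i / 10.0) for i in range(-10, 11)]
best = evolve(data)
print(best, fitness(best, data))
```

Because the best quarter of each generation is carried over unchanged, the best error can never get worse from one generation to the next. The physics "rulebook" described above would enter through the choice of allowed building blocks and templates, which this toy version leaves out.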

2. The Result: A "Smart" Formula

After millions of "generations," the AI didn't just find a random match; it discovered a clean, mathematical equation that looks like a human wrote it.

The Smooth Part: First, they taught the AI to draw the smooth, rolling hills of the cosmic map (ignoring the tiny ripples). The AI found a formula that is 6 times simpler and more accurate than the old standard formulas used for decades.
The Wiggly Part: Then, they added the "BAO drumbeats" (the ripples). They told the AI, "The ripples should look like a sine wave that gets quieter as you go further out (Silk damping)." The AI found a way to write this down mathematically.
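The "sine wave that gets quieter" can be sketched in a few lines of Python. This is only an illustration of the shape, not the formula the authors found: the function name, the amplitude, the sound-horizon value r_s, the damping scale k_silk, and the exponent are all placeholder assumptions.

```python
import math

def bao_wiggle(k, amplitude=0.05, r_s=147.0, k_silk=0.15, power=1.4):
    """Illustrative BAO-like ripple: a sine in k * r_s times a damping envelope.

    k and k_silk are in 1/Mpc and r_s in Mpc; every default is a placeholder.
    """
    envelope = math.exp(-((k / k_silk) ** power))  # Silk-damping-like decay
    return amplitude * math.sin(k * r_s) * envelope

# The ripples fade as k grows (smaller physical scales).
for k in (0.01, 0.05, 0.1, 0.3, 1.0):
    print(k, bao_wiggle(k))
```

The sine sets where the peaks and troughs sit, while the exponential envelope controls how quickly they die away.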

Why This Matters

Speed: Compared with slow programs such as CLASS, this new formula is like a sports car. It calculates the map about 500 times faster.
Transparency: Because the formula is written in standard math, scientists can look at it and say, "Ah, I see! This term represents the effect of dark energy." It's not a black box; it's a clear window.
Accuracy: It is very precise, with an average error below 0.3% (about 3 parts in 1,000). This is good enough for the most sensitive telescopes coming online soon (like the Nancy Grace Roman Space Telescope).

Testing the "Modified Gravity" Theory

The authors also wanted to see if this tool could help test Modified Gravity (the idea that Einstein's gravity might be slightly wrong on large scales).

Instead of retraining the whole AI for every new theory, they created a universal adapter.
They took their standard "ΛCDM" (standard universe) formula and added a few "knobs" (parameters) that could twist the formula to mimic different gravity theories.
They tested this on a specific theory called f(R) gravity. The tool successfully predicted how the cosmic map would change, capturing the subtle shifts in the "drumbeat" (BAO scale) caused by the new gravity rules.

The Bottom Line

The authors have built a fast, accurate, and understandable tool for mapping the universe.

Old Way: Either slow and complex, or fast but mysterious (black box).
New Way: Fast, accurate, and you can read the "recipe" to understand the physics behind it.

This is a huge step forward because it allows scientists to run millions of tests quickly and understand the results, helping us figure out if our current understanding of the universe (Dark Energy, Dark Matter, and Gravity) is truly correct.

Here is a detailed technical summary of the paper "An Interpretable and Physics-Informed Emulator for the Linear Matter Power Spectrum from Machine Learning."

1. Problem Statement

The matter power spectrum (MPS), $P(k)$, is a fundamental observable in cosmology, used to constrain cosmological parameters and to test theories such as $\Lambda$CDM and Modified Gravity (MG).

Computational Bottleneck: High-precision calculation of the linear MPS requires Boltzmann solvers (e.g., CLASS, CAMB). These are computationally expensive, making them impractical for large-scale inference pipelines (e.g., Markov Chain Monte Carlo) that require thousands of evaluations.
Limitations of Existing Emulators: Current fast emulators often rely on "black-box" machine learning techniques (neural networks, Gaussian processes). While accurate, they lack physical transparency, are difficult to interpret, and require massive training datasets. Furthermore, adapting them to new physics (e.g., MG) often necessitates costly retraining.
Need for Interpretability: There is a demand for fitting functions that are not only accurate and fast but also analytically closed-form, differentiable, and physically motivated to facilitate theoretical modeling and parameter inference.

2. Methodology

The authors propose a Physics-Informed Symbolic Regression (SR) framework using Genetic Algorithms (GAs) to derive closed-form analytic expressions for the MPS.

Core Technique: Instead of training a neural network, the authors use GAs to evolve mathematical expressions (trees of functions and operators) that minimize the error against numerical data from Boltzmann solvers.
Physics-Informed Priors: To prevent overfitting and ensure physical relevance, the search space is constrained by domain knowledge:
- Separation of Scales: The transfer function $T(k)$ is decomposed into a smooth, broadband component ($T_{nw}$) and an oscillatory component ($T_w$) representing Baryon Acoustic Oscillations (BAO).
- Template Constraints: The GA is guided by known physical behaviors (e.g., asymptotic scaling at small scales, Silk damping, acoustic oscillation phases).
- Parameter Space: The model is trained on a grid of cosmological parameters $\{h, \omega_b, \omega_m, n_s, A_s\}$ spanning ranges around the Planck 2018 best-fit values.
Correction Strategy: The authors introduce localized empirical corrections (Gaussian and skew-normal functions) around specific physical scales (matter-radiation equality $k_{eq}$ and Silk damping scale $k_{Silk}$) to fix residual discrepancies without compromising the global physical structure.
Extension to Modified Gravity: Rather than training a general MG emulator, they propose a parametric deformation of the $\Lambda$CDM smoothed component. This deformation captures MG-induced scale-dependent growth and clustering effects using interpretable parameters.
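The idea behind a localized correction can be illustrated with a short Python sketch. It assumes a multiplicative Gaussian bump in $\ln k$ centred on a chosen scale. The summary does not state the authors' exact functional form, so the function names, the multiplicative structure, and all numbers below are hypothetical.

```python
import math

def gaussian_bump(k, k_center, amplitude, width):
    """Gaussian in ln(k) centred on k_center; it fades quickly away from that scale."""
    z = (math.log(k) - math.log(k_center)) / width
    return amplitude * math.exp(-0.5 * z * z)

def apply_correction(base_value, k, k_center, amplitude, width):
    """Multiply a baseline prediction by (1 + localized bump)."""
    return base_value * (1.0 + gaussian_bump(k, k_center, amplitude, width))

# Placeholder numbers, not fitted values from the paper.
k_eq = 0.01  # assumed equality scale
at_center = apply_correction(1.0, k_eq, k_eq, amplitude=0.02, width=0.5)
far_away = apply_correction(1.0, 1.0, k_eq, amplitude=0.02, width=0.5)
print(at_center, far_away)
```

Because the bump decays rapidly away from its centre, a correction of this kind adjusts the fit near $k_{eq}$ or $k_{Silk}$ while leaving the asymptotic behaviour of the baseline essentially untouched.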

3. Key Contributions

A. The $\Lambda$CDM Emulator

The authors derived a compact symbolic formula for the linear MPS that outperforms traditional fitting functions:

Smooth Component ($T_{nw}$): A new fitting formula for the de-wiggled transfer function. It achieves a mean absolute percentage error (MAPE) of 0.99% on test data, comparable to the Savitzky-Golay (SG) filter but with a closed-form expression. It is significantly simpler (6x fewer operations) and more accurate than the zero-baryon Eisenstein-Hu (EH) formula.
Oscillatory Component ($T_w$): An analytic expression for BAO wiggles incorporating acoustic oscillations, Silk damping, and amplitude suppression.
Full Accuracy: By adding localized corrections around $k_{Silk}$ and $k_{eq}$, the full emulator achieves a MAPE of 0.28% across the range $k \in [10^{-5}, 1.5] \, h \, \text{Mpc}^{-1}$.
Complexity vs. Accuracy: The final formula is approximately 4 times simpler (in terms of leaf count and depth) and 82% more accurate than the full Eisenstein-Hu formula. It is also comparable in speed to the state-of-the-art symbolic_pofk emulator but offers superior physical interpretability.
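For readers unfamiliar with the metric, the following Python sketch shows how a MAPE is computed and how the quoted accuracy gain can be checked. The sample arrays are made up for illustration, and the check assumes that the 1.63% EH error quoted in the results table refers to the full Eisenstein-Hu formula.

```python
def mape(predicted, reference):
    """Mean absolute percentage error, in percent."""
    total = sum(abs(p - r) / abs(r) for p, r in zip(predicted, reference))
    return 100.0 * total / len(reference)

reference = [100.0, 200.0, 400.0]  # made-up "true" values
predicted = [101.0, 198.0, 400.0]  # made-up emulator outputs
print(mape(predicted, reference))  # averages errors of 1%, 1%, and 0%

# Relative reduction in error when going from 1.63% (EH) to 0.28% (this emulator).
improvement = (1.63 - 0.28) / 1.63
print(round(100 * improvement, 1))
```

The relative reduction comes out near 0.83, in line with the "82% more accurate" figure. The small difference is presumably due to rounding in the quoted errors.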

B. Nonlinear Extension

The linear emulator was used as input for the halofit nonlinear prescription. The resulting nonlinear MPS maintains sub-percent accuracy (mean error $\sim 0.30\%$) up to $k \sim 8 \, h \, \text{Mpc}^{-1}$, demonstrating robustness beyond the linear training domain.

C. Modified Gravity (MG) Framework

The authors developed a parametric model to describe deviations from $\Lambda$CDM in MG scenarios (specifically $f(R)$ gravity and Horndeski theories):

Parametric Deformation: The model introduces scale-dependent modifications to the amplitude and shape of the power spectrum using physically motivated parameters (e.g., $\gamma_{MG}$ for suppression strength, $k_{MG}$ for characteristic scale).
Performance: When applied to the $f(R)$ Hu-Sawicki model, the parametric deformation captures the MG effects with an average error of 1.5–1.8%, sufficient for BAO analysis.
Identifiability: A Fisher matrix analysis revealed that key MG parameters (transition scale $k_T$, spectral index $n_s$) are largely decoupled from amplitude degeneracies, ensuring robust parameter inference.
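A rough Python sketch can show how a scale-dependent deformation and a Fisher matrix fit together. The deformation below is a hypothetical stand-in, not the authors' parametrization: it is a multiplicative factor equal to 1 on large scales and $1 + \gamma_{MG}$ on small scales, with the transition at $k_{MG}$. The $k$ grid, the uniform error sigma, and the fiducial values are likewise assumptions.

```python
import math

def mg_deformation(k, gamma_mg, k_mg):
    """Hypothetical factor applied to the LCDM prediction (not the paper's form)."""
    x = (k / k_mg) ** 2
    return 1.0 + gamma_mg * x / (1.0 + x)

def fisher_matrix(model, theta, ks, sigma, rel_step=1e-4):
    """F_ij = sum_k (dM/dtheta_i)(dM/dtheta_j) / sigma^2, via central differences."""
    grads = []
    for i, value in enumerate(theta):
        h = rel_step * abs(value)  # assumes every parameter is nonzero
        up = list(theta)
        dn = list(theta)
        up[i] += h
        dn[i] -= h
        grads.append([(model(k, *up) - model(k, *dn)) / (2.0 * h) for k in ks])
    n = len(theta)
    return [[sum(a * b for a, b in zip(grads[i], grads[j])) / sigma ** 2
             for j in range(n)] for i in range(n)]

# 20 log-spaced wavenumbers between 1e-3 and 1 (assumed grid and units).
ks = [10 ** (-3 + 3 * i / 19) for i in range(20)]
F = fisher_matrix(mg_deformation, [0.1, 0.1], ks, sigma=0.01)

# Correlation of the two parameters from the inverse of the 2x2 Fisher matrix.
corr = -F[0][1] / math.sqrt(F[0][0] * F[1][1])
print(F, corr)
```

A correlation coefficient close to zero would indicate that the two parameters can be constrained almost independently, while values near plus or minus one would signal a degeneracy. The value obtained from this toy setup says nothing about the paper's actual result.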

4. Key Results

Metric	Value	Comparison
$\Lambda$CDM Linear MAPE	0.28%	Outperforms EH (1.63%) and matches SG filter accuracy.
Nonlinear MAPE	~0.30%	Valid up to $k \approx 8 \, h \, \text{Mpc}^{-1}$ when fed to halofit.
MG Model MAPE	1.5–1.8%	Captures leading-order MG modulation for $f(R)$ models.
Runtime	~150 ms (1000 spectra)	~500x faster than CLASS; comparable to symbolic_pofk.
BAO Scale Shift	~1.56 $h^{-1}$ Mpc	Detected shift in BAO peak for strong $f(R)$ models ($f_{R0}=5\times10^{-4}$).

BAO Robustness: The study confirms that the BAO peak position is relatively robust to the choice of de-wiggling method (GA vs. EH vs. SG), though MG models induce a measurable shift toward smaller scales.
Interpretability: Unlike black-box emulators, every term in the derived formula corresponds to a known physical mechanism (e.g., horizon crossing, Silk damping, acoustic oscillations).

5. Significance

Transparent Cosmology: This work bridges the gap between the speed of machine learning and the interpretability of semi-analytical formulas. It provides a "white-box" emulator where the influence of cosmological parameters on the MPS shape is explicitly visible.
Efficiency for Next-Gen Surveys: The sub-percent accuracy and closed-form nature make this emulator ideal for future large-scale structure surveys (DESI, Euclid, LSST) that require rapid, differentiable likelihood evaluations for parameter inference.
Modular MG Analysis: The framework offers a flexible tool to isolate and study specific MG effects without retraining complex neural networks, facilitating the study of deviations from General Relativity.
Code Availability: The authors provide a public implementation (Mathematica/Python) and a user guide, ensuring reproducibility and immediate utility for the cosmological community.

In conclusion, the paper demonstrates that physics-informed symbolic regression can produce compact, highly accurate, and interpretable fitting functions for the matter power spectrum, offering a superior alternative to both traditional fitting formulas and black-box machine learning emulators.
