Flexible Simulation Based Inference for Galaxy Photometric Fitting with Synthesizer

Imagine you are an astronomer trying to figure out the secrets of a galaxy. You look at it through a powerful telescope (like the James Webb Space Telescope), and you see a faint smudge of light. You want to know: How heavy is it? How old are its stars? Is it dusty? Is it forming new stars right now?

Traditionally, figuring this out is like trying to guess the ingredients of a secret soup by tasting it, but you have to cook a new pot of soup from scratch for every single guess you make. If you want to be sure, you might have to cook millions of pots. This is slow, expensive, and impossible if you have to do it for billions of galaxies.

This paper introduces a new tool called Synference that changes the game. Here is how it works, explained simply:

1. The Old Way: The "Cook-From-Scratch" Method

Imagine you are a detective trying to identify a suspect. The old method (called "Nested Sampling" or "MCMC") is like this:

You guess a suspect's height, weight, and hair color.
You go to the police station, find a photo of a person with those exact traits, and compare it to the crime scene photo.
If it doesn't match, you throw the photo away, pick a new random suspect, and go back to the station to find a new photo.
You repeat this thousands of times for just one galaxy.
The Problem: If you have 3,000 galaxies, this takes days of computer time (about 80 hours). If you have 20 billion galaxies (which future telescopes will find), the same method would need tens of thousands of years of computer time on a single processor.

2. The New Way: The "Super-Intuitive Chef" (Synference)

Synference uses a technique called Simulation-Based Inference (SBI). Instead of cooking from scratch every time, we train a "Super-Intuitive Chef" (a neural network) once, and then they can guess the ingredients instantly.

Here is the step-by-step process:

Step 1: The Training Camp (Simulation)
The scientists use a powerful simulator (called Synthesizer) to cook up one million fake galaxies. They know the exact recipe for every single one (e.g., "This one has 10 billion stars, is 5 billion years old, and is very dusty"). They take pictures of these fake galaxies to create a massive "Training Library."
Step 2: The Training (Learning the Pattern)
They feed this library into the "Super-Intuitive Chef" (the AI). The AI looks at the picture of a fake galaxy and tries to guess the recipe. It gets it wrong, learns, gets it wrong again, learns, and eventually it learns the connection between the look of a galaxy and its physical properties.
- Analogy: It's like showing a child a million pictures of dogs and cats, telling them which is which, until the child can look at a new animal and instantly know what it is without thinking.
Step 3: The Instant Inference (The Magic)
Now, when a real galaxy is observed, the AI doesn't need to cook anything. It just looks at the picture and says, "Ah, this looks like the 45,000th fake galaxy I studied. It's 10 billion years old and dusty."
- The Speed: The old method took about 80 hours of computer time to analyze 3,000 galaxies. Synference did it in about 3 minutes. That is a speedup of roughly 1,700 times. It's like going from walking across the country to teleporting.

3. Why This Matters

The "Amortized" Benefit: The hard work (training the AI) is done only once. After that, analyzing each new galaxy takes a fraction of a second. This is crucial because future telescopes will find billions of galaxies, and we need a tool that can handle that volume.
Full Uncertainty: Many earlier machine-learning shortcuts give you only a single "best guess" (e.g., "It is 5 billion years old"). Synference gives you the whole story, just as the slow traditional methods do. It says, "It's likely 5 billion, but it could be 4.8 or 5.2, and here is the probability of each." It captures the "fuzziness" of the measurement.
Testing Different Recipes: The authors used Synference to test two different "cookbooks" (models of how stars are made). They found that one cookbook consistently made galaxies look heavier than the other. This helps scientists realize that the "recipe" they are using might need tweaking.

4. The Results

The team tested Synference on real galaxies from the JADES survey (using the James Webb Space Telescope).

Accuracy: It matched the results of the slow, traditional methods almost perfectly.
Speed: It processed 3,088 galaxies in the time it takes to brew a cup of coffee.
Reliability: They checked it against "ground truth" (fake galaxies where they knew the answer) and found it was incredibly accurate, especially for measuring the mass of stars.

Summary

Synference is a new, flexible tool that uses Artificial Intelligence to turn the slow, painful process of analyzing galaxy light into a near-instant estimate that comes with its uncertainties. It trains a "super-brain" on a million fake galaxies so it can quickly interpret real ones. This allows astronomers to process the massive flood of data coming from our new, powerful telescopes, helping us understand how the universe formed and evolved much faster than before.

In a nutshell: We stopped trying to solve every puzzle from scratch and started building a master detective who has seen every possible puzzle before. Now, solving a new one takes a split second.

Here is a detailed technical summary of the paper "Flexible Simulation Based Inference for Galaxy Photometric Fitting with Synthesizer" by Harvey et al. (2025).

1. Problem Statement

The astronomical community faces an imminent data deluge from next-generation surveys (e.g., JWST, Euclid, Roman, Rubin Observatory), which will observe billions of galaxies. Traditional Bayesian inference methods for Spectral Energy Distribution (SED) fitting, such as Markov Chain Monte Carlo (MCMC) and Nested Sampling (e.g., bagpipes, prospector), are computationally prohibitive for these datasets.

Computational Bottleneck: Fitting a single galaxy can take minutes to days of CPU time.
Scalability: Processing millions of galaxies with traditional methods is infeasible.
Limitations of Existing ML: Previous machine learning approaches often yield only point estimates (ignoring uncertainties and parameter degeneracies) or rely on emulators that still require expensive likelihood evaluations.

The paper addresses the need for a method that provides full posterior distributions (capturing uncertainties and degeneracies) with amortized inference (near-instantaneous evaluation after training) to handle the scale of future surveys.

2. Methodology: The `synference` Framework

The authors introduce synference, a flexible Python framework designed for Simulation-Based Inference (SBI), specifically tailored for galaxy SED fitting.

Core Architecture

Simulation-Based Inference (SBI): The method bypasses the need for an explicit likelihood function. Instead, it learns the statistical mapping between observations ($x$) and physical parameters ($\theta$) using a Neural Density Estimator (NDE) trained on simulated data.
Forward Modeling (synthesizer): The framework uses the synthesizer package to generate a comprehensive library of synthetic galaxy observations. This allows for flexible forward modeling including stellar continua, nebular emission (via cloudy), dust attenuation/emission, AGN, and IGM absorption.
Training Backend (LtU-ILI): It integrates the LtU-ILI package to manage model training, validation, and hyperparameter optimization (using Optuna).
Modularity: The framework decouples the simulation stage from the training stage, allowing a single library of simulations to be reused for different observational datasets or scientific questions.
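The simulate-once, train-once, infer-many-times pattern described above can be illustrated with a deliberately tiny toy. This is a minimal sketch and not the `synference` API: the forward model, the prior range, and every function name (`simulate`, `build_library`, `fit_gaussian_posterior`, `posterior`) are assumptions made for illustration, and a linear-Gaussian fit stands in for the neural density estimator.

```python
import math
import random

def simulate(theta, rng):
    """Hypothetical forward model: log10 stellar mass -> one noisy magnitude."""
    return 28.0 - 2.5 * (theta - 8.0) + rng.gauss(0.0, 0.3)

def build_library(n, rng):
    """Simulation stage: draw parameters from a uniform prior and simulate each one."""
    thetas = [rng.uniform(8.0, 11.0) for _ in range(n)]
    xs = [simulate(t, rng) for t in thetas]
    return thetas, xs

def fit_gaussian_posterior(thetas, xs):
    """Training stage: fit q(theta | x) = Normal(a + b*x, sigma^2) by maximum
    likelihood on the library (a stand-in for a neural density estimator)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_t = sum(thetas) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxt = sum((x - mean_x) * (t - mean_t) for x, t in zip(xs, thetas))
    b = sxt / sxx
    a = mean_t - b * mean_x
    sigma = math.sqrt(sum((t - (a + b * x)) ** 2 for x, t in zip(xs, thetas)) / n)
    return a, b, sigma

def posterior(x_obs, params):
    """Inference stage: evaluating the trained estimator needs no new simulations."""
    a, b, sigma = params
    return a + b * x_obs, sigma

rng = random.Random(42)
thetas, xs = build_library(20000, rng)       # done once
params = fit_gaussian_posterior(thetas, xs)  # done once
mean, sd = posterior(24.25, params)          # repeated cheaply for each observation
print(f"log10(M*) = {mean:.2f} +/- {sd:.2f}")
```

Because the library is built separately from the fit, the same simulated pairs could be reused to train a different estimator, which mirrors the modularity point above.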

Specific Implementation (Model 1)

Training Data: $10^6$ simulated galaxies generated using a flexible 8-parameter physical model.
- Free Parameters: Stellar mass ($M_*$), dust attenuation ($A_V$), stellar metallicity ($Z_*$), Star Formation Rate (SFR), and three parameters defining a Gaussian Process Star Formation History (SFH) ($t_{25}, t_{50}, t_{75}$).
- Derived Parameters: Mass-weighted age, 10-Myr averaged SFR, surviving stellar mass, and UV slope ($\beta$).
Observational Input: 14-band photometry from HST (ACS) and JWST (NIRCam) covering the GOODS-South field.
Noise Modeling: An empirical noise model was applied to match the noise characteristics of the JADES survey, converting fluxes to asinh magnitudes to handle non-detections gracefully.
Neural Architecture: The authors tested Mixture Density Networks (MDN) and Neural Spline Flows (NSF). Hyperparameter optimization identified NSFs as optimal, with the best log-probability scores.
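The asinh magnitudes mentioned in the noise-modeling item above can be sketched with the widely used definition of Lupton et al. (1999). This summary does not state the softening parameter or zero point the authors adopted, so the values below are placeholders, and the function names are hypothetical.

```python
import math

POGSON = 2.5 / math.log(10)  # about 1.0857

def asinh_mag(flux, softening):
    """Asinh magnitude; flux and softening are in units of the zero-point flux."""
    return -POGSON * (math.asinh(flux / (2.0 * softening)) + math.log(softening))

def pogson_mag(flux):
    """Classical logarithmic magnitude, undefined for flux <= 0."""
    return -2.5 * math.log10(flux)

b = 1e-3  # placeholder softening, comparable to the noise level
for flux in (1.0, 1e-2, 0.0, -1e-3):
    classical = pogson_mag(flux) if flux > 0 else float("nan")
    print(f"flux={flux:+.3g}  asinh={asinh_mag(flux, b):.3f}  classical={classical:.3f}")
```

For bright sources the two definitions agree closely, while for zero or negative measured fluxes the asinh form stays finite, which is why it suits non-detections.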

3. Key Contributions

Development of synference: A new, open-source framework that lowers the barrier to entry for SBI in astrophysics by providing a unified interface for simulation, feature engineering, training, and inference.
Amortized Inference at Scale: Demonstrated the ability to infer physical parameters for thousands of galaxies in minutes, a speedup of $\sim 1700\times$ over traditional Nested Sampling.
Robust Validation: The model was rigorously validated against:
- Held-out test sets (simulated data).
- Reference posteriors from Nested Sampling (dynesty).
- Traditional fitting codes (bagpipes).
Model Comparison Capabilities: Demonstrated the ability to rapidly compare different physical models (e.g., BPASS vs. FSPS stellar population synthesis grids) by training separate SBI models, revealing systematic biases in derived stellar masses.
Photometric Redshift Inference: Showed that synference can simultaneously infer redshift and physical parameters (Model 2), producing full Bayesian posteriors for redshifts with competitive accuracy compared to template-fitting methods like EAZY.
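Validation on held-out simulated data, as listed above, asks whether posteriors are as wide as they should be. A minimal sketch of the idea, assuming a toy Gaussian problem where the exact posterior is known: a 68% credible interval should contain the true value about 68% of the time. This coverage check is a simpler relative of the calibration tests used in the paper, not a reimplementation of them, and the function name and settings are hypothetical.

```python
import random

def coverage(n_trials, noise_sd, width_scale, seed=0):
    """Fraction of trials whose +/- (width_scale * posterior sd) interval holds the truth.

    Toy model: theta ~ Normal(0, 1), x = theta + Normal(0, noise_sd^2).
    The exact posterior is Normal(x / (1 + noise_sd^2), noise_sd^2 / (1 + noise_sd^2)).
    """
    rng = random.Random(seed)
    post_sd = (noise_sd ** 2 / (1.0 + noise_sd ** 2)) ** 0.5
    hits = 0
    for _ in range(n_trials):
        theta = rng.gauss(0.0, 1.0)
        x = theta + rng.gauss(0.0, noise_sd)
        post_mean = x / (1.0 + noise_sd ** 2)
        if abs(theta - post_mean) <= width_scale * post_sd:
            hits += 1
    return hits / n_trials

exact = coverage(20000, noise_sd=0.5, width_scale=1.0)          # correct posterior width
overconfident = coverage(20000, noise_sd=0.5, width_scale=0.5)  # posterior sd halved
print(f"exact posterior: {exact:.3f}, overconfident posterior: {overconfident:.3f}")
```

A well-calibrated estimator lands near the nominal 68%, while an overconfident one falls well short of it.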

4. Results

Performance Metrics

Parameter Recovery: The model achieved excellent recovery of physical parameters on simulated data:
- Stellar Mass ($M_*$): $R^2 > 0.99$.
- Dust Attenuation ($A_V$): $R^2 = 0.86$.
- SFH Parameters: Generally good recovery, though some degeneracies remain (as expected from photometry alone).
Calibration: The model passed Simulation-Based Calibration (SBC) and Tests of Accuracy with Random Points (TARP), indicating that the posterior distributions are well-calibrated and unbiased.
Comparison with bagpipes:
- Applied to 3,088 spectroscopically confirmed galaxies in JADES GOODS-South.
- Speed: synference processed the entire sample in ~3 minutes on a single CPU core (~18 galaxies/s). bagpipes required ~80 CPU-hours (~1700× slower).
- Accuracy: Stellar mass estimates showed excellent agreement (median offset 0.03 dex). synference showed slightly higher SFRs and ages in certain regimes but handled quiescent galaxies more robustly than bagpipes in some cases.
- Systematic Differences: When comparing models using different SPS grids (BPASS vs. FSPS), synference revealed a systematic 0.3 dex offset in stellar mass, highlighting the impact of model choice.
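The speed figures quoted in the comparison above can be cross-checked with a few lines of arithmetic. The sketch uses only the numbers given in this summary (3,088 galaxies, ~80 CPU-hours, ~1700× speedup); the variable names are illustrative.

```python
n_galaxies = 3088
bagpipes_cpu_hours = 80.0
speedup = 1700.0

bagpipes_seconds = bagpipes_cpu_hours * 3600.0           # total traditional cost
bagpipes_per_galaxy = bagpipes_seconds / n_galaxies      # seconds per galaxy
sbi_seconds = bagpipes_seconds / speedup                 # implied amortized cost
sbi_minutes = sbi_seconds / 60.0
sbi_rate = n_galaxies / sbi_seconds                      # galaxies per second

print(f"bagpipes: {bagpipes_per_galaxy:.1f} s per galaxy")
print(f"amortized inference: {sbi_minutes:.1f} min total, {sbi_rate:.1f} galaxies/s")
```

The implied totals, just under 3 minutes and about 18 galaxies per second, agree with the rounded values quoted above.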

Redshift Inference (Model 2)

When inferring redshift directly from photometry (without spectroscopic input), the model achieved an outlier fraction ($|\Delta z|/(1+z) > 0.15$) of 11.7% and an NMAD of 0.055.
This is competitive with EAZY (outlier fraction 16.0%) but provides full Bayesian posteriors rather than point estimates.
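The two redshift metrics above can be computed as follows. This is a sketch under an assumption: it uses a common definition of the NMAD, $1.4826 \times \mathrm{median}(|\delta - \mathrm{median}(\delta)|)$ with $\delta = \Delta z/(1+z_{\rm spec})$, which may differ in detail from the paper's convention. The function name and the example redshifts are made up for illustration.

```python
import statistics

def photoz_metrics(z_phot, z_spec, threshold=0.15):
    """Return (outlier fraction, NMAD) for paired photometric and spectroscopic redshifts."""
    delta = [(zp - zs) / (1.0 + zs) for zp, zs in zip(z_phot, z_spec)]
    outlier_fraction = sum(abs(d) > threshold for d in delta) / len(delta)
    center = statistics.median(delta)
    nmad = 1.4826 * statistics.median([abs(d - center) for d in delta])
    return outlier_fraction, nmad

# Made-up example values, not data from the paper.
z_spec = [1.0, 2.0, 3.0, 4.0, 5.0]
z_phot = [1.1, 2.0, 2.8, 4.1, 0.5]
outlier_fraction, nmad = photoz_metrics(z_phot, z_spec)
print(f"outlier fraction = {outlier_fraction:.2f}, NMAD = {nmad:.3f}")
```

The median-based NMAD is barely affected by the one catastrophic estimate in the example, which is why it is usually reported alongside the outlier fraction rather than instead of it.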

5. Significance and Future Outlook

Scalability: synference addresses the computational bottleneck for next-generation surveys. Because inference is amortized, full Bayesian analysis of billions of galaxies becomes feasible where it was previously impractical.
Scientific Return: By providing full posterior distributions, it allows for rigorous uncertainty quantification and the exploration of complex parameter degeneracies (e.g., age-metallicity-dust degeneracies).
Flexibility: The framework supports various SPS models, dust laws, and SFH parametrizations, allowing for rapid "what-if" scenarios and model selection.
Future Work: The authors plan to extend synference to spatially resolved SED fitting, inference from hydrodynamical simulations, and handling missing data (a current limitation where all filters must be observed).

In conclusion, synference represents a shift in galaxy SED fitting from slow, galaxy-by-galaxy sampling to fast, amortized, simulation-based inference, which helps upcoming astronomical datasets be exploited at full scale.
