From Simulations to Surveys: Domain Adaptation for… — Plain-Language Explanation

Original authors: Kaley Brauer, Aditya Prasad Dash, Meet J. Vyas, Ahmed Salim, Stiven Briand Massala

Published 2026-06-09

Original paper licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). This is an AI-generated explanation of the paper below. It is not written or endorsed by the authors. For technical accuracy, refer to the original paper.

Imagine you are trying to teach a student how to identify different types of cars.

The Problem: The "Video Game" vs. The "Real World"
In this paper, the "students" are computer programs (AI models), and the "cars" are galaxies.

The Source (The Video Game): The researchers first trained their AI using images from a super-advanced computer simulation called TNG50. Think of this like a perfect, high-definition video game. In the game, the AI knows exactly what every car is (a sedan, a truck, or a sports car) because the game creator programmed it that way.
The Target (The Real World): The researchers then wanted the AI to look at real photos of galaxies from the Sloan Digital Sky Survey (SDSS). This is like taking the AI out of the video game and putting it on a busy, rainy street. The real photos look different: they are grainier and blurrier, the lighting is different, and the "cars" (galaxies) look a bit different than in the game.

If you just take the AI trained on the video game and let it guess on the real street, it gets confused. It might think a real truck is a sports car because the lighting is different. This is called a "domain shift."

The Solution: The "Translator" Pipeline
The paper describes a new method to act as a translator between the video game world and the real world. They built a pipeline to help the AI learn that "a spiral galaxy in the game" is the same thing as "a spiral galaxy in the real photo," even though they look different.

Here is how they did it, using simple analogies:

The Three Teachers (Backbones):
They tried three different types of AI "teachers" (neural networks) to do the learning:
- A small, simple teacher (CNN).
- A teacher that is very good at recognizing shapes no matter how they are rotated (E(2)-steerable CNN).
- A famous, pre-trained teacher (ResNet-18) that they fine-tuned for this specific job.
The "Hard Mode" Training (Focal Loss):
In their data, there are way more "Spiral" galaxies than "Elliptical" or "Irregular" ones. It's like a classroom where 90% of the students are wearing red shirts, and only a few wear blue. If the AI just guesses "Red" every time, it gets a high score but learns nothing about the blue shirts.
To fix this, they used a special scoring rule called Focal Loss, together with extra weight for the rare classes. It's like a teacher who says, "You get almost no credit for the easy red-shirt questions you already answer confidently, but mistakes on hard questions, and especially on the rare blue-shirt ones, cost you a lot." This forces the AI to pay attention to the rare galaxy types.
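For readers who want to see the scoring rule itself, here is a minimal Python sketch of focal loss with per-class weights. It is not the authors' code: the class counts, the `beta` and `gamma` values, and the function names are illustrative assumptions. The class weights use the "effective number of samples" recipe, so rarer classes get larger weights, and the `(1 - p) ** gamma` factor shrinks the loss on examples the model already classifies correctly with high confidence.

```python
import numpy as np

def effective_number_weights(class_counts, beta=0.999):
    """Class weights proportional to 1 / effective number of samples."""
    counts = np.asarray(class_counts, dtype=float)
    effective_num = (1.0 - beta ** counts) / (1.0 - beta)
    weights = 1.0 / effective_num
    # Normalize so the weights sum to the number of classes.
    return weights * len(counts) / weights.sum()

def focal_loss(probs, labels, class_weights, gamma=2.0):
    """Mean focal loss for predicted probabilities (N, C) and integer labels (N,)."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    class_weights = np.asarray(class_weights, dtype=float)
    # Probability the model assigned to the true class of each example.
    p_true = np.clip(probs[np.arange(len(labels)), labels], 1e-12, 1.0)
    # Confident, correct predictions (p_true near 1) are strongly down-weighted.
    modulating = (1.0 - p_true) ** gamma
    per_example = -class_weights[labels] * modulating * np.log(p_true)
    return per_example.mean()

# Hypothetical counts for spiral / elliptical / irregular (not from the paper).
counts = [900, 80, 20]
weights = effective_number_weights(counts)

probs = np.array([[0.95, 0.03, 0.02],   # a spiral, classified confidently
                  [0.40, 0.50, 0.10],   # a spiral, narrowly misclassified
                  [0.70, 0.10, 0.20]])  # an irregular mistaken for a spiral
labels = np.array([0, 0, 2])

print("class weights:", weights.round(3))
print("focal loss:", focal_loss(probs, labels, weights))
```

Setting `gamma=0` turns this into ordinary class-weighted cross-entropy, which is a convenient way to see how much the focusing term changes the loss.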
The "Blending" Trick (Domain Adaptation):
This is the core of their invention. They added a special rule to the training process that forces the AI to mix up the "game" images and the "real" images in its internal memory.
- The Goal: They want the AI's internal map to look like a smoothie where the "game" ingredients and "real" ingredients are blended so well that you can't tell which is which.
- The Tool: They used a mathematical tool called Optimal Transport (specifically "Sinkhorn" and "Top-k"). Imagine you have two piles of puzzle pieces (one from the game, one from reality). The AI tries to match them up.
- The "Top-k" Secret Sauce: Usually, the AI matches all the pieces softly, and a good overall score can hide a few badly wrong pairings (for example, a game spiral paired with a real elliptical). The researchers added a "Top-k" rule on top of the overall matching: "Find the k pairs that fit worst (say, the 10 worst) and add an extra penalty for those." This is like telling the AI, "Stop coasting on the easy stuff; fix the specific mismatches that are really confusing you."

The Results: From Confused to Confident
The paper reports the results of this experiment:

Before the fix: When the AI tried to guess the galaxy types on real photos without this special training, it was only about 46% accurate. It was basically guessing.
After the fix: With their new "Top-k" blending method, the accuracy jumped to 87%.
The Proof: They checked the AI's internal "brain" (latent space). Before the fix, the AI kept the game images and real images in separate rooms (it could always tell them apart). After the fix, the rooms were merged into one big hall where the images were mixed so thoroughly that a separate classifier could barely tell game from real (a domain AUC of about 0.51–0.53, where 0.5 is a coin flip). This suggests the AI had learned features the two worlds share, rather than features that separate them.

What's Next?
The authors say this is just a "proof of concept." They plan to:

Teach the AI to recognize more than just shapes (like how much gas a galaxy has or if it has a black hole).
Get better at spotting the rare "Irregular" galaxies.
Test this on even bigger, future telescope data (like the Vera C. Rubin Observatory).

In short, they built a bridge that allows an AI trained on perfect computer simulations to successfully understand messy, real-life photos of the universe.

Technical Summary: From Simulations to Surveys: Domain Adaptation for Galaxy Observations

Problem Statement
The paper addresses the critical challenge of transferring machine learning models trained on simulated galaxy data to real observational surveys. While large photometric surveys (e.g., Vera C. Rubin Observatory, Euclid) will image billions of galaxies, inferring physical properties like morphology, stellar mass, and star formation rates remains difficult without rapid, automated methods. Simulations (specifically TNG50) provide images with ground-truth physical labels, but a significant "domain shift" exists between these simulations and real data (e.g., SDSS). This shift arises from differences in Point Spread Function (PSF), noise, background levels, selection functions, and demographic priors. Naive transfer of models trained on simulations to real data risks biasing physical inferences, distorting mass–star formation rate demographics, and contaminating scaling relations. The authors frame this as a covariate-shift problem where the conditional label distribution is approximately stable ( $p_S(y|x) \approx p_T(y|x)$ ), but the input and selection distributions differ ( $p_S(x) \neq p_T(x)$ ).

Methodology
The authors propose a preliminary domain adaptation pipeline that trains on mock TNG50 observations and evaluates on real SDSS galaxies with Galaxy Zoo-derived morphology labels (elliptical, spiral, irregular).

Data:
- Source: 3,232 galaxies from the TNG50 simulation of the IllustrisTNG project (z=0 and z≈0.05), processed with the SKIRT radiative-transfer code to generate synthetic 4-band (g,r,i,z) images. Flips and rotations augment the dataset eightfold, to 25,856 images.
- Target: 6,416 real SDSS galaxies with morphology labels derived from Galaxy Zoo volunteers. The classes are highly imbalanced, with spirals dominating and irregulars being rare.
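The source counts are consistent with eight variants per galaxy (3,232 × 8 = 25,856). A minimal NumPy sketch of such an augmentation follows, assuming the eight variants are the four 90° rotations of each image and of its mirror image. The summary only says "flips and rotations", so that reading, the 64-pixel cutout size, and the function name `dihedral_augment` are assumptions.

```python
import numpy as np

def dihedral_augment(image):
    """Return the 8 flip/rotation variants of an image shaped (bands, height, width)."""
    variants = []
    for base in (image, np.flip(image, axis=-1)):      # original and mirror image
        for quarter_turns in range(4):                 # 0, 90, 180, 270 degrees
            variants.append(np.rot90(base, k=quarter_turns, axes=(-2, -1)))
    return variants

rng = np.random.default_rng(0)
# Stand-in for one 4-band (g, r, i, z) cutout; the 64x64 size is made up.
image = rng.normal(size=(4, 64, 64))
variants = dihedral_augment(image)

n_source = 3232
print(len(variants), n_source * len(variants))  # 8 25856
```

Only the two spatial axes are transformed, so the band order (g, r, i, z) is left untouched in every variant.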
Architectures: Three backbone networks are compared:
1. A small custom CNN (two conv blocks + MLP).
2. An E(2)-steerable CNN (ESCNN) using a discrete rotation group $C_8$ .
3. A ResNet-18 pretrained on ImageNet, fine-tuned with a task-specific MLP head.
Loss Functions and Training Strategy:
- Supervised Loss: Focal loss with effective-number class weighting is used to handle class imbalance, replacing standard cross-entropy.
- Domain Alignment: The core contribution is a feature-level domain loss ( $L_D$ ) computed on $L_2$ -normalized embeddings using differentiable distance metrics from an extended GeomLoss library. The authors benchmark 46 distinct distance/similarity measures across eight families (e.g., Minkowski, Inner Product, Entropy).
- Optimal Transport (OT) & Top-k Matching: A novel composite alignment loss ( $L_{OT}$ ) is introduced. It combines:
  1. Global entropic optimal transport (Sinkhorn divergence) for soft matching.
  2. A "top-k" penalty focusing on the $k$ worst-matched source–target pairs to prevent mismatched couplings (e.g., spirals aligning to ellipticals).
- Full Objective: The total training loss is $L = \lambda_{sup} L_{sup} + \lambda_D L_D + \lambda_{OT} L_{OT}$ .
- Training Regimen: Models undergo a 20-epoch warmup with supervised loss only, followed by joint training. Strategies for weighting losses include fixed weights, trainable weights (via sigmoid functions), and a "blur schedule" for Sinkhorn parameters. A Domain Adversarial Neural Network (DANN) with a Gradient Reversal Layer (GRL) is also implemented as a baseline.
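To make the alignment terms more concrete, the sketch below computes a Sinkhorn divergence and a top-k penalty on toy embeddings with plain NumPy. It is an illustration under stated assumptions, not the authors' implementation (which the summary says builds on an extended GeomLoss library): the squared-Euclidean cost, uniform sample weights, `eps`, the iteration count, the value of `k`, and all function names are hypothetical. In particular, "worst-matched pairs" is read here as the `k` most expensive pairs after assigning each source embedding to its most probable target under the transport plan; the paper may define it differently.

```python
import numpy as np

def l2_normalize(x):
    """Scale each embedding to unit length."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def sq_dists(x, y):
    """Pairwise squared Euclidean distances, shape (len(x), len(y))."""
    return ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)

def logsumexp(a, axis):
    m = a.max(axis=axis, keepdims=True)
    return np.squeeze(m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True)), axis=axis)

def entropic_ot(cost, eps=0.1, n_iters=200):
    """Log-domain Sinkhorn with uniform weights. Returns (dual value, transport plan)."""
    n, m = cost.shape
    log_a, log_b = np.full(n, -np.log(n)), np.full(m, -np.log(m))
    f, g = np.zeros(n), np.zeros(m)
    for _ in range(n_iters):
        # Alternate updates of the two dual potentials.
        f = -eps * logsumexp((g[None, :] - cost) / eps + log_b[None, :], axis=1)
        g = -eps * logsumexp((f[:, None] - cost) / eps + log_a[:, None], axis=0)
    log_plan = (f[:, None] + g[None, :] - cost) / eps + log_a[:, None] + log_b[None, :]
    # With uniform weights, the dual value <a, f> + <b, g> is mean(f) + mean(g).
    return f.mean() + g.mean(), np.exp(log_plan)

def sinkhorn_divergence(x, y, eps=0.1):
    """Debiased entropic OT: OT(x, y) - (OT(x, x) + OT(y, y)) / 2."""
    ot_xy, _ = entropic_ot(sq_dists(x, y), eps)
    ot_xx, _ = entropic_ot(sq_dists(x, x), eps)
    ot_yy, _ = entropic_ot(sq_dists(y, y), eps)
    return ot_xy - 0.5 * (ot_xx + ot_yy)

def topk_penalty(x, y, k=5, eps=0.1):
    """Mean cost of the k most expensive matched pairs (one assumed reading of 'top-k')."""
    cost = sq_dists(x, y)
    _, plan = entropic_ot(cost, eps)
    best_target = plan.argmax(axis=1)                  # most probable target per source
    matched_cost = cost[np.arange(len(x)), best_target]
    return np.sort(matched_cost)[-k:].mean()

# Toy embeddings: source and one target cluster share a direction, the other does not.
rng = np.random.default_rng(0)
dir_a, dir_b = np.eye(8)[0], np.eye(8)[1]
source = l2_normalize(dir_a + 0.1 * rng.normal(size=(32, 8)))
target_near = l2_normalize(dir_a + 0.1 * rng.normal(size=(32, 8)))
target_far = l2_normalize(dir_b + 0.1 * rng.normal(size=(32, 8)))

for name, target in [("aligned", target_near), ("shifted", target_far)]:
    loss_ot = sinkhorn_divergence(source, target) + topk_penalty(source, target)
    print(name, round(float(loss_ot), 3))
```

Both terms are larger for the shifted target than for the aligned one, which is the behavior an alignment loss needs. In actual training these quantities would be computed with an automatic-differentiation framework so that gradients reach the backbone, and each term would be scaled by its own weight.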

Key Results

Performance Gains: The domain adaptation pipeline substantially improves target-domain performance. Without adaptation (Baseline), the target macro F1 score is approximately 30% (accuracy 46%). With the proposed Euclidean distance-based adaptation using trainable weights and top-k matching, the target macro F1 rises to 62.6% and accuracy to ~87.3%.
Latent Space Alignment: The effectiveness of the adaptation is quantified with the AUC of a domain classifier trained to distinguish simulated from real embeddings. The Baseline shows perfect domain separation (AUC = 1.00), indicating that simulation and real data are trivially distinguishable. In contrast, the best adapted models achieve a domain AUC near 0.51–0.53, close to the chance level of 0.5, indicating that source and target distributions are effectively mixed in the latent space.
Metric Sensitivity: The study highlights that the choice of distance metric in the alignment loss is crucial. While Euclidean distance performed well, the authors systematically tested 12 representative metrics (including Jaccard, Dice, and various norms) to understand their impact on alignment.
Stability: The trainable weighting scheme ( $\lambda_{sup}, \lambda_D$ ) provided the most stable convergence compared to fixed weights or adversarial training alone.
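As a rough illustration of the two kinds of numbers reported above, the sketch below computes macro F1 and a domain AUC with NumPy. The labels and scores are made-up toy values and the function names are hypothetical; how the authors computed these metrics is not specified in this summary.

```python
import numpy as np

def macro_f1(y_true, y_pred, n_classes):
    """Unweighted mean of per-class F1 scores, so every class counts equally."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    scores = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom > 0 else 0.0)
    return float(np.mean(scores))

def domain_auc(scores, is_target):
    """Probability that a random target example gets a higher 'target' score
    than a random source example, with ties counted as one half."""
    scores = np.asarray(scores, dtype=float)
    is_target = np.asarray(is_target, dtype=bool)
    pos, neg = scores[is_target], scores[~is_target]
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return float(greater + 0.5 * ties)

# Toy imbalanced labels: 90 spirals (0), 8 ellipticals (1), 2 irregulars (2).
y_true = np.array([0] * 90 + [1] * 8 + [2] * 2)
always_spiral = np.zeros_like(y_true)
accuracy = float((always_spiral == y_true).mean())
print(accuracy, round(macro_f1(y_true, always_spiral, 3), 3))  # 0.9 0.316

# Domain scores: label 1 = real (target) image, 0 = simulated (source) image.
print(domain_auc([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1]))  # 1.0, fully separated
print(domain_auc([0.5, 0.5, 0.5, 0.5], [0, 0, 1, 1]))  # 0.5, indistinguishable
```

In the toy data, always predicting "spiral" reaches 90% accuracy but a macro F1 of only about 0.32, which is why macro F1 is the more informative number when classes are imbalanced. Likewise, a domain AUC of 1.0 means the two domains are perfectly separable, while 0.5 means the classifier does no better than chance.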

Significance and Claims
The paper positions this work as a prototype pipeline and a precursor to a larger effort aimed at interpreting upcoming Rubin Observatory galaxy observations using hundreds of thousands of mock observations from Illustris simulations.

Modest Scope: The authors explicitly state this is a "preliminary" study and a "proof of concept." They do not claim to have solved the general domain adaptation problem for all astrophysical tasks but rather demonstrate that specific combinations of OT-based losses and top-k matching can effectively narrow the gap between TNG50 simulations and SDSS observations for morphology classification.
Scientific Consequence: The work emphasizes that robust domain adaptation is necessary to preserve calibrated, physically meaningful predictions for population studies. Without it, models risk shifting early/late-type mixes and distorting scaling relations.
Future Directions: The authors outline specific next steps, including extending to multi-task learning (stellar mass, AGN, star formation), improving handling of the rare "irregular" class, investigating distance-aware learning rate schedulers, and testing alternative architectures like equivariant transformers.

The paper concludes that while previous studies have shown promise, methodological development in distance metrics and alignment strategies (specifically the top-k soft matching) offers a viable path toward reliable transfer learning for next-generation astronomical surveys.
