From Simulations to Surveys: Domain Adaptation for Galaxy Observations

This paper presents a domain adaptation pipeline that significantly improves the accuracy of classifying real SDSS galaxy morphologies by training on simulated TNG50 images and employing a combination of feature-level optimal transport losses, including a novel top-k soft matching mechanism, to effectively bridge the simulation-to-reality gap.

Original authors: Kaley Brauer, Aditya Prasad Dash, Meet J. Vyas, Ahmed Salim, Stiven Briand Massala

Published 2026-06-09

Original paper licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). This is an AI-generated explanation of the paper below. It is not written or endorsed by the authors. For technical accuracy, refer to the original paper.

Imagine you are trying to teach a student how to identify different types of cars.

The Problem: The "Video Game" vs. The "Real World"
In this paper, the "students" are computer programs (AI models), and the "cars" are galaxies.

  • The Source (The Video Game): The researchers first trained their AI using images from a super-advanced computer simulation called TNG50. Think of this like a perfect, high-definition video game. In the game, the AI knows exactly what every car is (a sedan, a truck, or a sports car) because the game creator programmed it that way.
  • The Target (The Real World): The researchers then wanted the AI to look at real photos of galaxies taken by the SDSS telescope. This is like taking the AI out of the video game and putting it on a busy, rainy street. The real photos look different: they are grainier, the lighting is weird, and the "cars" (galaxies) look a bit different than in the game.

If you just take the AI trained on the video game and let it guess on the real street, it gets confused. It might think a real truck is a sports car because the lighting is different. This is called a "domain shift."

The Solution: The "Translator" Pipeline
The paper describes a new method to act as a translator between the video game world and the real world. They built a pipeline to help the AI learn that "a spiral galaxy in the game" is the same thing as "a spiral galaxy in the real photo," even though they look different.

Here is how they did it, using simple analogies:

  1. The Three Teachers (Backbones):
    They tried three different types of AI "teachers" (neural networks) to do the learning:

    • A small, simple teacher (CNN).
    • A teacher that is very good at recognizing shapes no matter how they are rotated (E(2)-steerable CNN).
    • A famous, pre-trained teacher (ResNet-18) that they fine-tuned for this specific job.
  2. The "Hard Mode" Training (Focal Loss):
    In their data, there are way more "Spiral" galaxies than "Elliptical" or "Irregular" ones. It's like a classroom where 90% of the students are wearing red shirts, and only a few wear blue. If the AI just guesses "Red" every time, it gets a high score but learns nothing about the blue shirts.
    To fix this, they used a special scoring rule called Focal Loss. It's like a teacher who says, "You barely get credit for the easy questions you already answer confidently; most of your grade comes from the ones you keep getting wrong." Because the rare blue-shirt cases tend to be the ones the AI gets wrong, this forces it to pay attention to the rare galaxy types.

  3. The "Blending" Trick (Domain Adaptation):
    This is the core of their invention. They added a special rule to the training process that forces the AI to mix up the "game" images and the "real" images in its internal memory.

    • The Goal: They want the AI's internal map to look like a smoothie where the "game" ingredients and "real" ingredients are blended so well that you can't tell which is which.
    • The Tool: They used a mathematical tool called Optimal Transport (specifically "Sinkhorn" and "Top-k"). Imagine you have two piles of puzzle pieces (one from the game, one from reality). The AI tries to match them up.
    • The "Top-k" Secret Sauce: Usually, the AI tries to match every piece. But sometimes, it matches a game-piece to the wrong real-piece just to make the math work. The researchers added a "Top-k" rule: "Ignore the easy matches; focus only on the 10 hardest pairs that don't fit well, and force those to match." This is like telling the AI, "Stop faking it on the easy stuff; fix the specific mismatches that are really confusing you."
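Steps 2 and 3 above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the gamma and eps values, the iteration count, and the toy feature arrays are all hypothetical, and the focal loss omits the optional per-class weighting term. Only the plain Sinkhorn alignment loss is shown; the Top-k variant is left out because this summary does not give its exact definition.

```python
import numpy as np


def focal_loss(probs, labels, gamma=2.0):
    """Mean focal loss: cross-entropy scaled by (1 - p_true) ** gamma.

    probs  : (n, n_classes) predicted class probabilities
    labels : (n,) integer class labels
    gamma  : focusing strength (assumed value); 0 gives plain cross-entropy
    """
    p_true = probs[np.arange(len(labels)), labels]
    p_true = np.clip(p_true, 1e-12, 1.0)  # avoid log(0)
    return float(np.mean(-((1.0 - p_true) ** gamma) * np.log(p_true)))


def pairwise_sq_dist(a, b):
    """Squared Euclidean distance between every row of a and every row of b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sum(diff ** 2, axis=-1)


def sinkhorn_plan(cost, eps=0.1, n_iters=200):
    """Entropy-regularised transport plan between two uniformly weighted sets."""
    scaled = cost / max(float(cost.max()), 1e-12)  # keeps exp() well-behaved
    n, m = cost.shape
    mu = np.full(n, 1.0 / n)  # uniform weight on each source feature
    nu = np.full(m, 1.0 / m)  # uniform weight on each target feature
    kernel = np.exp(-scaled / eps)
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):  # alternate row and column rescaling
        u = mu / (kernel @ v)
        v = nu / (kernel.T @ u)
    return u[:, None] * kernel * v[None, :]


def ot_alignment_loss(source_feats, target_feats, eps=0.1):
    """Transport cost under the Sinkhorn plan; small when the sets overlap."""
    cost = pairwise_sq_dist(source_feats, target_feats)
    plan = sinkhorn_plan(cost, eps=eps)
    return float(np.sum(plan * cost))


# Toy stand-ins for latent features (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
sim = rng.normal(size=(32, 4))        # "game" features
real_near = rng.normal(size=(32, 4))  # "real" features that overlap the game
real_far = real_near + 3.0            # "real" features shifted away

print("aligned loss:", ot_alignment_loss(sim, real_near))
print("shifted loss:", ot_alignment_loss(sim, real_far))

probs = np.array([[0.9, 0.05, 0.05], [0.2, 0.7, 0.1]])
labels = np.array([0, 1])
print("cross-entropy:", focal_loss(probs, labels, gamma=0.0))
print("focal loss   :", focal_loss(probs, labels, gamma=2.0))
```

In a training loop, a term like ot_alignment_loss would be added to the classification loss so that the network is penalised whenever simulated and real features drift apart. How the two terms are weighted against each other is a design choice this summary does not specify.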

The Results: From Confused to Confident
The paper reports the results of this experiment:

  • Before the fix: When the AI tried to guess the galaxy types on real photos without this special training, it was only about 46% accurate, which is too unreliable to be useful.
  • After the fix: With their new "Top-k" blending method, the accuracy jumped to 87%.
  • The Proof: They checked the AI's internal "brain" (latent space). Before the fix, the AI kept the game images and real images in separate rooms (it knew they were different). After the fix, the rooms were merged into one big hall where the images were mixed together. This indicates the AI had learned features shared by both worlds, not just the features that tell them apart.
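One simple way to put a number on the "separate rooms versus one big hall" picture is to ask, for each image, how many of its nearest neighbours in the latent space come from the other domain. The sketch below is an illustrative assumption, not the diagnostic used in the paper: the function name, the choice of k, and the toy features are hypothetical. A score near 0 means the two domains sit apart; with equally sized sets, a score near 0.5 means they are well mixed.

```python
import numpy as np


def domain_mixing_score(source_feats, target_feats, k=5):
    """Average fraction of each point's k nearest neighbours (itself
    excluded) that belong to the other domain."""
    feats = np.vstack([source_feats, target_feats])
    domain = np.concatenate([
        np.zeros(len(source_feats), dtype=int),  # 0 = simulation
        np.ones(len(target_feats), dtype=int),   # 1 = real survey
    ])
    diff = feats[:, None, :] - feats[None, :, :]
    dist = np.sum(diff ** 2, axis=-1)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]
    from_other = domain[neighbours] != domain[:, None]
    return float(from_other.mean())


# Toy latent features (hypothetical data, not from the paper).
rng = np.random.default_rng(1)
sim = rng.normal(size=(100, 4))
real_separate = rng.normal(size=(100, 4)) + 10.0  # far from the simulation
real_mixed = rng.normal(size=(100, 4))            # overlaps the simulation

separate_score = domain_mixing_score(sim, real_separate)
mixed_score = domain_mixing_score(sim, real_mixed)
print("separate rooms:", separate_score)
print("one big hall  :", mixed_score)
```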

What's Next?
The authors say this is just a "proof of concept." They plan to:

  • Teach the AI to recognize more than just shapes (like how much gas a galaxy has or if it has a black hole).
  • Get better at spotting the rare "Irregular" galaxies.
  • Test this on even bigger, future telescope data (like the Vera C. Rubin Observatory).

In short, they built a bridge that allows an AI trained on perfect computer simulations to successfully understand messy, real-life photos of the universe.
