Characterization of Residual Morphological Substructure Using Supervised and Unsupervised Deep Learning

Imagine the universe as a giant, bustling city where galaxies are the buildings. Sometimes, these buildings crash into each other, merge, or get torn apart by gravity. When they do, they leave behind "scars" or "debris"—faint, wispy trails of stars and gas that look like the aftermath of a car crash. Astronomers call these residual substructures.

For decades, scientists have tried to find these scars to understand how galaxies grow and evolve. But looking at millions of galaxies one by one is like trying to find a specific typo in a library of a billion books by reading every single page with your eyes. It's slow, tiring, and prone to human error.

This paper is about teaching computers to do this job automatically using Deep Learning (a type of Artificial Intelligence). The researchers built two different "digital detectives" to scan these galactic scars and tell them apart.

Here is the breakdown of their adventure, explained simply:

1. The Setup: Cleaning the Canvas

Before the computers could learn, the researchers had to prepare the data.

The Problem: When you look at a galaxy, it's usually a bright, smooth blob of light. The "scars" (residuals) are faint and hidden underneath that brightness.
The Solution: They used a mathematical tool (GALFIT) to subtract the "smooth blob" from the image, leaving only the messy, leftover debris.
The Twist: To make sure the computer didn't get distracted by other stars or dust nearby, they masked the image so that only the galaxy in question remained, filling the rest with a blank, synthetic sky background. Think of it like taking a photo of a messy room, but digitally erasing the furniture so you only see the dust bunnies on the floor.

2. The Two Detectives: Supervised vs. Unsupervised

The team trained two different types of AI models to analyze these "dust bunny" images.

Detective A: The Supervised CNN (The "Teacher's Pet")

How it works: This is like a student learning with a teacher. The researchers showed the computer thousands of images and said, "This one is a 'Clean' room (no debris)," "This one is 'Asymmetric' (messy on one side)," and "This one is 'Peculiar' (weirdly shaped)."
The Goal: The computer learned to connect the patterns in each image with these labels, so it could predict what kind of mess a new galaxy has.
The Result: It got really good at it! It learned to distinguish between a galaxy with a strong, messy crash (high "residual strength") and a galaxy that is mostly clean. It essentially learned to say, "This galaxy looks like it was in a major fight."

Detective B: The Unsupervised CvAE (The "Independent Thinker")

How it works: This computer was given no labels. It was just told, "Look at all these messy rooms. Figure out the patterns yourself." It tried to compress the images into a simpler summary (a "latent space") and then rebuild them.
The Goal: To see if the computer could naturally group similar galaxies together without being told what to look for.
The Result: It was okay, but not great. It could tell the difference between a very clean galaxy and a very messy one, but it struggled to tell the difference between the types of mess (e.g., distinguishing a spiral crash from a weird blob). It was like a child who knows the difference between "clean" and "dirty" but can't yet tell the difference between "a spilled milk puddle" and "a broken toy."

3. The "X-Ray" Vision: PCA

To understand what the computers were actually learning, the researchers used a technique called Principal Component Analysis (PCA).

The Analogy: Imagine you have a huge pile of different colored marbles. PCA is like a machine that sorts them not by color, but by "shininess" and "roundness."
The Discovery:
- The Supervised Detective sorted the galaxies cleanly along a "Messiness Scale." One side of the scale was "Super Clean," and the other was "Total Chaos." Its position on that scale tracked closely how much "extra light" (debris) was in the image.
- The Unsupervised Detective also found a "Messiness Scale," but it was blurry. It couldn't separate the different kinds of chaos as clearly.

4. The Clustering: Grouping the Mess

The researchers then asked the computers to group the galaxies into clusters, like sorting laundry.

Supervised Results: The computer found 6 distinct groups. Some groups were galaxies with huge, dramatic tidal tails (like long streams of stars). Others were galaxies with just a tiny smudge in the center. The computer could draw clear lines between these groups.
Unsupervised Results: The computer only found 2 groups: "Messy" and "Clean." It missed the nuance of the different types of mess.

Why Does This Matter?

Telescope surveys are getting bigger, and they will soon be taking pictures of millions of galaxies. Humans can't look at them all.

The Takeaway: This paper shows that Supervised Deep Learning (teaching the AI with examples) is a powerful tool for finding the "scars" of galaxy collisions. It can automatically flag the most interesting galaxies for human astronomers to study in detail.
The Future: While the "Unsupervised" method (letting the AI figure it out alone) wasn't as sharp yet, it's a promising step. As AI gets smarter, we might not need teachers at all; the computers might just discover new types of galaxy crashes that humans haven't even thought of yet.

In a nutshell: The researchers taught a computer to look at the "aftermath" of galaxy crashes. The computer that was taught by humans (Supervised) became an expert detective, while the one that learned on its own (Unsupervised) was a bit of a novice. This helps astronomers quickly find the most dramatic cosmic collisions in the vast universe.

1. Problem Statement

Understanding galaxy evolution requires identifying major mergers and the associated morphological disturbances (e.g., tidal features). Traditional methods for identifying these features rely on:

Close-pair counts: These suffer from projection effects and redshift uncertainties.
Quantitative metrics (CAS, $G-M_{20}$): These can be biased by surface brightness dimming and redshift.
Visual inspection of residual images: Here a parametric light profile (e.g., single-Sérsic) is subtracted from the galaxy image to reveal substructure. This is time-consuming, subjective, and not repeatable for large surveys.

The authors aim to develop automated Deep Learning (DL) frameworks to characterize these "residual" images (the difference between observed and modeled light) to identify and quantify substructures like tidal tails, clumps, and asymmetries in a large sample of massive galaxies ($1 < z < 3$) from the CANDELS survey.

2. Methodology

Data Preparation

Sample: 10,046 massive ($M_{stellar} \geq 10^{9.5} M_{\odot}$), bright ($H < 24.5$ mag) galaxies from the HST CANDELS survey ($1 < z < 3$).
Input Data: H-band (F160W) images and pre-computed single-Sérsic model-subtracted residual images from van der Wel et al. (2012).
Pre-processing (Object-Only): To prevent the DL models from learning artifacts from neighboring stars or background noise, the authors created "object-only" images. They used source extraction (SEP) to generate a mask of the Galaxy of Interest (GOI), preserved the GOI pixels, and replaced the rest with a synthesized sky background.
Data Augmentation: To address class imbalance and viewing angle biases, the training set was augmented via random horizontal flips and 45-degree rotations. The final augmented training set contained 25,000 images, with 5,000 per class.
Ground Truth Labels: Residual images were visually classified by human experts into five classes: Clean, General, Core, Asymmetric, and Peculiar.
Quantitative Metrics: Three independent metrics were calculated to validate the DL models:
- Significant Pixel Flux (SPF): Cumulative flux of pixels $>3\sigma$ above background.
- Bumpiness ($B$): Ratio of residual RMS to the Sérsic model.
- Residual Flux Fraction (RFF): Fraction of residual light not attributable to noise.
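
The object-only masking and the SPF metric described above can be sketched in a few lines. This is a minimal illustration and not the authors' code: it assumes a segmentation map is already available (the summary says it came from SEP, which is not called here), it models the synthesized sky as Gaussian noise matched to the measured sky, and the function names `object_only` and `significant_pixel_flux` are hypothetical.

```python
import numpy as np


def object_only(image, segmap, goi_id, rng):
    """Keep the galaxy-of-interest (GOI) pixels and replace the rest with synthetic sky.

    `segmap` is a hypothetical integer segmentation map (0 = sky, k = source k).
    Assumption: the sky is modeled as Gaussian noise with the measured mean and scatter.
    """
    sky = image[segmap == 0]
    sky_mean, sky_sigma = sky.mean(), sky.std()
    # Start from a fully synthetic sky, then paste the GOI pixels back in.
    out = rng.normal(sky_mean, sky_sigma, size=image.shape)
    goi = segmap == goi_id
    out[goi] = image[goi]
    return out, sky_mean, sky_sigma


def significant_pixel_flux(residual, sky_mean, sky_sigma, nsigma=3.0):
    """Sum the flux of pixels lying more than `nsigma` sigma above the background."""
    significant = residual > sky_mean + nsigma * sky_sigma
    return float(residual[significant].sum())


# Toy data (synthetic, not CANDELS): a bright GOI in the centre and a neighbour in a corner.
rng = np.random.default_rng(1)
toy_image = rng.normal(0.0, 1.0, (32, 32))
toy_segmap = np.zeros((32, 32), dtype=int)
toy_segmap[14:18, 14:18] = 1  # galaxy of interest
toy_segmap[0:4, 0:4] = 2      # neighbouring source to be erased
toy_image[toy_segmap == 1] = 50.0
toy_image[toy_segmap == 2] = 80.0

masked, sky_mean, sky_sigma = object_only(toy_image, toy_segmap, goi_id=1, rng=rng)
spf = significant_pixel_flux(masked, sky_mean, sky_sigma)
print(f"SPF of the toy object-only image: {spf:.1f}")
```

In this toy case the neighbour's pixels are overwritten by sky noise, so the SPF is dominated by the galaxy of interest. That is the behaviour the object-only step is meant to guarantee.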

Deep Learning Architectures

The authors developed two distinct frameworks:

Supervised Convolutional Neural Network (CNN):
- Architecture: Inspired by Huertas-Company et al. (2015). It consists of three Convolutional + MaxPooling blocks, followed by two Dense layers.
- Latent Space: A 512-dimensional fully connected layer serves as the bottleneck (latent space) before the final 5-class Softmax output.
- Training: Trained with categorical cross-entropy loss, Adam optimizer, and dropout (50%) to prevent overfitting. Gaussian noise was added at the input to improve robustness.
Unsupervised Convolutional Variational Autoencoder (CvAE):
- Architecture: An Encoder-Decoder framework. The Encoder compresses the image into a latent distribution (mean $\mu$ and variance $\sigma^2$) rather than a fixed vector.
- Latent Space: A 512-dimensional latent vector sampled from the encoded Gaussian, which the KL term pulls toward a unit Gaussian prior.
- Training: Optimized using a combined loss function: Mean Squared Error (reconstruction loss) + Kullback-Leibler (KL) divergence (regularization).
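
The CvAE objective above (reconstruction error plus KL regularization) can be written out in a short NumPy sketch. It uses the standard closed-form KL divergence between a diagonal Gaussian and a unit Gaussian, together with the usual reparameterization trick for sampling. The relative weight of the two terms is not given in this summary, so the `beta` parameter and all function names here are assumptions for illustration only.

```python
import numpy as np


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I) (the reparameterization trick)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def kl_to_unit_gaussian(mu, log_var):
    """Closed-form KL( N(mu, sigma^2) || N(0, I) ), summed over latent dimensions."""
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=-1)


def cvae_loss(x, x_recon, mu, log_var, beta=1.0):
    """Mean over the batch of (per-image MSE + beta * KL). `beta` is a hypothetical weight."""
    pixel_axes = tuple(range(1, x.ndim))
    mse = np.mean((x - x_recon) ** 2, axis=pixel_axes)
    kl = kl_to_unit_gaussian(mu, log_var)
    return float(np.mean(mse + beta * kl))


# Toy batch: 2 images of 8x8 pixels and a 512-dimensional latent space.
rng = np.random.default_rng(0)
x = rng.normal(size=(2, 8, 8, 1))
mu = np.zeros((2, 512))
log_var = np.zeros((2, 512))
z = reparameterize(mu, log_var, rng)
print(z.shape, cvae_loss(x, x, mu, log_var))
```

With a perfect reconstruction and an encoder output of exactly $\mu = 0$, $\sigma^2 = 1$, both terms vanish. Any departure of the encoded distribution from the unit Gaussian adds a positive KL penalty.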

Analysis Strategy

Dimensionality Reduction: Principal Component Analysis (PCA) was applied to the 512-dimensional latent vectors to visualize the data in 2D (PC1 vs. PC2).
Clustering: Gaussian Mixture Modeling (GMM) was used to identify natural clusters in the PCA space.
Boundary Definition: Support Vector Classification (SVC) was employed to define decision boundaries between the GMM clusters.
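
The three analysis steps above can be chained as in the sketch below. It uses scikit-learn, which is an assumption, since this summary does not name the library the authors used. The number of clusters is passed in as a parameter because the selection criterion is not stated here, and the RBF kernel for the SVC is also an assumed choice. The latent vectors are synthetic stand-ins.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC


def cluster_latent_space(latents, n_clusters, seed=0):
    """Project latent vectors to 2D with PCA, cluster with a GMM, fit SVC boundaries."""
    # Step 1: reduce the latent vectors to (PC1, PC2).
    pcs = PCA(n_components=2, random_state=seed).fit_transform(latents)
    # Step 2: fit a Gaussian mixture in the 2D PCA plane and assign cluster labels.
    gmm = GaussianMixture(n_components=n_clusters, n_init=5, random_state=seed)
    cluster_ids = gmm.fit(pcs).predict(pcs)
    # Step 3: train an SVC on the GMM labels to obtain explicit decision boundaries.
    svc = SVC(kernel="rbf").fit(pcs, cluster_ids)
    return pcs, cluster_ids, svc


# Synthetic 512-dimensional "latent vectors": three well-separated blobs of 100 points.
rng = np.random.default_rng(0)
centers = rng.normal(0.0, 5.0, size=(3, 512))
true_blob = np.repeat(np.arange(3), 100)
latents = centers[true_blob] + rng.normal(0.0, 1.0, size=(300, 512))

pcs, cluster_ids, svc = cluster_latent_space(latents, n_clusters=3)
print(pcs.shape, np.bincount(cluster_ids), svc.score(pcs, cluster_ids))
```

Fitting the SVC on the GMM labels turns the soft mixture assignment into fixed boundaries in the PC1-PC2 plane, so a new galaxy can be placed in a cluster by calling `svc.predict` on its projected coordinates.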

3. Key Contributions

Novel Data Pre-processing: Introduction of a robust "object-only" masking technique that isolates the galaxy of interest while preserving the sky background context, preventing the model from learning segmentation artifacts.
Dual Framework Comparison: A direct comparison between supervised (CNN) and unsupervised (CvAE) approaches specifically applied to residual images (rather than raw galaxy images) for merger detection.
Quantitative Validation: Integration of independent physical metrics (SPF, B, RFF) to interpret the "meaning" of the learned latent space, moving beyond simple classification accuracy.
Public Release: The authors provide the methodology and insights for automated residual characterization, crucial for upcoming large-scale surveys (e.g., Roman, Euclid, LSST).

4. Results

Supervised CNN Performance

Classification: Achieved ~95% training accuracy and ~75% testing accuracy. The model performed best on the "Clean" class (~91%) but often confused the "Peculiar," "General," and "Asymmetric" classes with one another.
Latent Space Structure: PCA revealed a clear separation in the latent space.
- PC1 strongly correlated with residual strength (SPF).
- Clean galaxies clustered distinctly from Peculiar/General (strong residuals).
- Core and Asymmetric classes occupied an intermediate "saddle" region, overlapping with strong residual classes.
Clustering: GMM identified 6 distinct clusters in the PCA space.
- Cluster 2 contained galaxies with strong, interesting tidal features (high SPF).
- Cluster 4 contained "Clean" residuals (low SPF).
- Clusters 0, 1, and 5 contained weaker, diffuse signatures.
- Cluster 3 contained outliers with distinct central patterns.
Conclusion: The supervised CNN successfully learned a latent representation that physically correlates with residual strength and can naturally segregate galaxies by substructure type.
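
One way to check a link such as the one reported between PC1 and SPF is a simple correlation coefficient. The summary does not say which statistic the authors used, so the Pearson coefficient below is an assumption, and the arrays are synthetic stand-ins rather than values from the paper.

```python
import numpy as np


def pearson_r(a, b):
    """Pearson correlation coefficient between two 1D arrays."""
    return float(np.corrcoef(a, b)[0, 1])


# Synthetic stand-ins: a hypothetical PC1 coordinate and an SPF-like quantity
# that rises with PC1 plus some scatter.
rng = np.random.default_rng(0)
pc1 = rng.normal(size=500)
spf_like = 3.0 * pc1 + rng.normal(0.0, 0.5, size=500)

print(f"r = {pearson_r(pc1, spf_like):.3f}")
```

A rank-based statistic such as Spearman's coefficient would be a natural alternative if the relation is monotonic but not linear.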

Unsupervised CvAE Performance

Reconstruction: The CvAE produced qualitatively smooth reconstructions of the input residuals.
Latent Space Structure: The PCA distribution was less distinct than the CNN's.
- Classes formed a continuum rather than distinct clusters.
- PC1 showed a bimodal correlation with SPF (separating clean vs. strong residuals), but PC2 showed no clear correlation with visual classes or metrics.
Clustering: GMM identified only 2 optimal clusters.
- One cluster contained a mix of strong and intermediate residuals.
- The other contained mostly clean/weak residuals.
Conclusion: While the CvAE learned a measure of residual strength, it lacked the discriminatory power of the supervised CNN to distinguish between specific morphological types (e.g., Core vs. Asymmetric).

5. Significance

This study demonstrates that Deep Learning applied to residual images is a viable and powerful tool for automating the identification of galaxy mergers and substructures.

Supervised Learning: Shown to be effective for categorizing complex morphological features when high-quality visual labels are available, offering a path to accelerate the analysis of massive datasets.
Unsupervised Learning: Useful for identifying broad trends (e.g., strong vs. weak residuals) without labels but currently lacks the granularity for detailed morphological classification.
Future Impact: The framework provides a scalable solution for the "Big Data" era of astronomy, enabling the automated screening of millions of galaxies from future telescopes to find rare merger events and tidal features that drive galaxy evolution.