Morphologies for DECaLS Galaxies through a combination of non-parametric indices and machine learning methods: A comprehensive catalog using the Galaxy Morphology Extractor (galmex) code

This paper introduces the galmex Python package to generate a comprehensive catalog of non-parametric morphological indices for DECaLS galaxies, demonstrating their effectiveness—particularly when combined with LightGBM machine learning—in reliably classifying spiral and elliptical galaxies for future southern hemisphere surveys.

V. M. Sampaio, Y. Jaffé, C. Lima-Dias, S. Véliz Astudillo, M. Martínez-Marín, H. Méndez-Hernández, R. Herrera-Camus, A. Monachesi

Published 2026-03-05

Imagine the universe as a giant, bustling city. In this city, galaxies are the buildings. Some are sleek, modern skyscrapers with smooth glass facades (elliptical galaxies), while others are sprawling, complex neighborhoods with winding streets, parks, and distinct districts (spiral galaxies).

For decades, astronomers have tried to map this city. They used to do it by looking at photos and guessing, "That looks like a spiral!" or "That looks like a blob." But with millions of new photos coming in from powerful telescopes, human eyes just can't keep up. We need a robot to do the sorting.

This paper is about building that robot and teaching it how to tell the difference between a "skyscraper" and a "neighborhood" without ever needing to see the whole building clearly.

Here is the story of how they did it, broken down into simple steps:

1. The Problem: Too Many Galaxies, Too Little Time

The researchers are using data from DECaLS (the Dark Energy Camera Legacy Survey), which is like a massive, high-resolution photo album of the southern sky. It contains millions of galaxies.

  • The Challenge: If you try to measure a galaxy by fitting a mathematical curve to its shape (like trying to force a square peg into a round hole), it often fails because galaxies are messy. They have bars, spirals, and weird bumps.
  • The Solution: Instead of guessing the shape, they decided to measure how the light is distributed. Think of it like judging a cake not by its shape, but by how the sugar is sprinkled on top. Is the sugar concentrated in the middle? Is it spread out evenly? Is it clumpy?

2. The Tool: "galmex" (The Digital Chef)

The team built a new software package called galmex (Galaxy Morphology Extractor). You can think of this as a super-precise kitchen robot.

  • Preparation: Before measuring, the robot has to clean the image. It removes the "sky" (the background noise), cuts out the specific galaxy, and paints over any neighboring stars or galaxies that might be getting in the way.
  • The Measurements: Once the image is clean, galmex calculates a set of "fingerprints" for each galaxy. These are called non-parametric indices.
    • Concentration: How much of the light is in the center? (Like how much frosting is in the middle of a cupcake).
    • Asymmetry: Is the galaxy lopsided? (Like a crooked tower).
    • Smoothness: Is the surface bumpy or smooth?
    • Gini & Entropy: These are fancy math ways of asking, "Is the light spread out evenly, or is it clumped up in a few bright spots?"
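To make these fingerprints concrete, here is a minimal Python sketch of four of them, written from common textbook definitions. It is not the galmex implementation: the function names, the choice to use every pixel in the cutout (real pipelines usually restrict themselves to the pixels that belong to the galaxy), the 64-bin histogram, and the two toy images are all assumptions made for illustration. Smoothness is left out to keep the sketch short.

```python
import numpy as np

def concentration(img):
    """C = 5 * log10(r80 / r20): radii enclosing 80% and 20% of the light."""
    ny, nx = img.shape
    y, x = np.indices(img.shape)
    r = np.hypot(x - (nx - 1) / 2, y - (ny - 1) / 2).ravel()
    order = np.argsort(r)                      # pixels from centre outwards
    growth = np.cumsum(img.ravel()[order]) / img.sum()
    r20, r80 = np.interp([0.2, 0.8], growth, r[order])
    return 5 * np.log10(r80 / r20)

def asymmetry(img):
    """Residual after rotating by 180 degrees, normalised by total light."""
    rotated = np.rot90(img, 2)
    return np.sum(np.abs(img - rotated)) / np.sum(np.abs(img))

def gini(img):
    """0 = light shared equally by all pixels, 1 = all light in one pixel."""
    x = np.sort(np.abs(img.ravel()))
    n = x.size
    ranks = np.arange(1, n + 1)
    return np.sum((2 * ranks - n - 1) * x) / (x.mean() * n * (n - 1))

def entropy(img, bins=64):
    """Shannon entropy of the pixel-brightness histogram, scaled to [0, 1]."""
    counts, _ = np.histogram(img.ravel(), bins=bins)
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log(p)) / np.log(bins)

# Toy images (made up): a smooth centred blob, and the same blob plus a clump.
y, x = np.indices((101, 101))
smooth = np.exp(-((x - 50) ** 2 + (y - 50) ** 2) / (2 * 6.0 ** 2))
clumpy = smooth + 0.5 * np.exp(-((x - 65) ** 2 + (y - 40) ** 2) / (2 * 2.0 ** 2))

for name, img in [("smooth", smooth), ("clumpy", clumpy)]:
    print(name, concentration(img), asymmetry(img), gini(img), entropy(img))
```

The smooth blob is perfectly symmetric, so its asymmetry is zero, while the off-centre clump pushes the asymmetry of the second image above zero. On real images these numbers depend heavily on the cleaning steps described above, which this sketch skips.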

3. The Training: Teaching the Robot with "Control Samples"

To teach the robot what a "spiral" and an "elliptical" look like, they didn't just guess. They used a "Gold Standard" dataset from a project called Galaxy Zoo, where thousands of real humans looked at galaxies and voted on what they were.

  • They took the human-voted "Spirals" and "Ellipticals" and fed them into their robot.
  • They asked the robot: "Look at the fingerprints (the measurements) of these human-confirmed spirals. What do they have in common? Now look at the ellipticals. How are they different?"
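As a rough sketch of how human votes can be turned into training labels, the snippet below keeps only galaxies where a clear majority of volunteers agreed. The 0.8 agreement threshold, the function name, and the vote fractions are all hypothetical; the paper's actual selection criteria may differ.

```python
import numpy as np

def labels_from_votes(f_spiral, f_elliptical, threshold=0.8):
    """Return 1 for confident spirals, 0 for confident ellipticals, -1 otherwise."""
    labels = np.full(f_spiral.shape, -1)
    labels[f_spiral >= threshold] = 1
    labels[f_elliptical >= threshold] = 0
    return labels

# Made-up vote fractions for five galaxies (fraction of volunteers per class).
f_spiral = np.array([0.95, 0.10, 0.55, 0.85, 0.02])
f_elliptical = np.array([0.03, 0.88, 0.40, 0.10, 0.97])

labels = labels_from_votes(f_spiral, f_elliptical)
control = labels >= 0          # galaxies clean enough to train on
print(labels, control.sum())
```

Galaxies that split the vote (the third one here) are left out of the control sample, so the robot only learns from examples the humans were sure about.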

4. The Brain: Machine Learning (The Detective)

They didn't just eyeball the measurements; they fed them to LightGBM, a machine learning tool that builds its predictions by combining many small decision trees (a technique called gradient boosting).

  • Think of this AI as a master detective. It looks at all the fingerprints (Concentration, Gini, Entropy, etc.) at once.
  • It learned that Entropy (how spread out the light is) and Gini (how unequal the light distribution is) were the biggest clues.
  • The Result: For each galaxy, the trained AI outputs a probability for each class. Shown a galaxy it has never seen before, it can say something like "I am 98% sure this is a spiral," or "I am 99% sure this is an elliptical."
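To show the idea behind the detective, here is a small hand-rolled gradient-boosting sketch in plain NumPy. It is a stand-in, not LightGBM itself: it uses one-split trees ("stumps"), and the data, feature values, and hyperparameters are all invented for illustration. In practice one would reach for a real library (for LightGBM, the `lightgbm.LGBMClassifier` class) rather than this toy.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_stump(X, residual):
    """Pick the (feature, threshold) split that best fits the residuals."""
    best = None
    for j in range(X.shape[1]):
        for t in np.quantile(X[:, j], np.linspace(0.1, 0.9, 9)):
            left = X[:, j] <= t
            lv, rv = residual[left].mean(), residual[~left].mean()
            err = np.sum((residual - np.where(left, lv, rv)) ** 2)
            if best is None or err < best[0]:
                best = (err, j, t, lv, rv)
    return best[1:]

def boost(X, y, n_rounds=50, lr=0.3):
    """Each round adds a stump that corrects the current mistakes."""
    score = np.zeros(len(y))
    stumps = []
    for _ in range(n_rounds):
        # y - p is the negative gradient of the log-loss w.r.t. the score.
        j, t, lv, rv = fit_stump(X, y - sigmoid(score))
        stumps.append((j, t, lr * lv, lr * rv))
        score += np.where(X[:, j] <= t, lr * lv, lr * rv)
    return stumps

def predict_proba(stumps, X):
    score = np.zeros(len(X))
    for j, t, lv, rv in stumps:
        score += np.where(X[:, j] <= t, lv, rv)
    return sigmoid(score)

# Synthetic "fingerprints" (made up): label 1 = spiral, 0 = elliptical.
rng = np.random.default_rng(0)
n = 600
y = rng.integers(0, 2, n)
X = np.column_stack([
    rng.normal(0.5 + 0.3 * y, 0.08),   # entropy-like clue
    rng.normal(0.6 - 0.2 * y, 0.08),   # Gini-like clue
    rng.normal(0.1, 0.05, n),          # a clue that carries no information
])

stumps = boost(X[:400], y[:400])
p_test = predict_proba(stumps, X[400:])
accuracy = np.mean((p_test > 0.5) == y[400:])
print(f"held-out accuracy: {accuracy:.3f}")
```

The output of `predict_proba` is the "I am 98% sure" number from the text: a value between 0 and 1 for each galaxy. The accuracy printed here describes only this invented dataset and says nothing about the paper's results.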

5. The Big Discovery: What Works Best?

The paper found some interesting things about which "fingerprints" are the most reliable:

  • The "Concentration" metric is great for telling the difference between a smooth blob (elliptical) and a disk (spiral).
  • The "Asymmetry" metrics are actually terrible at telling spirals from ellipticals (because both can be symmetrical), but they are amazing at spotting messy galaxies, like those that are crashing into each other.
  • The "MEGG" metrics (a fancy name for a group of measurements including Gini and Entropy) were the real stars of the show. They provided the clearest separation between the two types.
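One simple way to put a number on "how well does this fingerprint separate the two types" is the ROC AUC: the chance that a randomly chosen spiral scores higher than a randomly chosen elliptical. The sketch below is an illustration on synthetic values, not the paper's analysis; the function names and the fake distributions are assumptions, and ties between values are ignored.

```python
import numpy as np

def auc(values, labels):
    """Rank-based ROC AUC (Mann-Whitney U), assuming no tied values."""
    order = np.argsort(values)
    ranks = np.empty(len(values))
    ranks[order] = np.arange(1, len(values) + 1)
    n1 = labels.sum()
    n0 = len(labels) - n1
    return (ranks[labels == 1].sum() - n1 * (n1 + 1) / 2) / (n1 * n0)

def separation(values, labels):
    """0.5 = useless, 1.0 = perfect, whichever direction the index points."""
    a = auc(values, labels)
    return max(a, 1 - a)

# Synthetic indices (made up): label 1 = spiral, 0 = elliptical.
rng = np.random.default_rng(1)
labels = rng.integers(0, 2, 2000)
fake_indices = {
    "entropy-like": rng.normal(0.5 + 0.3 * labels, 0.08),
    "gini-like": rng.normal(0.6 - 0.2 * labels, 0.08),
    "asymmetry-like": rng.normal(0.1, 0.05, 2000),  # same for both classes
}

for name, values in fake_indices.items():
    print(name, round(separation(values, labels), 3))
```

An index whose distribution is the same for both classes lands near 0.5, which is the pattern the text describes for asymmetry when the only question is spiral versus elliptical.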

6. The Gift to the World

The team didn't just keep this to themselves. They released:

  1. The Code (galmex): A free, open-source tool that anyone can use to measure galaxies. It's modular, meaning you can tweak every step of the process, like adjusting the focus on a camera.
  2. The Catalog: A massive list of over 1.7 million galaxies with their "fingerprints" and a probability score telling you how likely they are to be a spiral or an elliptical.
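A catalog with probability scores is typically used by picking a cut and keeping the galaxies above it. The sketch below shows that pattern with the standard library only. The column names (`objid`, `p_spiral`, `p_elliptical`), the 0.9 cut, and the rows are placeholders invented for this example; check the released catalog for its real schema.

```python
import csv
import io

# Placeholder rows standing in for a catalog file (values are made up).
catalog_text = """objid,p_spiral,p_elliptical
1,0.97,0.03
2,0.08,0.92
3,0.55,0.45
4,0.91,0.09
"""

def select(rows, column, threshold):
    """Return the ids of galaxies whose probability in `column` passes the cut."""
    return [row["objid"] for row in rows if float(row[column]) >= threshold]

rows = list(csv.DictReader(io.StringIO(catalog_text)))
spirals = select(rows, "p_spiral", 0.9)
ellipticals = select(rows, "p_elliptical", 0.9)
print(spirals, ellipticals)
```

A stricter cut gives a cleaner but smaller sample, and a looser one keeps more galaxies at the cost of more misclassifications; galaxy 3 here would be dropped by either cut at 0.9.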

Why Does This Matter?

Imagine you are studying how cities evolve. You need to know which buildings are old and which are new, and how they change over time.

  • By having a reliable, automated way to sort millions of galaxies, astronomers can now study how galaxies grow, how they crash into each other, and how they change from "messy" to "smooth" over billions of years.
  • This is especially important for the southern hemisphere, where fewer surveys exist. This paper fills in the missing pieces of the cosmic map.

In a nutshell: The authors built a smart, automated system that measures the "texture" of galaxy light. By teaching it with human-voted examples, they created a highly accurate tool that can sort millions of galaxies into spirals and ellipticals, helping us understand the life story of the universe's most beautiful structures.