HSG-12M: A Large-Scale Benchmark of Spatial Multigraphs from the Energy Spectra of Non-Hermitian Crystals

This paper introduces Poly2Graph, an automated pipeline for generating HSG-12M, a pioneering 16.7-million-scale dataset of spatial multigraphs derived from non-Hermitian crystal energy spectra, which bridges condensed matter physics and geometry-aware graph learning by preserving vital geometric information often discarded in existing benchmarks.

Xianquan Yan, Hakan Akgün, Kenji Kawaguchi, N. Duane Loh, Ching Hua Lee

Published 2026-03-06

Imagine you are trying to understand the "personality" of a complex machine, like a quantum crystal, just by listening to the sounds it makes. In physics, these "sounds" are called energy spectra. Scientists have long known that for non-Hermitian crystals (open systems that can gain or lose energy), the energies can be complex numbers. When you plot them on a two-dimensional map, they form beautiful, intricate shapes: loops, spirals, and tangled webs. These shapes act like fingerprints. They reveal a great deal about how the material behaves, such as how it conducts electricity or whether it has hidden "superpowers".

However, there was a huge problem: no one had a large, organized collection of these fingerprints.

Until now.

This paper introduces HSG-12M, a massive new library containing nearly 12 million of these energy fingerprints. But it's not just a library. It could change how we use AI to discover new materials. Here is the simple breakdown:

1. The Problem: The "Manual Drawing" Bottleneck

Imagine you are an art critic trying to study the style of a painter. But instead of a gallery, you have to visit every single artist's studio, watch them paint one stroke at a time, and manually trace their work onto a piece of paper. That is what scientists were doing with these quantum crystals. They had to calculate the energy shapes by hand, one by one. It was slow, tedious, and impossible to do for millions of examples. Without a big dataset, AI (which needs mountains of data to learn) was stuck.

2. The Solution: The "Magic Printer" (Poly2Graph)

The authors built a tool called Poly2Graph. Think of this as a high-speed, automated 3D printer for data.

  • How it works: You feed it the mathematical "recipe" of a crystal (its Hamiltonian).
  • What it does: Instead of a human spending hours drawing, Poly2Graph automatically calculates the energy shape and converts it into a digital graph (a network of dots, called nodes, joined by lines, called edges).
  • The Speed: It is 100,000 times faster than the old methods. It turned a job that would take centuries into a job that took a few days.
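
To make "recipe in, shape out, graph out" concrete, here is a toy sketch in Python. It is not Poly2Graph. It assumes a textbook recipe, the Hatano-Nelson chain with Bloch Hamiltonian H(z) = t_left·z + t_right/z, and it assumes, following non-Bloch band theory, that the open-boundary spectrum is the shape that becomes a graph. The helper names (`spectrum_on_circle`, `rasterize`, `pixels_to_graph`) are hypothetical. The graph step is also a naive pixel-neighbor count rather than the paper's extraction method.

```python
import cmath
import math

# Toy recipe (an assumption for illustration): the Hatano-Nelson chain, whose
# single-band Bloch Hamiltonian is H(z) = t_left * z + t_right / z.
T_LEFT, T_RIGHT = 0.5, 1.0


def spectrum_on_circle(radius: float, n: int = 400) -> list[complex]:
    """Evaluate H(z) at n evenly spaced points on the circle |z| = radius."""
    energies = []
    for k in range(n):
        z = radius * cmath.exp(2j * math.pi * k / n)
        energies.append(T_LEFT * z + T_RIGHT / z)
    return energies


# Periodic boundaries (|z| = 1): the energies trace a closed ellipse.
pbc = spectrum_on_circle(1.0)
# Open boundaries via non-Bloch band theory (|z| = sqrt(t_right / t_left)):
# the ellipse collapses onto a segment of the real axis.
obc = spectrum_on_circle(math.sqrt(T_RIGHT / T_LEFT))


def rasterize(points, lo=-2.0, hi=2.0, size=81) -> set[tuple[int, int]]:
    """Snap complex energies to (row, col) pixels on a size x size grid over [lo, hi]^2."""
    step = (hi - lo) / (size - 1)
    return {(round((p.imag - lo) / step), round((p.real - lo) / step)) for p in points}


def neighbors(pix, pixels):
    """The 8-connected neighbors of a pixel that are also in the pixel set."""
    r, c = pix
    return [(r + dr, c + dc)
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0) and (r + dr, c + dc) in pixels]


def pixels_to_graph(pixels):
    """Naive skeleton-to-graph step: pixels without exactly two neighbors become
    nodes; chains of two-neighbor pixels between nodes become edges, stored as
    (start, end, length in pixel steps). Closed loops with no node are skipped,
    and real curves usually need a thinning pass first."""
    nodes = {p for p in pixels if len(neighbors(p, pixels)) != 2}
    edges, walked = [], set()
    for start in nodes:
        for first in neighbors(start, pixels):
            if (start, first) in walked:
                continue
            prev, cur, length = start, first, 1
            while cur not in nodes:
                nxt = next(q for q in neighbors(cur, pixels) if q != prev)
                prev, cur, length = cur, nxt, length + 1
            walked.add((start, first))
            walked.add((cur, prev))  # the same edge seen from its other end
            edges.append((start, cur, length))
    return nodes, edges


nodes, edges = pixels_to_graph(rasterize(obc))
print(len(nodes), len(edges))  # 2 1
```

For this recipe the open-boundary "shape" is a single straight segment, so the graph is just two nodes joined by one edge. Richer recipes produce loops, junctions, and parallel edges, and that is where a proper thinning and tracing step becomes necessary.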

3. The Dataset: HSG-12M (The "Library of Fingerprints")

Using this magic printer, the team created HSG-12M.

  • Size: It contains 11.6 million static shapes and 5.1 million shapes that show how the patterns change over time.
  • Diversity: It covers 1,401 different types of crystal recipes.
  • The "Secret Sauce" (Spatial Multigraphs): This is the most important part.
    • Most AI graph datasets are like simple subway maps: "Station A connects to Station B." At most one line joins any two stations.
    • HSG-12M is like a real city map. Between Station A and Station B, there might be a highway, a bike path, a river, and a train track. All these are different connections between the same two points.
    • In physics, these "multiple paths" carry vital information. If you squish them into a single line (as most graph datasets and models do), you lose the hidden geometry. HSG-12M keeps every path separate and records its exact shape and position on the energy map, preserving the full geometry of the data.
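
One way to picture a spatial multigraph in code is sketched below. It uses a hypothetical `SpatialMultigraph` container, not the dataset's actual file format. Each node stores its position on the energy map as a complex number, and each edge stores its own polyline, so three different paths between the same two nodes stay distinct. The `collapse_to_simple_graph` method mimics what a simple-graph pipeline would keep.

```python
from dataclasses import dataclass, field


@dataclass
class SpatialMultigraph:
    """Hypothetical container: node positions live in the complex energy plane,
    and every edge keeps its own polyline, so parallel edges stay distinct."""
    positions: dict[str, complex] = field(default_factory=dict)
    edges: list[tuple[str, str, list[complex]]] = field(default_factory=list)

    def add_node(self, name: str, position: complex) -> None:
        self.positions[name] = position

    def add_edge(self, u: str, v: str, polyline: list[complex]) -> None:
        # Each edge's geometry must start at node u and end at node v.
        if polyline[0] != self.positions[u] or polyline[-1] != self.positions[v]:
            raise ValueError("polyline endpoints must match node positions")
        self.edges.append((u, v, polyline))

    def edge_lengths(self) -> list[float]:
        """Arc length of each edge, summed over its polyline segments."""
        return [sum(abs(b - a) for a, b in zip(pts, pts[1:])) for _, _, pts in self.edges]

    def collapse_to_simple_graph(self) -> set[frozenset[str]]:
        """Keep one connection per node pair, as a simple-graph pipeline would."""
        return {frozenset((u, v)) for u, v, _ in self.edges}


g = SpatialMultigraph()
g.add_node("A", -1 + 0j)
g.add_node("B", 1 + 0j)
g.add_edge("A", "B", [-1, 1])       # straight path along the real axis
g.add_edge("A", "B", [-1, 1j, 1])   # detour through the upper half-plane
g.add_edge("A", "B", [-1, -1j, 1])  # detour through the lower half-plane
print(len(g.edges), len(g.collapse_to_simple_graph()))  # 3 1
print(g.edge_lengths())
```

The three paths have different lengths (2 versus about 2.83), yet the collapsed view keeps only the fact that A and B are connected.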

4. Why This Matters: The "Reverse Engineering" Dream

Why do we care about 12 million weird shapes? Because of Inverse Design.

  • The Old Way: "I have a crystal. Let me calculate what its energy shape looks like." (Forward problem).
  • The New Way (AI): "I want a material that does this specific thing (like conducting electricity with zero resistance). What should the crystal look like?" (Inverse problem).
    • With HSG-12M, you can show an AI a desired shape (a fingerprint) and ask, "Which crystal recipe makes this?"
    • The AI can then suggest a shortlist of materials for scientists to build. This could lead to super-fast computers, better solar panels, or new medical sensors.
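
The "show it a shape, get back a recipe" step can be framed as a lookup from graph to recipe class. A real system would learn this mapping, for example with a graph neural network. The sketch below instead swaps in a 1-nearest-neighbor search over four hand-picked invariants (node count, edge count, self-loops, extra parallel edges). The functions, labels, and toy library are hypothetical and are not HSG-12M classes or results.

```python
import math
from collections import Counter


def graph_features(edges):
    """Hypothetical invariants of a multigraph given as an edge list [(u, v), ...]:
    (node count, edge count, self-loops, extra parallel edges)."""
    nodes = {n for edge in edges for n in edge}
    pair_counts = Counter(frozenset(edge) for edge in edges)
    self_loops = sum(1 for u, v in edges if u == v)
    parallel = sum(count - 1 for count in pair_counts.values())
    return (len(nodes), len(edges), self_loops, parallel)


def nearest_recipe(query_edges, library):
    """Inverse lookup: return the label of the stored graph whose feature vector
    is closest (Euclidean distance) to the query's, i.e. 1-nearest-neighbor."""
    target = graph_features(query_edges)
    best_label, best_dist = None, math.inf
    for label, edges in library:
        dist = math.dist(target, graph_features(edges))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


# Toy library with made-up recipe labels (not real HSG-12M classes).
library = [
    ("recipe_segment", [("a", "b")]),
    ("recipe_theta", [("a", "b"), ("a", "b"), ("a", "b")]),
    ("recipe_star", [("c", "a"), ("c", "b"), ("c", "d")]),
]
desired_shape = [("x", "y"), ("x", "y"), ("x", "y")]
print(nearest_recipe(desired_shape, library))  # recipe_theta
```

Counting parallel edges as a feature only works because the multigraph keeps them. After collapsing to a simple graph, the "theta" and "segment" shapes would look identical.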

5. The "Universal Translator"

The paper also makes a fascinating, broader claim: these graphs are not limited to crystals.
The authors show that matrices (grids of numbers) and polynomials (equations) can also be turned into the same kind of energy fingerprint.

  • Imagine a universal translator that turns a complex math equation into a picture of a tangled knot.
  • Once a math object is turned into a graph, the same AI tools could, in principle, be applied to problems in physics, chemistry, and pure mathematics.
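
A standard textbook bridge between crystals, matrices, and polynomials is the Toeplitz correspondence. A single-band chain whose Bloch Hamiltonian is a Laurent polynomial h(z) = Σ_k c_k z^k has, with open boundaries, a banded Toeplitz hopping matrix whose entry (i, j) is c_{j-i}. The sketch below assumes this correspondence for illustration. It is not necessarily the construction the authors use for matrices, and the function names are hypothetical.

```python
def toeplitz_from_laurent(coeffs, n):
    """Open-boundary hopping matrix (n x n) of a single-band chain whose Bloch
    Hamiltonian is h(z) = sum_k coeffs[k] * z**k. Entry (i, j) is coeffs[j - i]:
    the amplitude for hopping from site j to site i, k = j - i sites away."""
    return [[coeffs.get(j - i, 0) for j in range(n)] for i in range(n)]


def laurent_from_toeplitz(matrix):
    """Read the polynomial back: c_k sits on the first row for k >= 0 and on the
    first column for k < 0. Lossless only when n > max |k|."""
    n = len(matrix)
    coeffs = {}
    for k in range(-(n - 1), n):
        value = matrix[0][k] if k >= 0 else matrix[-k][0]
        if value != 0:
            coeffs[k] = value
    return coeffs


# Hypothetical example: h(z) = 0.5 * z + 1.0 / z (asymmetric nearest-neighbor hopping).
hatano_nelson = {1: 0.5, -1: 1.0}
matrix = toeplitz_from_laurent(hatano_nelson, 4)
for row in matrix:
    print(row)
print(laurent_from_toeplitz(matrix) == hatano_nelson)  # True
```

Reading the polynomial back off the matrix only works when the chain is longer than the longest hop, that is, when n is greater than the largest |k|.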

Summary Analogy

Think of the universe as a giant library of books (materials).

  • Before: Scientists could only read a few pages of a few books by hand.
  • Poly2Graph: A robot that instantly reads every book and draws a picture of its story.
  • HSG-12M: A massive gallery of 12 million of these drawings, organized by genre.
  • The Result: Now, if you want to write a story about a "super-material," you can look at the gallery, find the perfect picture, and tell the robot, "Build me the book that matches this picture."

This paper provides the data, the tool, and the theory to finally let AI help us design the materials of the future.