A quantum chemistry dataset containing ground-state and conical-intersection structures of 260k molecules

This paper introduces a comprehensive quantum chemistry dataset comprising ground-state and conical-intersection structures for 260,000 small molecules, calculated at the OM2/MRCI level, to facilitate the integration of photochemistry with machine learning for studying excited-state reaction processes.

Original authors: Jiahui Zhang, Yifei Zhu, Chuqiao Feng, Yingjin Ma, Chao Xu, Zhenggang Lan

Published 2026-05-15

Original paper licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). This is an AI-generated explanation of the paper below. It is not written or endorsed by the authors. For technical accuracy, refer to the original paper.

Imagine the world of molecules as a vast, hilly landscape. When a molecule absorbs light (like sunlight), it doesn't just sit still; it jumps up a hill into an "excited state." Usually, it wants to slide back down to its comfortable, resting spot (the ground state).

However, sometimes the landscape has a very special, tricky spot called a Conical Intersection (CI). Think of a CI as a funnel or a crossroads where two different energy landscapes, the excited state and the one below it, touch at a single point. If a molecule rolls into this funnel, it can switch tracks almost instantly, changing its behavior completely. Funnels like this play a part in photosynthesis, in how our eyes respond to light, and in how some molecules protect themselves from getting damaged by the sun.

For a long time, scientists have been trying to map these funnels, but they've only been able to draw a few maps for specific, small towns. They couldn't build a global atlas because calculating these funnels is incredibly hard and slow.

What this paper does:
The researchers have built a massive digital atlas containing 260,000 different molecular "towns." For every single one, they mapped out:

  1. The comfortable resting spot (the ground state).
  2. The magical funnel where the tracks cross (the conical intersection).

How they built it:
To make this atlas, they used a clever shortcut. Imagine trying to draw a map of the entire world. If you tried to measure every single tree and rock with a laser (which is what "high-level" calculations do), it would take forever. Instead, these scientists used a "quick sketch" approach: a semiempirical method called OM2/MRCI. It's like using a fast, reliable drone to take photos of the landscape. It's not perfect down to the millimeter, but it's accurate enough to see the shape of the hills and where the funnels are. This speed allowed them to process roughly a quarter of a million molecules.

The "Quality Control" Check:
Before publishing the atlas, they had to clean it up, just like a librarian organizing books:

  • The "Broken Map" Check: Sometimes, when they tried to find the funnel, the molecule would fall apart (like a Lego castle collapsing). These broken pieces were thrown out because they aren't useful funnels; they're just debris.
  • The "Wrong Address" Check: Sometimes, the math got confused and found a spot that looked like a funnel but was actually lower than the ground level (which is physically impossible). These were also removed.
  • The Result: After throwing out the broken or confusing maps, they were left with a clean, usable dataset of about 260,000 molecules.
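The two checks above can be sketched in a few lines of Python. This is only an illustration, not the authors' actual pipeline: the `Candidate` fields, the 2.0 Å distance cutoff used to decide whether a structure has fallen apart, and the function names are all hypothetical. It also assumes that "lower than the ground level" means the funnel's energy came out below the energy of the resting structure.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Coord = Tuple[float, float, float]


@dataclass
class Candidate:
    """Hypothetical record for one molecule before quality control."""
    name: str
    ci_coords: List[Coord]   # atom positions at the funnel, in angstrom
    e_ground: float          # energy of the resting structure (eV)
    e_ci: float              # energy at the funnel (eV)


def is_fragmented(coords: List[Coord], cutoff: float = 2.0) -> bool:
    """Return True if the atoms do not form one connected cluster.

    Two atoms count as linked when they are closer than `cutoff`
    (an assumed, illustrative threshold).
    """
    if not coords:
        return True
    seen = {0}
    stack = [0]
    while stack:
        i = stack.pop()
        for j in range(len(coords)):
            if j not in seen and math.dist(coords[i], coords[j]) <= cutoff:
                seen.add(j)
                stack.append(j)
    return len(seen) < len(coords)


def passes_qc(c: Candidate) -> bool:
    """Apply the "broken map" and "wrong address" checks."""
    if is_fragmented(c.ci_coords):   # molecule fell apart
        return False
    if c.e_ci < c.e_ground:          # funnel below the resting spot
        return False
    return True


def filter_candidates(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that survive both checks."""
    return [c for c in candidates if passes_qc(c)]
```

A real pipeline would use element-specific bond lengths rather than a single cutoff; the single number here just keeps the idea visible.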

What's inside the dataset?
The dataset is like a giant library of molecular blueprints. It includes:

  • The Shapes: The exact 3D coordinates of the atoms for both the resting state and the funnel state.
  • The Energy: How much energy it takes to get to these spots.
  • The Variety: The molecules are diverse. Some are simple chains, some are rings (like bicycle wheels), and some are complex fused structures. They are made of carbon, nitrogen, oxygen, and fluorine.
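To make the list above concrete, here is a rough sketch of what one entry in such a library could look like, and how the "energy to reach the funnel" could be read off from it. The class name, field names, and units are assumptions made for illustration; the actual file layout of the dataset may differ.

```python
from dataclasses import dataclass
from typing import List, Tuple

Coord = Tuple[float, float, float]

# Standard conversion factor: 1 eV is about 23.0605 kcal/mol.
EV_TO_KCAL_PER_MOL = 23.0605


@dataclass
class MoleculeRecord:
    """Hypothetical layout of one dataset entry."""
    elements: List[str]          # e.g. ["C", "O", ...]
    ground_coords: List[Coord]   # 3D shape at the resting spot (angstrom)
    ci_coords: List[Coord]       # 3D shape at the funnel (angstrom)
    e_ground_ev: float           # energy at the resting spot (eV)
    e_ci_ev: float               # energy at the funnel (eV)

    def ci_relative_energy_ev(self) -> float:
        """Energy of the funnel measured from the resting spot, in eV."""
        return self.e_ci_ev - self.e_ground_ev

    def ci_relative_energy_kcal(self) -> float:
        """Same quantity converted to kcal/mol."""
        return self.ci_relative_energy_ev() * EV_TO_KCAL_PER_MOL
```

Storing both shapes side by side is what lets a model learn how a molecule has to bend or twist to get from its resting spot to the funnel.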

Why is this useful?
The authors say this dataset is a training ground for Artificial Intelligence (AI).
Think of it this way: If you want to teach a robot to recognize a funnel in a landscape, you can't just show it one picture. You need to show it a huge number of examples. This dataset provides about 260,000 of them. Now, AI can learn the patterns of where these funnels usually appear, helping scientists predict how new molecules might behave without having to do the slow, expensive calculations for every single one.

Important Note:
The authors are very clear: This is a qualitative tool. It's like a weather forecast that tells you "it might rain" or "it's sunny," which is great for planning a picnic or training a model. But if you need to build a skyscraper (a precise medical drug or a specific industrial chemical), you still need the "laser measurement" (high-level calculations) to get the exact details. This dataset is the map that guides you to the right neighborhood, not the blueprint for the house itself.

In short:
They built a massive, high-speed map of 260,000 molecular landscapes, highlighting the tricky "funnels" where chemical reactions happen. They cleaned the map, checked the details, and made it available so that AI can learn to predict these reactions faster than ever before.
