Pushing the limits of one-dimensional NMR spectroscopy for automated structure elucidation using artificial intelligence

This paper presents a deep learning framework based on transformer architecture that achieves automated de novo structure elucidation for organic molecules with up to 40 non-hydrogen atoms using only one-dimensional $^{1}$H and $^{13}$C NMR spectra, correctly identifying the target molecule within the top 15 predictions in 60.4% of cases.

Original authors: Frank Hu, Jonathan M. Tubb, Dimitris Argyropoulos, Sergey Golotvin, Mikhail Elyashberg, Grant M. Rotskoff, Matthew W. Kanan, Thomas E. Markland

Published 2026-06-10

Original paper licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). This is an AI-generated explanation of the paper below. It is not written or endorsed by the authors. For technical accuracy, refer to the original paper.

Imagine you are a detective trying to solve a mystery, but instead of finding fingerprints or a witness, you only have a single, blurry photograph of the suspect's shadow. Your job is to reconstruct the suspect's entire face, body, and clothing just from that one shadow.

This is essentially what chemists face when they try to figure out the structure of a new molecule using only 1D NMR spectroscopy.

The Impossible Puzzle

In the world of chemistry, a molecule is like a complex Lego structure. For a medium-sized molecule (one with about 36 to 40 "heavy" atoms like carbon, nitrogen, or oxygen), there are more possible ways to snap those Legos together than there are grains of sand on all the beaches on Earth. The paper estimates this number to be between $10^{20}$ and $10^{60}$.

Traditionally, figuring out which specific Lego structure you have using only a simple 1D NMR "shadow" (a spectrum) was considered out of reach. It's like trying to guess the exact arrangement of a few dozen Lego bricks, out of all those possible arrangements, just by looking at a single, flat shadow. Usually, chemists need more clues, like 2D NMR (which maps out which atoms are connected or close to one another) or knowing the exact list of ingredients (the molecular formula) to solve the puzzle.

The AI Detective

The researchers in this paper built a super-smart AI detective (a "Transformer" model, the same type of technology behind many modern chatbots) that can solve this puzzle using only the 1D NMR shadow.

Here is how they trained it, using a clever two-step process:

Step 1: Learning the Language of Shapes (Pre-training)
Before the AI could look at the NMR shadows, they taught it a different game. They gave it "Morgan fingerprints" (digital barcodes that record which small pieces, or fragments, a molecule contains) and asked the AI to build the full Lego structure from those barcodes.

  • The Analogy: Imagine teaching a child to build a house by showing them a list of bricks (windows, doors, walls) and asking them to assemble the house.
  • The Result: The AI became a master builder. It could look at a list of fragments and correctly reconstruct the full house 97.8% of the time.

Step 2: The Real Test (Spectrum to Structure)
Once the AI was a master builder, they taught it the real task: looking at the NMR "shadow" and guessing the Lego structure directly.

  • They didn't give it the list of ingredients (the molecular formula).
  • They didn't give it a more detailed map (2D NMR).
  • They only gave it the 1D NMR spectrum.

The Results: Solving the Unsolvables

The AI performed strikingly well on a task long considered out of reach:

  • Accuracy: For molecules with up to 40 non-hydrogen atoms, the correct structure appeared among the AI's top 15 guesses 60.4% of the time.
  • The "Shadow" vs. The "Map": Even if the AI didn't get the exact right answer, it was usually very close. If it guessed wrong, the structure it suggested was often 82% similar to the real molecule. It's like the detective guessing the suspect is wearing a red hat instead of a blue one, but getting the rest of the outfit right.
  • One Eye is Enough: Surprisingly, the AI could do most of this work using only the Hydrogen (1H) NMR spectrum, without needing the Carbon (13C) data. It still got the right answer 46.6% of the time in its top 15 guesses.
  • Real-World Adaptability: The AI was trained on computer simulations, but the researchers showed it could be "fine-tuned" with just 50 real-world experimental spectra. Even with this tiny amount of real data, it jumped from 0% accuracy on real data to 21.5% accuracy.
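The two kinds of numbers quoted above, "correct within the top 15 guesses" and "percent similar", can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the paper's evaluation code: the function names are hypothetical, molecules are compared as plain strings (which presumes they are written in one canonical form), and the Tanimoto measure is assumed here as a common choice for comparing fingerprints, since this summary does not say which similarity measure the authors used.

```python
def top_k_accuracy(ranked_predictions, targets, k=15):
    """Fraction of cases whose target appears among the first k predictions."""
    hits = sum(
        target in preds[:k]
        for preds, target in zip(ranked_predictions, targets)
    )
    return hits / len(targets)


def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets."""
    if not fp_a and not fp_b:
        return 1.0  # two empty fingerprints are treated as identical
    return len(fp_a & fp_b) / len(fp_a | fp_b)


# Made-up ranked guesses for three spectra; the molecule strings are
# illustrative and not taken from the paper.
predictions = [
    ["CCO", "COC", "CC=O"],
    ["CC(=O)O", "OCC=O"],
    ["c1ccccc1", "C1CCCCC1"],
]
targets = ["COC", "CC(C)O", "c1ccccc1"]

print(top_k_accuracy(predictions, targets, k=15))  # 2 of the 3 targets found
print(top_k_accuracy(predictions, targets, k=1))   # only the third is ranked first
print(tanimoto({1, 2, 3}, {2, 3, 4}))              # 2 shared out of 4 distinct bits
```

A top-15 score rewards the model for putting the right answer anywhere in its shortlist, which matches how a chemist would use it: as a small set of candidates to check, not a single verdict.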

Why This Matters

Think of the chemical space as a library with up to $10^{60}$ books. Finding the one specific book you need by reading just the cover (the 1D NMR spectrum) was thought to be impossible. This AI narrows the search down to a small stack of 15 books, and about 6 times out of 10 the book you want is somewhere in that stack.

The paper concludes that this tool allows scientists to skip the expensive, time-consuming steps of getting more complex data. It acts as a powerful filter, rapidly narrowing down the vast number of possible chemical structures to a manageable few, all based on the simplest, most common data available in a chemistry lab.
