Score-based diffusion models for accurate crystal-structure inpainting and reconstruction of hydrogen positions

This paper presents a score-based diffusion model that borrows an inpainting technique from computer vision and applies it to materials science, reconstructing missing hydrogen atom positions in crystal structures. It recovers the original positions in more than 97% of test cases, and is described as more accurate and efficient than traditional DFT-based or unconditioned approaches.

Original authors: Timo Reents, Arianna Cantarella, Marnik Bercx, Pietro Bonfà, Giovanni Pizzi

Published 2026-06-17

Original paper licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). This is an AI-generated explanation of the paper below. It is not written or endorsed by the authors. For technical accuracy, refer to the original paper.

Imagine you have a 3D puzzle of a crystal, but someone has taken out all the tiny pieces representing hydrogen atoms. In the real world, finding where these hydrogen pieces belong is incredibly hard. It's like trying to spot a ghost in a room using a flashlight that barely sees anything; standard X-ray cameras often miss them entirely, forcing scientists to use expensive, massive machines (neutron diffraction) to find them. Because of this, many crystal databases are full of "ghosts"—structures where the hydrogen pieces are either missing entirely or just guessed at based on a hunch.

This paper introduces a new "AI detective" that can fill in those missing pieces with high accuracy. Here is how they did it, explained simply:

The Problem: The "Missing Piece" Puzzle

Scientists have a powerful AI tool called MatterGen that is great at creating new crystal puzzles from scratch. However, the researchers wanted to use it for a different job: inpainting.

Think of "inpainting" like the "Healing Brush" tool in photo editing software. If you have an old photo with a scratch or a missing piece, the tool looks at the surrounding pixels and intelligently fills in the gap. The researchers wanted to do this for crystals: look at the known parts of the structure and intelligently fill in the missing hydrogen spots.

The Solution: Borrowing from Art Restoration

The team realized that the best tools for "filling in the blanks" weren't actually in chemistry; they were in computer vision (how computers "see" images). They took a technique called TD-Paint, originally designed to restore damaged images, and taught it to understand crystal structures.

They trained a new version of the AI (called pos-only-TD) with a specific rule:

  • The Knowns: The AI is told, "These atoms are real and fixed; do not touch them."
  • The Unknowns: The AI is told, "These spots are empty; guess what goes here."

Unlike older methods that would guess the whole picture from scratch (which is like trying to redraw a whole painting because you lost one brushstroke), this AI only focuses on filling the holes while respecting the existing structure.
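The known/unknown split described above can be pictured as a simple per-atom mask. The following is a minimal sketch, not the authors' code: the function name `build_inpainting_mask`, the `(element, fractional coordinates)` tuple layout, and the toy coordinates are all assumptions made for illustration.

```python
def build_inpainting_mask(atoms, missing_element="H"):
    """Return one flag per atom: True = known and fixed, False = to be filled in."""
    return [element != missing_element for element, _ in atoms]


# Toy structure as (element, fractional coordinates); values are illustrative only.
atoms = [
    ("O", (0.00, 0.00, 0.00)),
    ("H", (0.10, 0.00, 0.00)),
    ("H", (0.00, 0.10, 0.00)),
    ("Mg", (0.50, 0.50, 0.50)),
]

known = build_inpainting_mask(atoms)

# Atoms the model must leave untouched, and atoms it is asked to place.
fixed = [atom for atom, is_known in zip(atoms, known) if is_known]
to_fill = [atom for atom, is_known in zip(atoms, known) if not is_known]
```

Passing a different `missing_element` (for example `"Li"`) would mark another species as the one to fill in, which mirrors the idea that the approach is not tied to hydrogen.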

How It Works: The "Denoising" Dance

The AI works like a sculptor starting with a block of noisy, random clay.

  1. Start: It starts with a crystal where the hydrogen spots are just random noise (like static on an old TV).
  2. The Process: Step-by-step, the AI "cleans" the noise. It asks, "Based on the atoms I do know are there, what should the hydrogen atoms look like?"
  3. The Twist: The new method (TD-Paint) is smarter about this. It knows that the "known" atoms are already perfect and shouldn't be "noisy." It only adds noise to the missing spots and cleans those up, making the process much faster and more accurate.
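The three steps above can be sketched as a small loop. This is a toy illustration under loud assumptions, not the paper's implementation: `toy_denoiser` is a hypothetical stand-in for the trained network (it simply returns a reference answer), the update rule and the 0.05 noise scale are invented for the example, and coordinates are fractional values in a periodic cell. What it does show is the key idea: each atom carries its own noise level, known atoms stay at level 0, and only the missing atoms are noised and cleaned.

```python
import random


def periodic_delta(a, b):
    """Shortest signed displacement from a to b along one periodic fractional axis."""
    d = (b - a) % 1.0
    return d - 1.0 if d > 0.5 else d


def toy_denoiser(x, times, reference):
    """Hypothetical stand-in for the trained network.

    A real model would predict clean positions from the known atoms and the
    per-atom noise levels; this toy ignores both and returns a reference answer.
    """
    return [list(r) for r in reference]


def inpaint(positions, known_mask, reference, steps=50, seed=0):
    rng = random.Random(seed)
    # Known atoms keep their coordinates; missing atoms start as uniform noise.
    x = [list(p) if k else [rng.random() for _ in range(3)]
         for p, k in zip(positions, known_mask)]
    for step in range(steps, 0, -1):
        t = step / steps
        # Per-atom time: known atoms sit at t = 0 (clean), missing atoms at t.
        times = [0.0 if k else t for k in known_mask]
        predicted = toy_denoiser(x, times, reference)
        t_next = (step - 1) / steps
        for i, t_i in enumerate(times):
            if t_i == 0.0:
                continue  # never noise or move a known atom
            # Move part of the way to the prediction, then re-add a little noise
            # that shrinks to zero by the final step.
            x[i] = [
                (c + periodic_delta(c, p) / step
                 + 0.05 * t_next * rng.gauss(0.0, 1.0)) % 1.0
                for c, p in zip(x[i], predicted[i])
            ]
    return x


positions = [(0.0, 0.0, 0.0), None, (0.5, 0.5, 0.5)]   # None marks a missing atom
known_mask = [True, False, True]
reference = [(0.0, 0.0, 0.0), (0.25, 0.25, 0.25), (0.5, 0.5, 0.5)]
result = inpaint(positions, known_mask, reference)
```

In this toy the known atoms come back bit-for-bit identical, while the missing atom drifts from random noise to the denoiser's prediction. In a real model the prediction would come from a network conditioned on the fixed atoms rather than from a stored answer.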

The Results: A 97% Success Rate, Rising to 99%

The team tested this on a huge library of crystals.

  • The Test: They took crystals with known hydrogen positions, hid the hydrogens, and asked the AI to find them again.
  • The Score: The AI succeeded in finding the exact original structure 97% of the time.
  • The Bonus: In the other cases, the AI didn't just fail; it often found a different version of the crystal that is more stable (lower in energy) than the original "guess" found in the database.

When they filtered out the crystals that were known to be just theoretical guesses (not real experiments), the success rate jumped to 99%.
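A hide-and-recover test of this kind boils down to comparing predicted positions with the hidden reference positions and counting matches. The sketch below is one simple way to do that and is not the matching procedure used in the paper: the 0.05 tolerance, the unit cubic cell (so distances are in fractional units), and the brute-force search over atom orderings are all assumptions chosen to keep the example short.

```python
from itertools import permutations


def periodic_distance(a, b):
    """Distance between two fractional points under periodic boundaries.

    Assumes a unit cubic cell, so the result is in fractional units.
    """
    total = 0.0
    for x, y in zip(a, b):
        d = abs(x - y) % 1.0
        d = min(d, 1.0 - d)  # wrap around the cell edge if that is shorter
        total += d * d
    return total ** 0.5


def is_match(predicted, reference, tol=0.05):
    """True if some ordering of the predicted atoms puts each within tol of a reference atom.

    Brute force over orderings: only practical for a handful of atoms.
    """
    if len(predicted) != len(reference):
        return False
    return any(
        all(periodic_distance(p, r) <= tol for p, r in zip(perm, reference))
        for perm in permutations(predicted)
    )


def success_rate(pairs, tol=0.05):
    """Fraction of (predicted, reference) structure pairs that match."""
    hits = sum(is_match(pred, ref, tol) for pred, ref in pairs)
    return hits / len(pairs)


pairs = [
    # Same atoms in a different order, one of them wrapped across the cell edge.
    ([(0.98, 0.5, 0.5), (0.25, 0.25, 0.25)],
     [(0.25, 0.25, 0.25), (0.01, 0.5, 0.5)]),
    # Clearly wrong prediction.
    ([(0.1, 0.1, 0.1)], [(0.4, 0.4, 0.4)]),
]
rate = success_rate(pairs)  # 0.5 for this toy data
```

Because hydrogen atoms are interchangeable, the comparison ignores the order in which atoms are listed. It also treats a position just inside one face of the cell as close to a position just inside the opposite face.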

Why This Matters

  • Speed and Cost: Instead of needing expensive neutron machines to find hydrogens, scientists can now use this AI to predict them computationally.
  • Fixing Bad Data: The AI can act as a quality control check. If the AI predicts a hydrogen position that is very different from what's in the database, it might mean the database entry was wrong or a "bad guess" all along.
  • Beyond Hydrogen: While they tested it on hydrogen, the method is "hydrogen-agnostic." This means the same tool could be used to fill in missing pieces for other atoms, like lithium in battery materials, without needing to be retrained from scratch.

In short, the researchers took a tool designed for art restoration, taught it the language of crystals, and gave it the ability to "see" the invisible hydrogen atoms that have been missing from our scientific maps for decades.
