MC3D: The Materials Cloud computational database of experimentally known stoichiometric inorganics

This paper introduces MC3D, a comprehensive online database on the Materials Cloud portal containing experimentally known stoichiometric inorganic crystal structures with fully reproducible, DFT-optimized geometries derived from automated workflows and curated input protocols.

Original authors: Sebastiaan P. Huber, Michail Minotakis, Marnik Bercx, Timo Reents, Kristjan Eimre, Nataliya Paulish, Nicolas Hörmann, Martin Uhrin, Nicola Marzari, Giovanni Pizzi

Published 2026-03-30

This is an AI-generated explanation of the paper below. It is not written or endorsed by the authors. For technical accuracy, refer to the original paper.

Imagine you are a chef trying to invent the perfect new dish. You have a massive library of recipes from around the world, but they are written in different languages, some have missing ingredients, and some are just theoretical ideas that no one has ever actually cooked.

MC3D is like a super-organized, high-tech kitchen that takes all those messy, scattered recipes and turns them into a single, reliable "Master Cookbook" of real, edible dishes.

Here is the story of how they built it, explained simply:

1. The Great Recipe Hunt (Data Collection)

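The scientists started by gathering nearly one million recipes (crystal structures) from three giant libraries: the COD (Crystallography Open Database), the ICSD (Inorganic Crystal Structure Database), and the MPDS (Materials Platform for Data Science).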

  • The Problem: These libraries were messy. Some recipes had typos, some were for dishes that don't exist in reality (like a cake made of pure air), and some were just theoretical sketches.
  • The Cleanup: They ran these recipes through a strict "quality control" machine. They threw out the ones with bad syntax, the ones that weren't stoichiometric (recipes whose ingredient ratios don't balance), and the ones that described loose molecules (like water vapor) rather than solid materials.
  • The Result: From that million, they kept only 72,589 unique, solid, real-world recipes. This is their "Source Collection."
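
For readers who think in code, the cleanup above can be pictured as a chain of filters followed by duplicate removal. The sketch below is only an illustration of that idea, not the authors' pipeline: the `Entry` record, its fields, the `fingerprint` string used to spot duplicates, and the function name `build_source_collection` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """A hypothetical, simplified record for one imported recipe."""
    source: str           # library it came from, e.g. "COD", "ICSD", "MPDS"
    formula: str          # chemical formula as a plain string
    parses: bool          # True if the structure file had valid syntax
    stoichiometric: bool  # True if the ingredient ratios balance
    is_molecular: bool    # True if it describes loose molecules, not a solid
    fingerprint: str      # hypothetical label; equal labels mean "same structure"


def build_source_collection(entries):
    """Apply the three quality checks, then keep one entry per unique structure."""
    seen = set()
    kept = []
    for entry in entries:
        if not entry.parses:
            continue  # bad syntax
        if not entry.stoichiometric:
            continue  # ratios do not balance
        if entry.is_molecular:
            continue  # not a solid material
        key = (entry.formula, entry.fingerprint)
        if key in seen:
            continue  # same recipe already seen, possibly from another library
        seen.add(key)
        kept.append(entry)
    return kept


if __name__ == "__main__":
    recipes = [
        Entry("COD", "NaCl", True, True, False, "fp-1"),
        Entry("ICSD", "NaCl", True, True, False, "fp-1"),   # duplicate of the first
        Entry("MPDS", "Fe0.9O", True, False, False, "fp-2"),
        Entry("COD", "H2O", True, True, True, "fp-3"),
    ]
    for entry in build_source_collection(recipes):
        print(entry.source, entry.formula)
```

In this toy version the first copy of a duplicated recipe wins. How duplicates are actually detected and which copy is kept is a detail for the original paper.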

2. The Simulation Kitchen (DFT Optimization)

Now, they had the recipes, but they needed to test them. In the real world, you have to bake a cake to see if it rises. In the computer world, they use a method called Density-Functional Theory (DFT). Think of this as a "virtual oven."

  • The Challenge: Running a virtual oven is expensive and slow. If you put a complex cake (a structure with many atoms) in, it might take days to bake, or the oven might break (crash).
  • The Solution: They built an automated robot chef (an automated workflow).
    • The robot puts the ingredients in the oven.
    • If the oven starts smoking (an error), the robot doesn't panic. It checks the smoke, adjusts the temperature, and tries again.
    • If it fails too many times, it gives up and moves to the next recipe.
  • The Success Rate: This robot was incredibly good. It successfully "baked" 85% of the recipes it tried, even when the original recipes were tricky.
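
The robot chef's "check the smoke, adjust, try again, give up after too many failures" routine is a retry loop with error handlers. Here is a minimal sketch of that pattern, assuming a made-up oven: `CalculationError`, `run_with_recovery`, `fake_oven`, the `mixing` setting, and the limit of five attempts are all hypothetical and stand in for whatever the real workflow uses.

```python
class CalculationError(Exception):
    """Raised by a (hypothetical) calculation; `kind` names what went wrong."""

    def __init__(self, kind):
        super().__init__(kind)
        self.kind = kind


def run_with_recovery(run, params, handlers, max_attempts=5):
    """Run a calculation, adjusting its settings after each known failure.

    Returns the result on success, or None if the error has no handler
    or the attempt budget runs out.
    """
    params = dict(params)  # work on a copy so the caller's settings stay intact
    for _ in range(max_attempts):
        try:
            return run(params)
        except CalculationError as err:
            handler = handlers.get(err.kind)
            if handler is None:
                return None           # unknown kind of smoke: give up
            params = handler(params)  # adjust the settings and try again
    return None                       # failed too many times: next recipe


def fake_oven(params):
    """Toy stand-in for a calculation: only succeeds with a gentle setting."""
    if params["mixing"] > 0.3:
        raise CalculationError("no_convergence")
    return {"converged": True, "mixing": params["mixing"]}


# One handler per known error: here, halve the hypothetical "mixing" setting.
handlers = {"no_convergence": lambda p: {**p, "mixing": p["mixing"] * 0.5}}

if __name__ == "__main__":
    print(run_with_recovery(fake_oven, {"mixing": 0.8}, handlers))
```

With these toy numbers the oven fails twice and succeeds on the third attempt. Keeping the handlers in a lookup table means a new kind of failure can be covered by adding one entry rather than rewriting the loop.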

3. The Master Cookbook (The MC3D Database)

The result is the MC3D (Materials Cloud 3D Structure Database).

  • It contains 32,013 perfectly "baked" structures.
  • Why is this special? Most other databases are like a pile of unsorted notes. MC3D is a curated, consistent collection. Every single entry was cooked using the exact same "recipe" (computational protocol). This makes it perfect for training AI. If you want to teach a computer to predict new materials, you need a clean, consistent dataset, not a messy pile of notes.

4. The Open Kitchen (Accessibility)

The best part? They didn't lock the cookbook in a vault.

  • They put it on the Materials Cloud, a public website.
  • The Interface: Imagine a website where you can type in "I want a material with Iron and Oxygen," and it instantly shows you a list of 3D models you can spin around, rotate, and inspect.
  • The "Receipt": For every single calculation, they kept a full "receipt" (provenance). If you want to know exactly how they got a result, you can click a button and see the entire history: which computer they used, what settings they tweaked, and how they fixed errors. This makes the science completely transparent and reproducible.
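
To make the "Iron and Oxygen" search concrete, here is a small local sketch of filtering a list of formulas by required elements. It does not talk to the Materials Cloud website or use any of its interfaces; the helper names `elements_in` and `search`, and the sample formulas, are assumptions chosen for illustration.

```python
import re


def elements_in(formula):
    """Return the set of element symbols in a simple formula like 'Fe2O3'."""
    # An element symbol is one capital letter, optionally followed by one lowercase letter.
    return set(re.findall(r"[A-Z][a-z]?", formula))


def search(formulas, required):
    """Keep the formulas that contain every required element, in their original order."""
    required = set(required)
    return [f for f in formulas if required <= elements_in(f)]


if __name__ == "__main__":
    cookbook = ["Fe2O3", "FeO", "SiO2", "NaCl", "LiFePO4"]
    print(search(cookbook, {"Fe", "O"}))  # prints ['Fe2O3', 'FeO', 'LiFePO4']
```

This version returns anything that contains iron and oxygen, including materials with extra elements. A search for materials made of only those elements would compare the sets for equality instead.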

Why Does This Matter?

Think of Machine Learning (AI) in materials science as a student trying to learn chemistry.

  • If you give the student a messy pile of contradictory notes, they will learn the wrong things.
  • If you give them the MC3D, you are giving them a clean, verified textbook.

This database helps scientists:

  1. Find new materials faster: Instead of guessing, they can search this database for materials that might be super-strong, super-conductive, or great for batteries.
  2. Train better AI: The consistency of the data helps computers learn the rules of chemistry more accurately.
  3. Save time: Researchers don't have to waste time cleaning up messy data; they can just start their experiments.

In short: MC3D is a massive, clean, and open library of "digital crystals" that has been rigorously tested by a robot chef, making it the perfect foundation for discovering the next generation of materials that will power our future.
