M-CODE: Materials Categorization via Ontology, Dimensionality and Evolution

This paper introduces M-CODE, an open-source, ontology-based categorization system that classifies materials structures by dimensionality, complexity, and evolutionary provenance to support standardized data management and reproducible dataset generation in AI-driven materials science.

Original authors: Vsevolod Biryukov, Kamal Choudhary, Timur Bazhirov

Published 2026-02-17

This is an AI-generated explanation of the paper below. It is not written or endorsed by the authors. For technical accuracy, refer to the original paper.

Imagine you are trying to build a massive library of LEGO structures.

For a long time, scientists who model materials on computers have mostly been building perfect, idealized castles. They have blueprints for perfect cubes and flawless towers. But in the real world, materials aren't perfect castles. They are cracked walls, half-built bridges, surfaces covered in dust, and complex interfaces where two different materials meet.

The problem is that while everyone agrees on what a "perfect castle" looks like, nobody agrees on how to describe a "cracked wall with a missing brick." One scientist calls it a "defect," another calls it a "void," and a third calls it a "surface reconstruction." Because they use different words and different blueprints, their data doesn't talk to each other. It's like trying to build a city where one person speaks English, another speaks French, and a third speaks a language made entirely of hand gestures.

Enter M-CODE.

Think of M-CODE as a universal translator and a standardized LEGO instruction manual rolled into one. It's a new system designed to describe any material structure, no matter how messy or complex, in a way that both humans and computers can read without ambiguity.

Here is how it works, broken down into simple concepts:

1. The "LEGO Brick" Philosophy (Ontology)

Instead of trying to describe a whole complex machine in one giant paragraph, M-CODE breaks everything down into building blocks (called Entities) and actions (called Operations).

  • The Bricks (Entities): These are the basic ingredients. You have a "Crystal" (a perfect block), a "Vacuum" (empty space), a "Vacancy" (a missing brick), or a "Slab" (a flat layer).
  • The Actions (Operations): These are the instructions on what to do with the bricks. You can "Stack" them, "Stretch" them, "Merge" them, or "Cut" them.

The Analogy:
Imagine you want to describe a sandwich.

  • Old Way: "A messy sandwich with a bite taken out of the corner, sitting on a slightly squished plate." (Hard to reproduce exactly.)
  • M-CODE Way:
    1. Take a Bread Entity.
    2. Take a Cheese Entity.
    3. Stack them.
    4. Cut a triangle out of the corner (Operation).
    5. Result: A perfect, reproducible description of that specific sandwich.
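
To make the bricks-and-actions idea concrete, here is a minimal Python sketch. It is not M-CODE's actual code: the `Entity` class and the `stack`, `cut`, and `describe` functions are hypothetical names invented for this illustration, and only the entity names "Slab" and "Vacuum" come from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A building block (hypothetical class, not the M-CODE schema)."""
    kind: str
    parts: tuple = ()  # child entities; empty for a basic block


def stack(*layers: Entity) -> Entity:
    """Operation: combine layers, bottom to top, into one new entity."""
    return Entity(kind="Stack", parts=tuple(layers))


def cut(target: Entity, region: str) -> Entity:
    """Operation: remove a named region from an entity."""
    return Entity(kind=f"Cut[{region}]", parts=(target,))


def describe(entity: Entity) -> str:
    """Render an entity tree as a nested, human-readable string."""
    if not entity.parts:
        return entity.kind
    inner = ", ".join(describe(part) for part in entity.parts)
    return f"{entity.kind}({inner})"


slab = Entity("Slab")
vacuum = Entity("Vacuum")
structure = cut(stack(slab, vacuum), region="corner")
print(describe(structure))  # Cut[corner](Stack(Slab, Vacuum))
```

Because every structure is just bricks plus actions, two people who write down the same steps end up with exactly the same description.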

2. The "Evolution" Story (Provenance)

One of the coolest parts of M-CODE is that it doesn't just tell you what the material is; it tells you how it was made. It keeps a "receipt" or a "birth certificate" for every structure.

If you have a defective crystal, M-CODE records the history: "We started with a perfect crystal, then we removed an atom here, then we stretched it by 5%."

The Analogy:
Think of it like a cooking recipe that includes a video of the chef making the dish. If the cake tastes weird, you don't just know it's a "bad cake." You can look at the recipe and see, "Ah, the chef forgot to add the baking soda before mixing the eggs." This allows scientists to fix mistakes and reproduce the exact same "mistake" later if they need to study it.
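
The "receipt" idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's implementation: the `Structure` and `Step` classes, the `replay` function, and the operation names `remove_atom` and `strain` are all made up here to mirror the example history above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One recorded operation (hypothetical record format)."""
    operation: str
    parameters: tuple  # sorted (name, value) pairs


@dataclass(frozen=True)
class Structure:
    """A structure that carries the history of how it was built."""
    base: str
    history: tuple = ()

    def apply(self, operation: str, **parameters) -> "Structure":
        # Return a new structure; the original stays untouched.
        step = Step(operation, tuple(sorted(parameters.items())))
        return Structure(self.base, self.history + (step,))


def replay(structure: Structure) -> list:
    """List the steps needed to rebuild the structure from scratch."""
    lines = [f"start from {structure.base}"]
    for step in structure.history:
        args = ", ".join(f"{name}={value}" for name, value in step.parameters)
        lines.append(f"{step.operation}({args})")
    return lines


pristine = Structure("perfect crystal")
defective = pristine.apply("remove_atom", site=3).apply("strain", percent=5)
for line in replay(defective):
    print(line)
# start from perfect crystal
# remove_atom(site=3)
# strain(percent=5)
```

Each operation returns a new structure instead of changing the old one, so the perfect crystal stays available as a starting point and the defective one always knows where it came from.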

3. The "Zip Code" System (Tags)

To make things easy to find, M-CODE gives every type of structure a short, catchy code, like a zip code.

  • P-2D-SLB-S might mean: Pristine (perfect) 2D (flat) Slab, Simple version.
  • D-0D-VAC might mean: Defective 0D (point) VACancy.

This allows computers to sort materials quickly. A scientist who wants to study only "twisted interfaces" can ask for every item carrying the "Twisted Interface" tag, and the same search works across any database that uses M-CODE tags.
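
Here is a small Python sketch of how such a tag search could work. Everything in it is an assumption made for illustration: the field layout (state, dimensionality, type, optional variant) is guessed from the two example codes above, the `Tag`, `parse_tag`, and `find` names are hypothetical, and the three records are invented sample data.

```python
from typing import NamedTuple, Optional


class Tag(NamedTuple):
    """Assumed tag layout: state-dimensionality-kind[-variant]."""
    state: str
    dimensionality: str
    kind: str
    variant: Optional[str]


def parse_tag(text: str) -> Tag:
    """Split a dash-separated tag into its named fields."""
    fields = text.split("-")
    if len(fields) not in (3, 4):
        raise ValueError(f"unexpected tag format: {text!r}")
    variant = fields[3] if len(fields) == 4 else None
    return Tag(fields[0], fields[1], fields[2], variant)


def find(records, **criteria):
    """Return the names of records whose tag matches every criterion."""
    matches = []
    for record in records:
        tag = parse_tag(record["tag"])
        if all(getattr(tag, key) == value for key, value in criteria.items()):
            matches.append(record["name"])
    return matches


# Invented sample records, for illustration only.
records = [
    {"name": "slab_a", "tag": "P-2D-SLB-S"},
    {"name": "vacancy_b", "tag": "D-0D-VAC"},
    {"name": "slab_c", "tag": "P-2D-SLB-S"},
]

print(find(records, kind="SLB"))                       # ['slab_a', 'slab_c']
print(find(records, state="D", dimensionality="0D"))   # ['vacancy_b']
```

The point is that the search only looks at the tag, not at the structure itself, which is why it can be fast and work the same way wherever the tags are used.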

4. Why Does This Matter? (The "AI" Connection)

Artificial Intelligence (AI) is getting very good at predicting how materials will behave. But AI is only as smart as the data it is fed.

  • If you feed an AI only perfect castles, it will think the world is made of perfect castles.
  • If you feed it M-CODE data, the AI learns about cracks, interfaces, and defects.

M-CODE helps organize this messy, real-world data so AI can learn to predict how real materials (like the battery in your phone or the solar panel on your roof) actually work, rather than just how perfect theoretical models work.

Summary

M-CODE is a new language for materials science.

  • It stops scientists from speaking different dialects.
  • It breaks complex materials down into simple, reusable LEGO-like blocks.
  • It keeps a detailed diary of how every material was built.
  • It gives everything a unique "zip code" so computers can find and sort them instantly.

By doing this, it helps scientists build better AI, create more realistic simulations, and ultimately discover new materials faster. It turns the chaotic mess of real-world materials into an organized, searchable, and reproducible library.
