A wrong ground-state structure of HfO$_2$ predicted by… — Plain-Language Explanation

Original authors: Shuqi Tang, Jinchen Wei, Kang Wang, Junjie Zhou, Yihan Zhang, Menglin Huang, Shiyou Chen

Published 2026-06-12

Original paper licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). This is an AI-generated explanation of the paper below. It is not written or endorsed by the authors. For technical accuracy, refer to the original paper.

Imagine you are trying to build a perfect map of a mountainous region to help hikers find the lowest valley (the "ground state"). In the world of materials science, this valley represents the most stable, natural shape a material like Hafnium Oxide (HfO₂) wants to take.

For a long time, scientists have used a powerful tool called Machine-Learning Interatomic Potentials (MLIPs). Think of these MLIPs as super-smart GPS systems. They are trained by feeding them data from a "teacher" called Density Functional Theory (DFT). The most popular version of that "teacher" used to train these GPS systems follows a specific set of rules called the PBE functional.

Here is the story of what the paper found:

1. The GPS Got the Map Wrong

The researchers asked their GPS system (the MLIP trained on PBE data) to find the lowest valley for HfO₂.

What the GPS said: "The lowest valley is a place called I4₁/amd. It's a low-density, spacious structure where each hafnium atom sits at the center of an octahedron, surrounded by six oxygen atoms."
What reality says: "No, the lowest valley is actually the monoclinic P2₁/c structure. This is what experiments in the real world clearly show."

The GPS was confidently pointing to the wrong destination. It claimed the "spacious" I4₁/amd structure was about 17 meV per formula unit lower in energy (that is, more stable) than the real winner.

2. Is the GPS Broken, or is the Teacher Lying?

The researchers wondered: Did we build the GPS wrong, or is the teacher (PBE) giving bad homework?

They tested this by:

Checking other famous, pre-made GPS models (like NequIP and MatterSim). Result: They all pointed to the same wrong "I4₁/amd" valley.
Comparing the GPS predictions directly against the teacher's raw data. Result: The GPS was actually doing its job perfectly; it was just faithfully copying the teacher's mistakes.

The Verdict: The GPS wasn't broken. The PBE teacher was the problem.

3. The "Loose Clothing" Analogy

Why did the PBE teacher make this mistake?
Imagine the PBE functional is like a tailor who loves loose, baggy clothing.

The "I4₁/amd" and "Pbcn" structures are like loose, spacious outfits (low-density, large volumes).
The "P2₁/c" structure is like a tighter, more compact outfit.

The PBE tailor has a bias: it thinks loose, spacious clothes are more comfortable (lower energy) than they actually are. Because of this bias, the PBE teacher told the GPS that the spacious "I4₁/amd" outfit was the best one, even though in reality, the tighter "P2₁/c" outfit is what the material prefers.

When the researchers tried other "tailors" (functionals like PBEsol or LDA), who prefer tighter, more compact fits, the map corrected itself. Suddenly, the "I4₁/amd" outfit looked too baggy and expensive, and the "P2₁/c" structure returned to being the true champion.

4. The Hiker's Journey (Ferroelectric Switching)

The paper also looked at what happens when HfO₂ changes its shape (like a hiker switching paths).

Scenario A (Fixed Lattice): If you force the hiker to stay on a rigid path (no changing the size of the map), both the "loose" PBE teacher and the "tight" PBEsol teacher give similar directions.
Scenario B (Relaxed Lattice): If you let the hiker change the size of the path (allowing the map to expand or contract), the two teachers give wildly different directions.
- The PBE teacher (loose bias) says: "Take the path through the spacious Pbcn valley because it looks easy and roomy."
- The PBEsol teacher (compact bias) says: "No, that path is too wide and unstable. Take the tighter, more direct route."

Because the PBE teacher overestimates how comfortable the "spacious" paths are, it leads the simulation down a completely different road than what would actually happen in the real world.

The Big Lesson

The main takeaway is a warning for anyone using these high-tech GPS systems (MLIPs):

Just because a machine learning model is incredibly accurate at copying its training data doesn't mean it's telling the truth. If the "teacher" (the DFT functional) has a built-in bias (like loving loose clothes), the student (the MLIP) will learn that bias perfectly and confidently predict the wrong answer.

To get a reliable map of the material world, you can't just trust the machine learning model; you have to make sure the teacher it learned from is using the right set of rules.

Technical Summary: A Wrong Ground-State Structure of HfO₂ Predicted by Machine-Learning Interatomic Potentials Based on the PBE Functional

Problem Statement
Machine-learning interatomic potentials (MLIPs) have become essential tools for large-scale materials simulations, offering near-first-principles accuracy at a fraction of the computational cost of density functional theory (DFT). However, the predictive reliability of MLIPs is intrinsically tied to the quality of their training datasets, which are predominantly generated using the Perdew–Burke–Ernzerhof (PBE) generalized gradient approximation (GGA) functional. While PBE is widely adopted for its efficiency and stability, its ability to accurately describe the subtle energy differences between competing polymorphs in complex systems like hafnium oxide (HfO₂) remains a critical question. HfO₂ exhibits strong polymorphic competition (including monoclinic, orthorhombic, and tetragonal phases) and high sensitivity to external fields and strain. The central problem addressed in this work is whether PBE-based MLIPs can reliably capture the correct ground-state structure and energy landscapes of HfO₂, or if inherent errors in the PBE functional are being propagated and amplified by the machine learning models.

Methodology
The authors employed a multi-faceted approach to investigate the reliability of PBE-based models for HfO₂:

MLIP Training and Global Search: A specific MLIP based on the Allegro architecture was trained on a dataset generated via ab initio molecular dynamics (AIMD) using the PBE functional (VASP). This potential was used to conduct global structural searches using the CALYPSO software to identify the lowest-energy structures.
Benchmarking Against Foundation Models: To determine if the findings were specific to their custom model, the authors performed structural searches using several widely used, pre-trained PBE-based foundation models, including NequIP-OAM-L, MatterSim-v1-5M, and MACE-MP-0.
Functional Comparison: The authors calculated the relative energies of various HfO₂ crystal structures (including the monoclinic $P2_1/c$, tetragonal $P4_2/nmc$, orthorhombic $Pca2_1$, and the newly identified $I4_1/amd$ phases) using a range of exchange-correlation functionals: PBE, PBE-vdW, SCAN, PBEsol, and the local density approximation (LDA).
Polarization Switching Analysis: To assess the impact of functional bias on dynamic processes, the authors calculated polarization switching pathways for ferroelectric orthorhombic HfO₂ ($Pca2_1$) using both PBE and PBEsol. These calculations were performed under both fixed-lattice and relaxed-lattice conditions using the Nudged Elastic Band (NEB) and Generalized Solid-State NEB (GSSNEB) methods.
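The functional comparison in the list above comes down to simple bookkeeping: divide each total energy by the number of HfO₂ formula units in the cell, subtract a reference phase, and convert to meV/f.u. The following is a minimal sketch of that arithmetic, not the authors' workflow. The function names, total energies, and cell sizes are hypothetical placeholders; only the roughly 17 meV/f.u. gap between $I4_1/amd$ and $P2_1/c$ is chosen to echo the value reported for PBE.

```python
from typing import Dict, Tuple


def relative_energies_mev_per_fu(
    totals_ev: Dict[str, Tuple[float, int]], reference: str
) -> Dict[str, float]:
    """Return phase -> energy relative to `reference`, in meV per formula unit.

    `totals_ev` maps a phase label to (total energy in eV, number of HfO2
    formula units in that cell).
    """
    per_fu = {phase: e_tot / n_fu for phase, (e_tot, n_fu) in totals_ev.items()}
    e_ref = per_fu[reference]
    return {phase: 1000.0 * (e - e_ref) for phase, e in per_fu.items()}


def ground_state(relative_mev: Dict[str, float]) -> str:
    """Label of the phase with the lowest relative energy."""
    return min(relative_mev, key=relative_mev.get)


# PLACEHOLDER inputs (assumed values, not data from the paper).
# The I4_1/amd entry is set so that it lies 17 meV/f.u. below P2_1/c.
pbe_like_totals = {
    "P2_1/c": (-120.000, 4),
    "I4_1/amd": (-60.034, 2),
    "Pca2_1": (-119.680, 4),
}

relative = relative_energies_mev_per_fu(pbe_like_totals, reference="P2_1/c")
predicted = ground_state(relative)
print(relative)
print("Predicted ground state:", predicted)
```

Normalizing per formula unit matters because the cells being compared generally contain different numbers of atoms, so raw total energies are not directly comparable.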

Key Results

Identification of an Incorrect Ground State: Both the custom-trained PBE-based MLIP and multiple public foundation models (NequIP-OAM-L, MatterSim-v1-5M) incorrectly predicted a low-energy $I4_1/amd$ structure as the global minimum for HfO₂. This structure, which resembles rutile TiO₂ and features sixfold Hf–O octahedral units, was found to be approximately 17 meV/f.u. lower in energy than the experimentally verified monoclinic $P2_1/c$ ground state.
Origin of the Error: Comparative DFT calculations confirmed that the MLIPs faithfully reproduced the PBE-DFT results, proving the error was not a machine-learning fitting artifact but an intrinsic flaw in the PBE functional itself. The $I4_1/amd$ phase emerged as the lowest-energy structure only under the PBE functional. When other functionals (PBE-vdW, SCAN, PBEsol, LDA) were used, the $I4_1/amd$ phase became significantly less stable, eventually becoming the highest-energy phase under LDA.
Structural Sensitivity: The error was traced to PBE's tendency to overstabilize low-density structures with large equilibrium volumes and specific coordination environments (sixfold Hf, threefold O). Functionals that favor more compact structures, such as PBEsol and LDA, penalize these low-density configurations.
Impact on Ferroelectric Switching: The functional bias significantly altered the energy landscapes for polarization switching when lattice relaxation was allowed. Under fixed-lattice conditions, PBE and PBEsol yielded similar barriers. However, with lattice relaxation, PBE predicted a distinct $Pbcn$-like intermediate state with a lower barrier, whereas PBEsol maintained a conventional tetragonal-like transition state. This occurred because PBE's energy landscape makes the $Pbcn$ phase a competitive low-energy basin, while PBEsol places it at a much higher energy.
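The difference between a single transition state and a path that dips into an intermediate basin can be read directly from the energies of the images along a converged NEB path. The sketch below is an illustration under stated assumptions, not the authors' analysis: the helper names are hypothetical and both energy profiles are invented placeholders, shaped only to mimic the qualitative picture described above (one hump versus two humps with a shallow minimum in between).

```python
from typing import List, Sequence


def barrier(energies: Sequence[float]) -> float:
    """Forward barrier: highest image energy minus the initial-state energy."""
    return max(energies) - energies[0]


def interior_minima(energies: Sequence[float]) -> List[int]:
    """Indices of interior images that lie below both neighbours.

    A non-empty result signals an intermediate state along the path.
    """
    return [
        i
        for i in range(1, len(energies) - 1)
        if energies[i] < energies[i - 1] and energies[i] < energies[i + 1]
    ]


# PLACEHOLDER profiles in eV relative to the initial image (assumed values).
single_hump = [0.00, 0.10, 0.25, 0.30, 0.25, 0.10, 0.00]  # one transition state
double_hump = [0.00, 0.10, 0.20, 0.12, 0.20, 0.10, 0.00]  # dip = intermediate

for name, profile in [("single_hump", single_hump), ("double_hump", double_hump)]:
    print(name, "barrier:", round(barrier(profile), 3),
          "intermediate images:", interior_minima(profile))
```

With a coarse set of images, a shallow intermediate can be missed, so a check like this is only as good as the resolution of the path.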

Key Contributions

Discovery of a Systematic MLIP Failure: The study reveals a previously unreported, spurious ground-state prediction ($I4_1/amd$) common across multiple state-of-the-art PBE-based MLIPs for HfO₂.
Attribution to Exchange-Correlation Functionals: The work definitively demonstrates that errors in MLIP predictions of crystal structures can originate directly from the exchange-correlation functional used to generate training data, rather than from the machine learning architecture or fitting process.
Functional-Dependent Energy Landscapes: The research highlights that the choice of functional fundamentally changes the topology of the potential energy surface, particularly for processes involving large lattice relaxations, such as phase transitions and ferroelectric switching.
Validation of Alternative Functionals: The study shows that the error can be largely suppressed by using alternative functionals like PBEsol and LDA, which correctly predict the monoclinic $P2_1/c$ structure as the ground state.

Significance and Claims
The authors frame this study as a critical warning to the materials modeling community. They assert that while MLIPs are powerful, their reliability cannot be judged solely by their ability to reproduce reference DFT data. If the underlying DFT functional contains systematic biases (such as PBE's overestimation of volumes for certain coordination environments), the MLIP will faithfully reproduce these errors, leading to physically incorrect predictions of ground states and phase transition pathways. The paper emphasizes the necessity of carefully evaluating the physical validity of the exchange-correlation functional used for training, especially when simulating systems with complex polymorphic competition and lattice relaxation. The findings suggest that for HfO₂ and similar systems, reliance on PBE-trained foundation models without functional validation may lead to misleading conclusions regarding structural stability and switching mechanisms.
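The distinction drawn here, between reproducing the reference data and being physically right, can be expressed as two separate checks. The sketch below is a hypothetical illustration: the function names and all energies are placeholders (in meV/f.u. relative to $P2_1/c$), with only the 17 meV/f.u. offset of $I4_1/amd$ taken from the summary above.

```python
from typing import Dict


def mean_abs_error(pred: Dict[str, float], ref: Dict[str, float]) -> float:
    """Mean absolute difference over the phases present in `ref`."""
    return sum(abs(pred[k] - ref[k]) for k in ref) / len(ref)


def lowest(energies: Dict[str, float]) -> str:
    """Label of the lowest-energy phase."""
    return min(energies, key=energies.get)


# PLACEHOLDER relative energies in meV/f.u. (assumed values).
dft_reference = {"P2_1/c": 0.0, "I4_1/amd": -17.0, "Pca2_1": 80.0}
mlip_prediction = {"P2_1/c": 0.0, "I4_1/amd": -16.0, "Pca2_1": 81.0}
experimental_ground_state = "P2_1/c"

# Check 1 (fidelity): does the model reproduce its training reference?
fit_error = mean_abs_error(mlip_prediction, dft_reference)
same_ranking = lowest(mlip_prediction) == lowest(dft_reference)

# Check 2 (validity): does the reference itself agree with experiment?
reference_is_valid = lowest(dft_reference) == experimental_ground_state

print(f"fit error: {fit_error:.2f} meV/f.u., same ranking: {same_ranking}")
print("reference agrees with experiment:", reference_is_valid)
```

In this toy setup the first check passes and the second fails, which is the situation the authors warn about: a small fit error says nothing about whether the reference functional ranks the phases correctly.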
