On The Finetuning of MLIPs Through the Lens of Iterated Maps With BPTT

This paper proposes a robust, end-to-end differentiable fine-tuning method for pretrained machine-learning interatomic potentials that optimizes predicted structures by unrolling relaxation trajectories and backpropagating gradients, resulting in a consistent ~32% reduction in prediction error across various models and hyperparameter settings.

Original authors: Evan Dramko, Yizhi Zhu, Aleksandar Krivokapic, Geoffroy Hautier, Thomas Reps, Christopher Jermaine, Anastasios Kyrillidis

Published 2026-02-03

CC BY 4.0

Original paper licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). This is an AI-generated explanation of the paper below. It is not written or endorsed by the authors. For technical accuracy, refer to the original paper.

The Big Picture: Fixing the "Map" vs. Fixing the "Hiker"

Imagine you are trying to find the lowest point in a vast, foggy mountain valley (this represents the most stable, lowest-energy arrangement of a material's atoms).

The Problem: To find the bottom, you usually need a very expensive, high-tech drone (called DFT or "first-principles calculations") to scan the terrain and tell you exactly which way is down. But flying this drone is so slow and costly that you can't use it for every single step of your journey.
The Current Solution: Scientists built a "smart hiker" (called an MLIP or Machine Learning Interatomic Potential). This hiker has studied thousands of drone scans and learned to guess which way is down. Usually, the hiker is pretty good at guessing the direction of the slope at any single moment.
The Catch: Even if the hiker guesses the direction correctly 99% of the time, those tiny errors add up over a long hike. By the time the hiker thinks they've reached the bottom, they might actually be stuck in a small dip on a hillside, far from the true valley floor.

The Paper's Idea: Learning from the Destination

The authors of this paper asked a new question: Instead of just teaching the hiker to guess the slope perfectly at every single step, what if we taught them to focus on actually reaching the bottom?

They developed a fine-tuning method built on BPTT (Backpropagation Through Time), an established technique for training through a whole sequence of steps. Here is how it works, by way of an analogy:

The Analogy: The "Rehearsal" vs. The "Final Performance"

Old Way (Traditional Training): Imagine a dance instructor teaching a student. The instructor watches every single step the student takes. If the student's foot is 1 inch off the beat, the instructor yells, "Fix that step!" The student learns to be perfect at every individual move, but they might still stumble at the end of the routine because the small mistakes piled up.
New Way (This Paper's Method): The instructor lets the student run through the entire dance routine from start to finish without stopping. The instructor only looks at the final pose.
- If the student ends up in the wrong spot, the instructor says, "The whole routine was off."
- The instructor then rewinds the tape (mathematically) and adjusts the student's muscle memory for the entire dance, not just the specific steps that were wrong.
- The goal isn't to make every step perfect; the goal is to make sure the final result is perfect.

What They Found

When they applied this "final performance" style of training to their AI models:

Better Results: The models became much better at finding the true "bottom of the valley" (the correct atomic structure). On average, they reduced errors by about 32%.
The Paradox: Here is the strange part. When they checked the models' ability to guess the slope at any single moment, the models often got worse. They were less accurate at predicting the immediate forces.
- Why? The model learned to "cheat" slightly. It stopped trying to be a perfect map of the terrain at every single point. Instead, it learned a "shortcut" or a bias that steered the hiker toward the right destination, even if the path looked a little weird along the way.
Robustness: It didn't matter if they changed the rules of the hike (like how big of a step the hiker took). The method worked consistently well across different types of materials and different AI architectures.

The Key Takeaway

The paper argues that for designing new materials, being perfect at every step is less important than getting the final destination right.

By treating the entire relaxation process as one giant, connected loop and training the AI based on the final outcome, they created a system that is much more reliable at predicting stable structures, even though it is technically "less accurate" at predicting the physics of a single instant.

In short: They stopped teaching the AI to be a perfect navigator of the terrain and started teaching it to be a master of the destination.

Technical Summary: Fine-Tuning MLIPs Through the Lens of Iterated Maps With BPTT

Problem Statement
Accurate structural relaxation, the process of finding atomic configurations that correspond to local minima on the potential energy surface (PES), is a bottleneck in computational materials science. Traditional methods rely on Density Functional Theory (DFT) to compute interatomic forces, a calculation that is computationally expensive and scales steeply with system size. Machine Learning Interatomic Potentials (MLIPs) have emerged as efficient surrogates that approximate DFT forces and are typically used within iterative optimization loops to emulate relaxation. However, a fundamental challenge in MLIP development is data scarcity: generating new training examples requires costly first-principles calculations, so simply scaling datasets is often impractical. Furthermore, conventional MLIP training optimizes the force accuracy of each step independently and ignores how errors accumulate along the relaxation trajectory, which often leads to significant deviations in the final predicted structures.

Methodology
The authors propose a fine-tuning framework that treats structural relaxation as a fully differentiable, end-to-end simulation loop. Instead of training MLIPs solely on static structure-force pairs, the method unrolls full relaxation trajectories and applies Backpropagation Through Time (BPTT).
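The unroll-and-backpropagate idea can be sketched on a one-dimensional toy problem. In the sketch below, the linear force model `-k * (x - c)`, its parameters `k` and `c`, the helper names, and all numerical values are illustrative assumptions, not anything taken from the paper. A real MLIP is a neural network whose gradients would come from an autodiff framework; here the backward pass is written out by hand to show what BPTT accumulates.

```python
# Toy sketch (assumed setup): fine-tune a two-parameter "force model" by
# backpropagating an endpoint loss through an unrolled relaxation.

def relax(x0, k, c, eta, n_steps):
    """Unroll gradient-descent relaxation and keep every visited position."""
    xs = [x0]
    for _ in range(n_steps):
        force = -k * (xs[-1] - c)          # hypothetical stand-in for an MLIP
        xs.append(xs[-1] + eta * force)    # structural update step
    return xs


def bptt_grads(xs, x_star, k, c, eta):
    """Gradients of (x_T - x_star)^2 w.r.t. k and c via the chain rule in time."""
    g_x = 2.0 * (xs[-1] - x_star)          # dL/dx_T
    g_k = 0.0
    g_c = 0.0
    for x_t in reversed(xs[:-1]):          # walk the trajectory backwards
        # One frame: x_{t+1} = x_t - eta * k * (x_t - c)
        g_k += g_x * (-eta * (x_t - c))    # direct effect of k in this frame
        g_c += g_x * (eta * k)             # direct effect of c in this frame
        g_x *= (1.0 - eta * k)             # dL/dx_t from dL/dx_{t+1}
    return g_k, g_c


def fine_tune(starts, x_star, k, c, eta=0.2, n_steps=10, lr=0.05, epochs=200):
    """Plain gradient descent on the mean endpoint loss over all start points."""
    for _ in range(epochs):
        total_k = 0.0
        total_c = 0.0
        for x0 in starts:
            xs = relax(x0, k, c, eta, n_steps)
            g_k, g_c = bptt_grads(xs, x_star, k, c, eta)
            total_k += g_k
            total_c += g_c
        k -= lr * total_k / len(starts)
        c -= lr * total_c / len(starts)
    return k, c


def mean_endpoint_error(starts, x_star, k, c, eta=0.2, n_steps=10):
    """Average distance between the relaxed endpoint and the reference."""
    errors = [abs(relax(x0, k, c, eta, n_steps)[-1] - x_star) for x0 in starts]
    return sum(errors) / len(errors)


starts = [0.0, 0.5, 2.0, 3.0]   # assumed initial "structures"
x_star = 1.0                    # assumed reference relaxed position
k0, c0 = 1.0, 0.9               # assumed "pretrained" parameters, slightly off

before = mean_endpoint_error(starts, x_star, k0, c0)
k1, c1 = fine_tune(starts, x_star, k0, c0)
after = mean_endpoint_error(starts, x_star, k1, c1)
print("endpoint error before:", before, "after:", after)
```

Note that the loss only ever looks at the last position of each trajectory; no intermediate force is compared against a reference force.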

Key components of the methodology include:

Trajectory-Level Training: The relaxation process is modeled as a sequence of "frames," where each frame consists of a force prediction by the MLIP followed by a structural update step. The entire trajectory is unrolled, and gradients are tracked through the sequence to update model parameters based on the quality of the final relaxed structure, rather than intermediate force errors.
Loss Function: The optimization objective is the "Delta Q" ($D_q$), a mass-weighted displacement metric between the predicted final structure and the ground-truth relaxed structure. This metric is preferred over Mean Squared Error (MSE) in defect cases to avoid overemphasizing bulk lattice errors.
Iterated Maps and Proxy Functions: The authors interpret the relaxation step as an iterated map. The BPTT procedure fine-tunes the MLIP to act as a proxy function that approximates the contraction dynamics of the PES: it learns to preserve the locations of fixed points (stable structures) and their basins of attraction, even if local force accuracy is slightly compromised.
Step Size Control: The study investigates whether the step size ($\eta$) of the gradient-descent update should be fixed, learned as a scalar, or predicted by a neural network. Experiments indicate that a fixed or learned scalar step size is sufficient, and that the primary performance gains come from modifying the MLIP weights themselves to align with the descent procedure.
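The components listed above can be written compactly. The notation here is an assumption made for illustration ($x_t$ for the atomic positions at frame $t$, $F_\theta$ for the MLIP force prediction with weights $\theta$, $T$ for the number of unrolled frames, $x^\star$ for the reference relaxed structure, $m_i$ for atomic masses), and the explicit form of $D_q$ is one common convention for a mass-weighted displacement that may differ in detail from the paper's definition.

$$
x_{t+1} = x_t + \eta\, F_\theta(x_t), \qquad t = 0, \dots, T-1
$$

$$
\mathcal{L}(\theta) = D_q(x_T, x^\star), \qquad D_q(x, x^\star) = \sqrt{\sum_i m_i \,\lVert x_i - x_i^\star \rVert^2}
$$

$$
\frac{\partial \mathcal{L}}{\partial x_t} = \frac{\partial \mathcal{L}}{\partial x_{t+1}} \left( I + \eta\, \frac{\partial F_\theta(x_t)}{\partial x_t} \right), \qquad
\frac{\partial \mathcal{L}}{\partial \theta} = \sum_{t=0}^{T-1} \frac{\partial \mathcal{L}}{\partial x_{t+1}}\, \eta\, \frac{\partial F_\theta(x_t)}{\partial \theta}
$$

In this picture a stable structure is a fixed point of the map, $F_\theta(x^\star) = 0$, and the backward recursion in the last line is what BPTT evaluates frame by frame, starting from $\partial \mathcal{L} / \partial x_T$.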

Key Contributions

BPTT-Based Fine-Tuning Framework: Introduction of a full-trajectory fine-tuning method for pretrained MLIPs that optimizes the outcome of the relaxation process directly.
Ablation and Analysis: Comprehensive analysis of PES-level optimization components, demonstrating that the method is robust to variations in hyperparameters and procedural modifications (e.g., step size initialization, trajectory length).
Theoretical Connection: Linking BPTT-based training to the theory of iterated maps and proxy functions, suggesting that the method learns a simplified contraction of the true DFT-driven dynamics tailored to specific structural manifolds.
Generalizability Validation: Validation across multiple structural domains (silicon defects, pure crystals, catalysts) and architectures (ADAPT, ResMLP), showing consistent performance improvements.

Results
The proposed method consistently improves the accuracy of relaxed structures across all evaluated pretrained models:

Performance Gains: The approach yields an average reduction of approximately 32% in prediction error ($D_q$) across datasets. In specific cases, such as silicon defects, the error reduction reaches roughly 50% compared to untuned baselines.
Paradoxical Accuracy: A notable finding is that BPTT fine-tuning often degrades the raw force prediction accuracy (L2 force errors increase) while simultaneously improving the final structural accuracy. This suggests the model learns a structural bias that prioritizes the correct endpoint over local force fidelity.
Robustness: Results vary only negligibly across hyperparameter settings, and the method is robust to non-optimal step size initializations.
Architecture Independence: Improvements were observed in both the ADAPT (Transformer-based, graph-free) and ResMLP architectures, indicating the strategy is not limited to a specific model type.
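The "Paradoxical Accuracy" result above can look contradictory, so a small toy case may help show that worse forces and a better endpoint are compatible. Everything in the sketch below is an assumption chosen for illustration: the harmonic "true" force, the two hand-picked parameter sets, and the sample points. It is not the paper's analysis and uses no real MLIP. One model has nearly exact forces but a slightly misplaced fixed point; the other has the wrong stiffness everywhere but the correct fixed point.

```python
import math

# Toy illustration (assumed setup): force accuracy and endpoint accuracy
# are different quantities and can rank two models in opposite order.

X_STAR = 1.0  # assumed location of the true minimum


def true_force(x):
    """Assumed reference force: a harmonic well centred on X_STAR."""
    return -(x - X_STAR)


def model_force(x, k, c):
    """Hypothetical two-parameter force model with stiffness k and centre c."""
    return -k * (x - c)


def relax(x0, k, c, eta=0.2, n_steps=50):
    """Run a fixed number of gradient-descent steps with the model force."""
    x = x0
    for _ in range(n_steps):
        x += eta * model_force(x, k, c)
    return x


def force_rmse(samples, k, c):
    """Root-mean-square force error of the model on sampled positions."""
    sq = [(model_force(x, k, c) - true_force(x)) ** 2 for x in samples]
    return math.sqrt(sum(sq) / len(sq))


def endpoint_mae(starts, k, c):
    """Mean absolute distance between relaxed endpoints and X_STAR."""
    errs = [abs(relax(x0, k, c) - X_STAR) for x0 in starts]
    return sum(errs) / len(errs)


samples = [0.0, 0.5, 2.0, 3.0]
models = {
    "accurate forces, shifted fixed point": (1.0, 0.9),
    "wrong stiffness, correct fixed point": (2.0, 1.0),
}
for name, (k, c) in models.items():
    print(name, "| force RMSE:", force_rmse(samples, k, c),
          "| endpoint MAE:", endpoint_mae(samples, k, c))
```

In this toy the second model has the larger force error yet relaxes much closer to the reference, because the endpoint depends on where the model's force vanishes rather than on how well forces match along the way.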

Significance and Claims
The paper claims that this approach offers a pragmatic solution to the data scarcity bottleneck in MLIP development. By extracting more value from existing data through trajectory-level supervision, it allows for the creation of highly effective, domain-specific MLIPs without requiring additional expensive first-principles data.

The authors position BPTT not as a method to "solve the physics" or recover universal physical dynamics, but as a final stage in a staged training pipeline. It refines a broadly applicable, pretrained MLIP to perform reliably on specific structural classes by learning a contraction map that steers trajectories toward correct metastable states. This is particularly valuable for high-throughput workflows where improved relaxation fidelity reduces the need for expensive DFT evaluations. The work draws a parallel to Reinforcement Learning from Human Feedback (RLHF), where sequence-level objectives improve downstream behavior without necessarily minimizing token-level training loss.