Acceleration of Atomistic NEGF: Algorithms, Parallelization, and Machine Learning

This paper summarizes key algorithmic advancements in parallelization and machine learning that have enabled the scaling of accurate, ab-initio Density Functional Theory combined with Non-equilibrium Green's function (DFT+NEGF) simulations from small atomic systems to realistic, large-scale nanoscale devices.

Original authors: Mathieu Luisier, Nicolas Vetsch, Alexander Maeder, Vincent Maillou, Anders Winka, Leonard Deuschle, Chen Hao Xia, Manasa Kaniselvan, Marko Mladenovic, Jiang Cao, Alexandros Nikolaos Ziogas

Published 2026-02-04

Original paper licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). This is an AI-generated explanation of the paper below. It is not written or endorsed by the authors. For technical accuracy, refer to the original paper.

Imagine you are trying to understand how electricity flows through a tiny, microscopic wire made of silicon, so small that it contains only thousands of atoms. To do this accurately, scientists use a complex mathematical tool called NEGF (Non-equilibrium Green's function). Think of NEGF as a super-precise weather forecast for electrons: it predicts how they move, bounce off each other, and interact with vibrations in the material.

However, running these "forecasts" for real-world-sized devices has been like trying to predict the weather for the entire planet using a calculator from the 1980s. The calculation is simply too expensive to finish.

This paper from a team at ETH Zurich describes how they built a "super-calculator" to fix this, using three main tricks: better algorithms, massive teamwork (parallelization), and artificial intelligence.

Here is a breakdown of their work using simple analogies:

1. The Problem: The "Traffic Jam" of Math

In the past, scientists could only simulate tiny systems (a few atoms). To simulate a realistic device (thousands of atoms), the math becomes incredibly heavy.

The Challenge: The equations require solving massive puzzles where every piece depends on every other piece. Doing this one by one takes forever.
The Goal: They wanted to simulate a silicon "nano-ribbon" (a tiny wire) that is actually large enough to be useful, while accounting for electrons bumping into each other (scattering), which is like cars in traffic slowing each other down.

2. The Solution: The "Assembly Line" (Parallelization)

To speed things up, the team didn't just build a faster computer; they changed how the work is done.

The Analogy: Imagine a massive library where you need to find specific books. Instead of having one librarian walk the aisles one by one, they hired 9,400 librarians (computers) to work at the same time.
The Trick: They used a method called Serinv. Think of the giant math problem as a long chain of blocks, where each block is linked only to its neighbors. Instead of working through the chain one block at a time from end to end, they chop it into smaller chunks and give each chunk to a different computer.
The Result: They tested this on the Frontier supercomputer (one of the most powerful in the world), simulating a silicon wire with 25,344 atoms. Scaling from 1 to 9,400 computer nodes, they kept 80% parallel efficiency, which means that most of the added computing power went into useful work rather than sitting idle.

3. The "Time Travel" Trick (Algorithms)

The math involves two different types of calculations that need data organized differently.

The Analogy: Imagine you are cooking a stew. Sometimes you need to chop all the vegetables first (one way of organizing data), and other times you need to stir the pot for a long time (a different way).
The Fix: The team created a system that can instantly "transpose" or rearrange the data. It's like having a magical kitchen where the vegetables instantly rearrange themselves from a chopping board to a pot depending on what the chef needs next. This allows them to switch between solving linear equations and doing complex energy convolutions without wasting time.

4. The "Crystal Ball" (Machine Learning)

Even with super-fast computers, there is one bottleneck: creating the initial map of the atoms (the Hamiltonian matrix) using a method called DFT (Density Functional Theory).

The Problem: DFT is like drawing a map of a city by measuring every single brick in every building. It is incredibly accurate but takes a huge amount of time and energy, especially for large cities (thousands of atoms).
The Innovation: The team trained an AI (specifically a Graph Neural Network) to act as a "crystal ball."
- They trained the AI on DFT data from a single configuration of a specific type of memory cell (called a Valence Change Memory or VCM).
- The AI learned the patterns. Now, instead of measuring every brick (running DFT), the AI can instantly predict the map for new configurations of the memory cell.
The Catch: The AI is very fast (scaling linearly with size) and accurate enough to get the general shape right, but it still has a tiny error (about 2 meV). It's like the AI can draw a perfect map of a city's layout, but the street signs might be slightly off. It's not perfect enough yet to replace the human surveyor entirely, but it's a huge step forward.

5. The Results: What Did They Find?

The Silicon Wire: They successfully simulated a silicon wire with electron-electron interactions. They found that when electrons interact, the "gap" in energy (band gap) gets slightly bigger, just as physics predicts.
Current Conservation: They checked that their simulation is consistent: the electrical current entering one side of the wire matched the current leaving the other side, even with all the complex interactions.
The AI Test: They used their AI to predict the Hamiltonian of a memory cell. The predicted entries were close to the DFT values (about 2 meV average error), but that is not yet accurate enough to fully reproduce how electricity is transmitted through the device. Machine learning is therefore a promising way to speed up these simulations, not yet a proven replacement.

Summary

This paper is about scaling up. The team took a method that was previously limited to tiny, toy-sized models and scaled it up to realistic, industrial-sized devices. They did this by:

- Dividing the work among thousands of computers (Parallelization).
- Reorganizing the data so the computers don't get stuck (Algorithms).
- Teaching an AI to guess the hardest parts of the math, saving time (Machine Learning).

They haven't solved every problem yet (the AI isn't perfect, and some simulations are still too heavy), but they have built the engine that allows scientists to finally simulate realistic quantum devices with high accuracy.

Technical Summary: Acceleration of Atomistic NEGF

Problem Statement
The continuous miniaturization of transistors has brought device dimensions into the regime where ab-initio quantum transport (QT) simulations are necessary. While the Non-equilibrium Green's function (NEGF) formalism combined with Density Functional Theory (DFT) offers a rigorous framework for simulating nanoscale devices, current implementations face significant bottlenecks. Historically, these simulations have been restricted to small systems (a few atoms) or to the ballistic transport limit. Including complex scattering mechanisms (electron-phonon, electron-electron, electron-photon) and scaling to realistic device sizes (thousands of atoms) remains computationally prohibitive due to the $O(N^3)$ scaling of DFT and the sequential nature of traditional recursive Green's function (RGF) algorithms. Furthermore, generating Hamiltonian matrices for large, dynamically evolving structures (e.g., during voltage sweeps) remains a major challenge.

Methodology
The authors address these limitations through a multi-faceted approach involving algorithmic optimization, high-performance parallelization, and the integration of machine learning:

Algorithmic Framework: The core simulation relies on solving NEGF equations for retarded ( $G^R$ ) and lesser/greater ( $G^{<,>}$ ) Green's functions in the presence of scattering self-energies ( $\Sigma^{R, <, >}$ ). The methodology distinguishes between two computational tasks:
- Linear Systems of Equations (LSE): Solving for Green's functions at specific energy points, which are independent and suitable for parallelization.
- Energy Convolutions (EC): Computing scattering terms which require integrating over many energy points.
  To optimize memory and communication, the authors implement a data transposition scheme. For LSE tasks, the data is organized by energy point ($E$, $\omega$), so that all spatial indices ($i, j$) belonging to one energy are available together. For EC tasks, which use Fast Fourier Transforms, the data is transposed so that all energy points belonging to specific spatial indices are available together.
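The two layouts can be illustrated with a small NumPy sketch. It is not QuaTrEx code: the array sizes, the real-valued random data, and the helper `energy_convolution` are assumptions chosen for illustration, and the element-wise product of two quantities per $(i, j)$ pair only mimics the structure of a GW-type energy convolution (prefactors and the lesser/greater bookkeeping are omitted).

```python
import numpy as np

rng = np.random.default_rng(0)
n_energy, n_orb = 64, 3  # illustrative sizes, not taken from the paper

# LSE layout: one (n_orb x n_orb) matrix per energy point, so each energy
# can be handed to an independent linear solve.
g_lse = rng.standard_normal((n_energy, n_orb, n_orb))
w_lse = rng.standard_normal((n_energy, n_orb, n_orb))

# EC layout: energy becomes the last, contiguous axis, so a single FFT sees
# every energy point that belongs to one (i, j) matrix element.
g_ec = np.ascontiguousarray(np.transpose(g_lse, (1, 2, 0)))
w_ec = np.ascontiguousarray(np.transpose(w_lse, (1, 2, 0)))


def energy_convolution(a, b):
    """Linear convolution along the last (energy) axis via zero-padded FFTs."""
    n_out = a.shape[-1] + b.shape[-1] - 1
    fa = np.fft.rfft(a, n=n_out, axis=-1)
    fb = np.fft.rfft(b, n=n_out, axis=-1)
    return np.fft.irfft(fa * fb, n=n_out, axis=-1)


sigma_ec = energy_convolution(g_ec, w_ec)

# Cross-check one matrix element against a direct O(n_energy^2) convolution.
direct = np.convolve(g_ec[0, 1], w_ec[0, 1])
max_diff = float(np.max(np.abs(sigma_ec[0, 1] - direct)))
```

In a distributed run the transposition would be a communication step between computing units rather than a local `np.transpose`; the sketch only shows why each task prefers a different axis to be contiguous.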
Parallelization (Serinv): To overcome the sequential bottleneck of the standard RGF algorithm, the authors utilize the Serinv library. This GPU-based open-source package employs the Schur complement method on block tri-diagonal (BT) matrices. The system is partitioned across multiple computing units (CPUs/GPUs), allowing for the parallel solution of reduced systems of equations before reconstructing local Green's function entries.
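The Schur complement idea can be sketched on a toy block tri-diagonal matrix. This is a minimal dense NumPy illustration under stated assumptions, not the Serinv API: the helper `block_tridiagonal`, the single middle separator block, and the matrix sizes are all hypothetical. The left and right partitions only couple through the separator, so their elimination steps are independent and could run on different computing units.

```python
import numpy as np


def block_tridiagonal(n_blocks, b, rng):
    """Dense copy of a strictly diagonally dominant block tri-diagonal matrix."""
    n = n_blocks * b
    a = np.zeros((n, n))
    for k in range(n_blocks):
        s = slice(k * b, (k + 1) * b)
        a[s, s] = rng.uniform(-1.0, 1.0, (b, b)) + 4.0 * b * np.eye(b)
        if k + 1 < n_blocks:
            t = slice((k + 1) * b, (k + 2) * b)
            a[s, t] = rng.uniform(-1.0, 1.0, (b, b))
            a[t, s] = rng.uniform(-1.0, 1.0, (b, b))
    return a


rng = np.random.default_rng(1)
b, n_blocks = 2, 7  # illustrative sizes
a = block_tridiagonal(n_blocks, b, rng)

mid = n_blocks // 2
left = slice(0, mid * b)                     # partition owned by "unit 0"
sep = slice(mid * b, (mid + 1) * b)          # separator block
right = slice((mid + 1) * b, n_blocks * b)   # partition owned by "unit 1"

# Independent per-partition work: eliminate each partition's interior.
inv_ll = np.linalg.inv(a[left, left])
inv_rr = np.linalg.inv(a[right, right])
x_left = inv_ll @ a[left, sep]
x_right = inv_rr @ a[right, sep]

# Reduced system: Schur complement on the separator block.
s = a[sep, sep] - a[sep, left] @ x_left - a[sep, right] @ x_right
g_sep = np.linalg.inv(s)

# Reconstruct the left partition's local entries of the inverse.
g_left = inv_ll + x_left @ g_sep @ a[sep, left] @ inv_ll

# Reference: invert the whole matrix at once.
g_full = np.linalg.inv(a)
```

A production solver would keep only the block-diagonal and neighboring entries instead of dense partition inverses and would use many partitions; the sketch only checks that the reduced system reproduces the same entries as a full inversion.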
Machine Learning Integration: To bypass the expensive $O(N^3)$ DFT calculation for Hamiltonian generation, the authors explore Equivariant Graph Neural Networks (EGNNs). The proposed workflow involves training an EGNN on DFT data from a single device configuration. Once trained, the network predicts Hamiltonian matrix entries for new configurations (e.g., different oxygen vacancy distributions in memory cells) with $O(N)$ scaling, enabling the simulation of thousands of atoms.
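To make the scaling argument concrete, the short calculation below compares how the cost grows under the two stated complexities when the atom count increases tenfold. The atom counts are hypothetical, and prefactors (which differ between DFT and a neural network) are deliberately ignored, so only the growth factors are meaningful.

```python
def cost_growth(n_small, n_large, exponent):
    """Factor by which an O(N**exponent) cost grows from n_small to n_large atoms."""
    return (n_large / n_small) ** exponent


n_small, n_large = 1_000, 10_000  # hypothetical atom counts

dft_growth = cost_growth(n_small, n_large, 3)  # cubic scaling, as stated for DFT
ml_growth = cost_growth(n_small, n_large, 1)   # linear scaling, as stated for the EGNN

print(f"O(N^3) cost grows by a factor of {dft_growth:.0f}")
print(f"O(N) cost grows by a factor of {ml_growth:.0f}")
```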

Key Contributions

QuaTrEx Package: The implementation of these models and algorithms into a novel, open-source package named QuaTrEx.
Scalable Parallel Solver: A demonstration of weak scaling on the Frontier supercomputer, achieving 80% parallel efficiency when scaling from 1 to 9,400 nodes (75,200 GPUs) for a silicon nano-ribbon simulation.
Inclusion of Scattering: Successful simulation of a silicon nano-ribbon (25,344 atoms) including self-consistent electron-electron interactions (GW approximation), moving beyond the ballistic limit.
ML-Driven Hamiltonian Prediction: Development of an EGNN capable of predicting Hamiltonian matrices for devices with thousands of atoms, trained on a single configuration to handle varying physical states.
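The weak-scaling figure can be read as follows: the work per node is held fixed while the node count grows, and the efficiency is the ratio of the reference runtime to the runtime at scale. The sketch below uses the node and GPU counts quoted above; the two runtimes are invented purely to show what an 80% value means and are not measurements from the paper.

```python
def weak_scaling_efficiency(t_reference, t_scaled):
    """Runtime at the reference size divided by runtime at scale (fixed work per node)."""
    return t_reference / t_scaled


nodes = 9_400
gpus = 75_200
gpus_per_node = gpus // nodes  # follows from the two figures quoted above

t_one_node = 100.0    # hypothetical runtime on 1 node, in seconds
t_all_nodes = 125.0   # hypothetical runtime on 9,400 nodes, in seconds

efficiency = weak_scaling_efficiency(t_one_node, t_all_nodes)
print(f"{gpus_per_node} GPUs per node, weak-scaling efficiency {efficiency:.0%}")
```

With perfect weak scaling the runtime would stay constant as nodes are added; an efficiency of 80% corresponds to the runtime growing by a factor of 1.25.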

Results

Silicon Nano-Ribbon Simulation: The authors simulated a 52.1 nm long silicon nano-ribbon with 25,344 atoms. The results confirmed that including electron-electron interactions (via self-consistent GW) slightly increases the band gap, consistent with theoretical expectations. The study also validated current conservation across the device despite variations in spectral current distribution along the transport direction.
Performance: The parallel implementation successfully handled the full scale of the Frontier supercomputer, demonstrating the feasibility of simulating large-scale devices with carrier-carrier scattering.
Machine Learning Accuracy: For the Valence Change Memory (VCM) cell application, the EGNN achieved an average error of approximately 2 meV in Hamiltonian entries compared to DFT. However, the authors note that while this error is comparable to state-of-the-art molecular predictions, it is currently not accurate enough to fully reproduce the transmission function behavior of the device.

Significance and Claims
The paper claims that the combination of the Serinv parallelization strategy and dedicated numerical algorithms unlocks the ability to explore device sizes and functionalities closer to experimental reality, including relevant physical effects like carrier-carrier scattering that were previously too computationally expensive.

Regarding machine learning, the authors present it as a promising avenue to partially eliminate the need for repeated DFT calculations, particularly for devices with time-evolving atomic geometries. However, they remain modest about the current state of this technology, acknowledging that while the scaling benefits are significant ( $O(N)$ vs $O(N^3)$ ), the predictive accuracy for complex transport properties (like transmission functions) is not yet sufficient for full replacement of first-principles methods. The work serves as a foundational step toward integrating ML into ab-initio QT workflows.