Reinforcement Learning Control of Quantum Error Correction

Volodymyr Sivak, Alexis Morvan, Michael Broughton, Rodrigo G. Cortiñas, Johannes Bausch, Andrew W. Senior, Matthew Neeley, Alec Eickbusch, Noah Shutty, Laleh Aghababaie Beni, James S. Spencer, Francisco J. H Heras, Thomas Edlich, Dmitry Abanin, Amira Abbas, Rajeev Acharya, Georg Aigeldinger, Ross Alcaraz, Sayra Alcaraz, Trond I. Andersen, Markus Ansmann, Frank Arute, Kunal Arya, Walt Askew, Nikita Astrakhantsev, Juan Atalaya, Brian Ballard, Joseph C. Bardin, Hector Bates, Andreas Bengtsson, Majid Bigdeli Karimi, Alexander Bilmes, Simon Bilodeau, Felix Borjans, Alexandre Bourassa, Jenna Bovaird, Dylan Bowers, Leon Brill, Peter Brooks, David A. Browne, Brett Buchea, Bob B. Buckley, Tim Burger, Brian Burkett, Nicholas Bushnell, Jamal Busnaina, Anthony Cabrera, Juan Campero, Hung-Shen Chang, Silas Chen, Ben Chiaro, Liang-Ying Chih, Agnetta Y. Cleland, Bryan Cochrane, Matt Cockrell, Josh Cogan, Roberto Collins, Paul Conner, Harold Cook, William Courtney, Alexander L. Crook, Ben Curtin, Martin Damyanov, Sayan Das, Dripto M. Debroy, Sean Demura, Paul Donohoe, Ilya Drozdov, Andrew Dunsworth, Valerie Ehimhen, Aviv Moshe Elbag, Lior Ella, Mahmoud Elzouka, David Enriquez, Catherine Erickson, Vinicius S. Ferreira, Marcos Flores, Leslie Flores Burgos, Ebrahim Forati, Jeremiah Ford, Austin G. Fowler, Brooks Foxen, Masaya Fukami, Alan Wing Lun Fung, Lenny Fuste, Suhas Ganjam, Gonzalo Garcia, Christopher Garrick, Robert Gasca, Helge Gehring, Robert Geiger, Élie Genois, William Giang, Dar Gilboa, James E. Goeders, Edward C. Gonzales, Raja Gosula, Stijn J. de Graaf, Alejandro Grajales Dau, Dietrich Graumann, Joel Grebel, Alex Greene, Jonathan A. Gross, Jose Guerrero, Loïck Le Guevel, Tan Ha, Steve Habegger, Tanner Hadick, Ali Hadjikhani, Michael C. Hamilton, Matthew P. Harrigan, Sean D. Harrington, Jeanne Hartshorn, Stephen Heslin, Paula Heu, Oscar Higgott, Reno Hiltermann, Hsin-Yuan Huang, Mike Hucka, Christopher Hudspeth, Ashley Huff, William J. 
Huggins, Evan Jeffrey, Shaun Jevons, Zhang Jiang, Xiaoxuan Jin, Chaitali Joshi, Pavol Juhas, Andreas Kabel, Dvir Kafri, Hui Kang, Kiseo Kang, Amir H. Karamlou, Ryan Kaufman, Kostyantyn Kechedzhi, Tanuj Khattar, Mostafa Khezri, Seon Kim, Can M. Knaut, Bryce Kobrin, Fedor Kostritsa, John Mark Kreikebaum, Ryuho Kudo, Ben Kueffler, Arun Kumar, Vladislav D. Kurilovich, Vitali Kutsko, Nathan Lacroix, David Landhuis, Tiano Lange-Dei, Brandon W. Langley, Pavel Laptev, Kim-Ming Lau, Justin Ledford, Joy Lee, Kenny Lee, Brian J. Lester, Wendy Leung, Lily Li, Wing Yan Li, Ming Li, Alexander T. Lill, William P. Livingston, Matthew T. Lloyd, Aditya Locharla, Laura De Lorenzo, Daniel Lundahl, Aaron Lunt, Sid Madhuk, Aniket Maiti, Ashley Maloney, Salvatore Mandrà, Leigh S. Martin, Orion Martin, Eric Mascot, Paul Masih Das, Dmitri Maslov, Melvin Mathews, Cameron Maxfield, Jarrod R. McClean, Matt McEwen, Seneca Meeks, Kevin C. Miao, Zlatko K. Minev, Reza Molavi, Sebastian Molina, Shirin Montazeri, Charles Neill, Michael Newman, Anthony Nguyen, Murray Nguyen, Chia-Hung Ni, Murphy Yuezhen Niu, Logan Oas, Raymond Orosco, Kristoffer Ottosson, Alice Pagano, Agustin Di Paolo, Sherman Peek, David Peterson, Alex Pizzuto, Elias Portoles, Rebecca Potter, Orion Pritchard, Michael Qian, Chris Quintana, Arpit Ranadive, Matthew J. Reagor, Rachel Resnick, David M. Rhodes, Daniel Riley, Gabrielle Roberts, Roberto Rodriguez, Emma Ropes, Lucia B. De Rose, Eliott Rosenberg, Emma Rosenfeld, Dario Rosenstock, Elizabeth Rossi, Pedram Roushan, David A. Rower, Robert Salazar, Kannan Sankaragomathi, Murat Can Sarihan, Kevin J. Satzinger, Max Schaefer, Sebastian Schroeder, Henry F. Schurkus, Aria Shahingohar, Michael J. Shearn, Aaron Shorter, Vladimir Shvarts, Spencer Small, W. Clarke Smith, David A. Sobel, Barrett Spells, Sofia Springer, George Sterling, Jordan Suchard, Aaron Szasz, Alexander Sztein, Madeline Taylor, Jothi Priyanka Thiruraman, Douglas Thor, Dogan Timucin, Eifu Tomita, Alfredo Torres, M. 
Mert Torunbalci, Hao Tran, Abeer Vaishnav, Justin Vargas, Sergey Vdovichev, Guifre Vidal, Catherine Vollgraff Heidweiller, Meghan Voorhees, Steven Waltman, Jonathan Waltz, Shannon X. Wang, Brayden Ware, James D. Watson, Yonghua Wei, Travis Weidel, Theodore White, Kristi Wong, Bryan W. K. Woo, Christopher J. Wood, Maddy Woodson, Cheng Xing, Z. Jamie Yao, Ping Yeh, Bicheng Ying, Juhwan Yoo, Noureldin Yosri, Elliot Young, Grayson Young, Adam Zalcman, Ran Zhang, Yaxing Zhang, Ningfeng Zhu, Nicholas Zobrist, Zhenjie Zou, Ryan Babbush, Dave Bacon, Sergio Boixo, Yu Chen, Zijun Chen, Michel Devoret, Monica Hansen, Jeremy Hilton, Cody Jones, Julian Kelly, Alexander N. Korotkov, Erik Lucero, Anthony Megrant, Hartmut Neven, William D. Oliver, Ganesh Ramachandran, Vadim Smelyanskiy, Paul V. Klimov

Published Tue, 10 Ma

Imagine you are trying to keep a house of cards standing in the middle of a windy room. The cards represent your quantum computer's data, and the wind represents the constant, tiny jitters and errors caused by the environment (temperature changes, electrical noise, etc.).

In the past, if the cards started to wobble, the only way to fix them was to stop everything. You would freeze the room, carefully re-stack the cards, check every single one, and then start again. But for the complex calculations of the future (which might take days or weeks), stopping every time the wind blows is impossible. You'd never finish the job.

This paper from Google Quantum AI and Google DeepMind introduces a new way to handle this: teaching the quantum computer to "surf" the wind instead of fighting it.

Here is the breakdown of how they did it, using simple analogies:

1. The Problem: The "Drifting" Tuning Fork

Quantum computers are incredibly sensitive analog machines. Think of them like a giant, ultra-precise orchestra. To play a perfect song (a calculation), every instrument (qubit) must be perfectly tuned.

The Old Way: Every hour, the conductor stops the music, checks every instrument, re-tunes them, and then starts over. This is slow and wasteful.
The New Problem: The instruments don't just go out of tune once; they keep drifting. The temperature changes, and the tuning shifts while you are playing. Stopping to fix it breaks the flow.

2. The Solution: The "Self-Correcting" Conductor

The researchers created an Artificial Intelligence (AI) agent using a technique called Reinforcement Learning (RL).

The Metaphor: Imagine a conductor who doesn't just listen to the music, but also feels the wind in the room. Instead of stopping the orchestra, the conductor makes tiny, invisible adjustments to the instruments while they are playing.
How it learns: The AI doesn't need to know the physics of the wind. It just watches the "mistakes." In quantum computing, when an error happens, it leaves a tiny "footprint" (called a detection event). The AI treats these footprints as a score.
- Fewer footprints? Good job! (Reward).
- More footprints? Try adjusting the knobs differently. (Penalty).

3. The Magic Trick: Turning Errors into a Map

Usually, errors are bad. But this system turns errors into a GPS map.

The AI looks at where the errors are happening and asks, "Which control knob caused this?"
It then nudges that knob slightly in the opposite direction to fix it.
Because the AI is constantly doing this, it creates a feedback loop: The computer learns from its own mistakes in real-time, without ever stopping.

4. The Results: A 3.5x Boost

They tested this on a superconducting processor called "Willow."

The Experiment: They intentionally made the system drift (like turning up the wind) to see if the AI could handle it.
The Outcome: With the AI steering the controls, logical performance was 2.4 times more stable. When the AI also steered the decoder's parameters, the gain rose to 3.5 times. It kept the house of cards standing even when the wind got stronger.
Record Breaking: They reported the lowest logical error rates to date for these types of quantum codes on superconducting processors, with the AI adding roughly 20% more error suppression on top of a calibration already tuned by human experts.

5. Why This Matters: The "Never-Stop" Computer

The most exciting part is scalability.

The Old Fear: As quantum computers get bigger (with thousands of qubits), the number of knobs to turn becomes millions. Humans can't tune millions of knobs, and stopping to tune them would take forever.
The New Hope: The AI doesn't care how big the computer gets. It learns the pattern of the errors and adjusts the knobs automatically. The paper's simulations indicate that the method learns just as quickly on a much larger computer as it does on a small one.

The Bottom Line

This paper is a major step toward Fault-Tolerant Quantum Computing. It moves us from a world where we have to pause and fix our computers constantly, to a world where the computer is self-healing.

It's like upgrading from a car that needs a mechanic to stop and adjust the engine every 10 miles, to a car with a self-driving AI that constantly adjusts the suspension, fuel, and steering while you drive at 100 mph, keeping you on the road as long as the bumps don't come faster than it can react.

In short: They taught the quantum computer to learn from its own mistakes and fix itself on the fly, paving the way for machines that can run complex calculations for days without ever stopping.

Here is a detailed technical summary of the paper "Reinforcement learning control of quantum error correction" by Google Quantum AI and Google DeepMind.

1. The Problem: Environmental Drift and Calibration Bottlenecks

Quantum computers are inherently analog and fragile, making them susceptible to environmental noise and parameter drift (e.g., frequency shifts, amplitude fluctuations). While Quantum Error Correction (QEC) is designed to digitize these errors into discrete syndromes, its effectiveness relies on the physical gate error rates remaining significantly below a specific threshold (typically between $10^{-3}$ and $10^{-2}$ ).

The Challenge: Current QEC systems require periodic "halting" of computation to recalibrate control parameters. This stop-and-go approach is unsustainable for future algorithms requiring continuous runtimes of days or weeks.
The Limitation: Traditional calibration relies on human experts and pre-defined, orthogonalized experiments (e.g., spectroscopy, Rabi oscillations). These methods struggle to maintain performance against non-stationary drift and cannot easily scale to the thousands of control parameters required for large-scale logical qubits.
The Gap: There is a need for a system that can continuously steer control parameters during computation, utilizing the error detection signals already generated by the QEC protocol, without interrupting the logical algorithm.

2. Methodology: Reinforcement Learning (RL) Framework

The authors propose a paradigm where the QEC process serves a dual role: correcting logical states and providing a learning signal for an RL agent.

A. Core Concept: Learning from Errors

Instead of stopping to calibrate, the system treats error detection events (syndromes) as a reward signal.

Surrogate Objective ( $C$ ): Directly optimizing the Logical Error Rate (LER) is intractable for RL due to the exponential number of cycles required to resolve it and the sparsity of logical errors. Instead, the authors define a surrogate objective $C$ , the average rate of error detection events.
- Based on surface code scaling models ( $\epsilon_L \propto \Lambda^{-(d+1)/2}$ ), minimizing detection events correlates with minimizing logical errors.
- The gradient of the log-LER is related to the gradient of the log-surrogate objective by a factor of $(d+1)/2$ .
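To see where the factor of $(d+1)/2$ can come from, here is a short sketch under two simplifying assumptions that this summary does not spell out: that the logical error rate follows the standard surface-code scaling $\epsilon_L \propto \Lambda^{-(d+1)/2}$ , and that near a good operating point the suppression factor is inversely proportional to the detection rate, $\Lambda \propto 1/C$ , with a proportionality constant that does not depend on the control parameters (written here as $\theta$ , a symbol introduced only for this sketch).

$$
\log \epsilon_L
= -\frac{d+1}{2}\,\log \Lambda + \text{const}
= \frac{d+1}{2}\,\log C + \text{const}'
\quad\Longrightarrow\quad
\nabla_\theta \log \epsilon_L = \frac{d+1}{2}\,\nabla_\theta \log C
$$

Under these assumptions the two gradients point in the same direction, so descending on $C$ also descends on $\epsilon_L$ . For a distance-7 code the factor would be $(7+1)/2 = 4$ .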
Sparse Factor Graph: The relationship between control parameters and detection events is local. The authors map detectors to the specific control parameters of the gates within their "detecting region." This creates a sparse factor graph, allowing the RL algorithm to efficiently estimate gradients without needing global information.
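The locality idea can be illustrated with a toy example. The sketch below is not the paper's detector layout: it assumes a hypothetical rectangular grid with one detector per cell and one control parameter per edge between neighboring cells, so each parameter touches exactly two detectors. The function names `build_factor_graph` and `local_costs` are made up for illustration.

```python
import numpy as np

def build_factor_graph(n_rows, n_cols):
    """Toy layout (an assumption, not the real surface-code geometry):
    one detector per grid cell, one control parameter per edge between
    adjacent cells. Returns, for each parameter, the detectors it touches."""
    def det(r, c):
        return r * n_cols + c

    param_to_dets = []
    for r in range(n_rows):
        for c in range(n_cols):
            if c + 1 < n_cols:  # horizontal edge
                param_to_dets.append((det(r, c), det(r, c + 1)))
            if r + 1 < n_rows:  # vertical edge
                param_to_dets.append((det(r, c), det(r + 1, c)))
    return param_to_dets

def local_costs(detection_rates, param_to_dets):
    """Per-parameter cost: mean detection rate over that parameter's own
    detectors only, ignoring every other detector on the chip."""
    return np.array([detection_rates[list(d)].mean() for d in param_to_dets])

# Usage: a single noisy detector in the middle of a 3x3 grid.
graph = build_factor_graph(3, 3)   # 12 parameters (edges)
rates = np.zeros(9)
rates[4] = 0.2                     # only the center detector fires
costs = local_costs(rates, graph)
print(len(graph), int((costs > 0).sum()))  # 12 4
```

Only the four parameters adjacent to the noisy detector see a nonzero cost. In this toy layout the number of detectors per parameter stays fixed as the grid grows, which is the property that keeps the credit assignment local.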

B. The RL Algorithm

Policy: The agent maintains a parameterized probability distribution (a factorized multivariate Gaussian) over all control parameters.
- Mean ( $\mu$ ): Represents the current best guess for optimal control.
- Variance ( $\sigma^2$ ): Controls exploration.
Training Loop:
1. Sampling: A batch of policy candidates is sampled from the distribution.
2. Execution: Each candidate is applied to the quantum processor for a short duration (e.g., 25 QEC cycles).
3. Reward: The error detection rate is measured. Candidates with lower detection rates receive higher rewards.
4. Update: The policy distribution is updated using a multi-objective policy-gradient algorithm (incorporating Proximal Policy Optimization and entropy regularization) to shift the mean toward better parameters.
Steering vs. Fine-tuning:
- Fine-tuning: Starting from a well-calibrated state to push performance beyond human limits.
- Steering: Continuously adapting $\mu(t)$ in real-time to track slow environmental drift, effectively acting as a low-pass filter for system instability.
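The four training-loop steps above can be sketched in a few lines. This is a deliberately simplified stand-in, not the authors' algorithm: it uses a plain score-function (REINFORCE-style) update of the mean with a fixed exploration width, rather than the PPO-based multi-objective update with entropy regularization described above. The hardware is replaced by a hypothetical `detection_rate` function in which each parameter has its own local detection rate that grows quadratically with miscalibration. All names and numerical values are assumptions chosen for illustration.

```python
import numpy as np

def detection_rate(theta, theta_opt, rng, cycles=25):
    """Hypothetical stand-in for the processor: each parameter has a local
    detection rate that grows quadratically with miscalibration, plus
    sampling noise that shrinks with the number of QEC cycles."""
    true_rate = 0.05 + (theta - theta_opt) ** 2
    noise = rng.normal(0.0, 0.01 / np.sqrt(cycles), size=theta.shape)
    return true_rate + noise

def train(n_params=8, batch=32, epochs=200, lr=0.1, sigma=0.05, seed=0):
    rng = np.random.default_rng(seed)
    theta_opt = rng.uniform(-0.2, 0.2, n_params)  # optimum, unknown to the agent
    mu = np.zeros(n_params)                       # policy mean (current best guess)
    history = []
    for _ in range(epochs):
        # 1. Sampling: draw candidates from the factorized Gaussian policy.
        eps = rng.normal(size=(batch, n_params))
        candidates = mu + sigma * eps
        # 2-3. Execution and reward: lower detection rate means higher reward.
        rewards = -detection_rate(candidates, theta_opt, rng)
        # 4. Update: score-function gradient with a per-parameter baseline.
        advantage = rewards - rewards.mean(axis=0)
        grad_mu = (advantage * eps).mean(axis=0) / sigma
        mu = mu + lr * grad_mu
        # Track the noise-free detection rate at the policy mean.
        history.append(float(np.mean(0.05 + (mu - theta_opt) ** 2)))
    return mu, theta_opt, history

mu, theta_opt, history = train()
print(f"first epoch: {history[0]:.4f}, last epoch: {history[-1]:.4f}")
```

Because the toy rewards are per-parameter, each component of the mean is updated from its own local signal, in the spirit of the sparse factor graph. Steering would correspond to running the same loop indefinitely while `theta_opt` slowly changes.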

3. Key Contributions

Unified Calibration and Computation: Demonstrated a framework where QEC error syndromes are repurposed as a learning signal, eliminating the need to halt computation for calibration.
Scalable RL Control: Developed an algorithm that handles >1,000 control parameters (and scales to ~40,000 in simulation) by exploiting the sparsity of the detector-parameter mapping.
Record-Breaking Performance: Achieved the lowest logical error rates reported to date for surface and color codes on superconducting processors.
Drift Resilience: Showed that RL can actively steer the system to counteract injected and natural drift, stabilizing performance over time.

4. Experimental Results

The framework was tested on Google's Willow superconducting processors using distance-5 and distance-7 surface codes, and a distance-5 color code.

Performance Improvement (Fine-tuning):
- After exhaustive traditional calibration and human tuning, RL fine-tuning provided an additional 20% suppression of the Logical Error Rate (LER).
- Record LERs:
  - Distance-7 Surface Code: $\epsilon_L = 7.72(9) \times 10^{-4}$ (using the AlphaQubit2 neural network decoder).
  - Distance-5 Color Code: $\epsilon_L = 8.19(14) \times 10^{-3}$ (using the Tesseract decoder).
Drift Steering:
- Against injected artificial drift (step, sinusoidal, and stroboscopic), RL steering improved logical stability 2.4-fold.
- When combined with decoder parameter steering, stability improved 3.5-fold.
- The system recovered from step-like drift in approximately 130 epochs (learning time).
Randomized Initialization: The RL agent successfully recovered high-performance states even when starting from a completely randomized control policy (where logical error probability was 50%), taking ~1,000 epochs to reach calibrated performance.
Scalability Simulations:
- Simulations up to distance-15 surface codes (~40,000 parameters) confirmed that the convergence rate of the RL algorithm is independent of system size.
- The algorithm exhibits exponential convergence ( $\Lambda \propto e^{-\gamma t}$ ) near the optimum, driven by the sparsity of the control graph.

5. Significance and Future Outlook

New Paradigm for Fault Tolerance: This work shifts the path to fault tolerance from relying solely on better hardware to intelligent control. It enables a "quantum computer that learns from its errors and never stops computing."
Hardware Agnostic: While demonstrated on superconducting circuits, the framework is general and applicable to any qubit modality (e.g., trapped ions, neutral atoms) and any QEC architecture.
Automation: The results suggest that future quantum processors could be calibrated ab initio entirely by RL, potentially removing the reliance on human experts and complex, pre-defined calibration stacks.
Real-Time Viability: The study establishes that real-time steering is feasible for drift frequencies below a critical threshold (~1/150 per epoch, i.e. drift periods longer than about 150 epochs), balancing exploration noise with exploitation to maintain logical coherence.

In conclusion, this paper demonstrates that Reinforcement Learning is a viable and scalable method for managing the analog control of large-scale quantum error-corrected systems. It addresses the critical bottleneck of environmental drift and paves the way for long-duration, fault-tolerant quantum algorithms.