Towards Real-time Control of a CartPole System on a… — Plain-Language Explanation

Original authors: Nguyen Truong Thu Ngo, Väinö Mehtola, Jérome Lenssen, Peiyong Wang, Francesco Cosco, Tien-Fu Lu, James Q. Quach

Published 2026-05-05

Original paper licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). This is an AI-generated explanation of the paper below. It is not written or endorsed by the authors. For technical accuracy, refer to the original paper.

Imagine you are trying to teach a robot to balance a broomstick on its hand. This is a classic challenge in robotics called "CartPole." Usually, we teach robots using classical computers (the kind in your laptop). But what if we tried to teach it using a quantum computer?

This paper is a report card on that experiment. The researchers asked three big questions:

Can a tiny quantum computer learn to balance the broomstick faster than a normal computer?
Does the robot get confused if we train it at one speed but ask it to work at a different speed?
Can we make the quantum computer fast enough to actually control the robot in real-time, or is it too slow?

Here is the breakdown of their findings, using simple analogies.

1. The "Tiny Brain" vs. The "Big Brain"

The Setup:
The researchers built a "hybrid" robot brain. It's mostly a normal computer, but it has one tiny quantum part (a single "qubit," which is like a quantum coin that can be heads, tails, or both at once). They compared this to a "big brain" made entirely of standard computer parts (a deep neural network).

The Result:
The tiny quantum brain was a speed demon.

The Analogy: Imagine two students taking a test. The "Big Brain" student needs to read the textbook about 430 times before they get an A. The "Tiny Quantum Brain" student only needs to read it about 160 times to get the same A.
The Catch: This speed boost held up even when the quantum brain could not compute its learning signal exactly and instead had to estimate it from many coin flips (using a technique called the "parameter-shift rule"). It showed that even a very small quantum model can be surprisingly efficient at learning.
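For readers who like to see the idea in code, here is a minimal Python sketch of the parameter-shift trick. It uses a toy one-parameter "coin" whose exact average is cos(theta). This toy model, the function names, and the 1,024-flip budget are illustrative assumptions, not the paper's actual circuit or settings.

```python
import math
import random

def expectation(theta):
    """Exact average for the toy coin: <Z> after rotating |0> by Ry(theta)."""
    return math.cos(theta)

def sampled_expectation(theta, shots, rng):
    """Estimate the same average from a finite number of simulated flips."""
    p_plus = (1 + math.cos(theta)) / 2  # probability of the +1 outcome
    total = sum(1 if rng.random() < p_plus else -1 for _ in range(shots))
    return total / shots

def parameter_shift(f, theta):
    """Gradient from two evaluations shifted by +pi/2 and -pi/2."""
    return (f(theta + math.pi / 2) - f(theta - math.pi / 2)) / 2

theta = 0.8
exact = -math.sin(theta)  # textbook derivative of cos(theta)
shifted = parameter_shift(expectation, theta)

rng = random.Random(0)  # fixed seed so the run is repeatable
noisy = parameter_shift(lambda t: sampled_expectation(t, 1024, rng), theta)

print(f"exact derivative:        {exact:.4f}")
print(f"parameter-shift (exact): {shifted:.4f}")
print(f"parameter-shift (flips): {noisy:.4f}")
```

The first two values agree to rounding error, while the finite-flip estimate is close but noisy. That noisy version is the situation the tiny quantum brain had to learn under.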

2. The "Speed Bump" Problem (Training vs. Driving)

The Setup:
In the real world, a robot needs to make decisions very quickly (like 50 times a second). However, quantum computers are noisy and slow. To get a clear answer from the quantum coin, you often have to flip it many times (each flip is called a "shot").

The Trade-off: If you flip the coin too few times, the answer is noisy (like trying to hear a whisper in a storm). If you flip it too many times, it takes too long, and the robot falls over before it can react.
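The whisper-in-a-storm effect can be put in numbers. The sketch below is a back-of-the-envelope illustration rather than anything taken from the paper. It uses the standard formula for the statistical error of an average of +1/-1 coin flips, and the function name and the choice of a perfectly balanced coin (the noisiest case) are assumptions. The shot counts match the 128 to 1024 range studied in the paper.

```python
import math

def standard_error(mean_z, shots):
    """Standard error of an average of `shots` outcomes that are each +1 or -1."""
    variance = 1.0 - mean_z ** 2  # variance of one +/-1 outcome with mean mean_z
    return math.sqrt(variance / shots)

# A balanced coin (mean 0) is the worst case for noise.
for shots in (128, 256, 512, 1024):
    error = standard_error(0.0, shots)
    print(f"{shots:5d} shots -> typical error about {error:.3f}")
```

Quadrupling the number of flips only halves the noise, so accuracy improves slowly while every extra flip costs time.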

The Experiment:
The researchers trained the robot at different speeds and then tested it at different speeds to see if it would get confused. They created a giant "heat map" (like a weather map) showing how well the robot balanced under different conditions.

The Result:

The "Inference" Speed Matters Most: How fast the robot was trained mattered much less than how fast it was driving (inference). If the robot was allowed to make decisions quickly (high frequency), it balanced well. If it was forced to drive slowly, it tended to fall over.
More Flips = More Stability: If the robot had to drive slowly, they could partly make up for it by giving it more "shots" (flipping the coin more times to get a clear answer).
The Sweet Spot: You have to find a balance. You need the robot to drive fast and have enough time to get a clear quantum answer. The paper provides a map to help engineers find this perfect balance for future robots.

3. The "Traffic Jam" vs. The "Highway" (Latency)

The Setup:
This is the most critical part. Even if the quantum computer learns well, it's useless if it's too slow to react in real-time.

The Problem: Normally, when you use a quantum computer in the cloud, you have to send your request through a lot of "bureaucracy" (software layers, compilers, internet delays). It's like trying to drive a race car through a city with stop signs, traffic lights, and construction zones.
The Old Way: Using the standard software, the robot could only make a decision about 0.14 times per second, or roughly once every seven seconds. It was essentially asleep.

The Breakthrough:
The researchers decided to bypass the "bureaucracy." They programmed the quantum computer's hardware directly, like a race car driver taking a shortcut through a private highway.

The Result: By cutting out the middlemen, they sped the robot up by more than 40 times. The robot could now make decisions about 6.2 times per second.
The Limit: While 6.2 times a second is a huge improvement, it's still not fast enough for a broomstick that needs to be balanced 50 times a second. However, it proves that the "traffic jam" was the main problem, not the quantum physics itself.

The Bottom Line

This paper is a "proof of concept" that says:

Yes, a tiny quantum brain can learn a balancing task faster than a big classical brain.
Yes, we can map out exactly how fast and how precise the quantum computer needs to be to keep the robot from falling.
Yes, we can make quantum computers much faster for control, though not yet fast enough for real-time balancing, and only if we stop using the slow, standard software and talk directly to the hardware.

The researchers didn't build a self-driving car or a medical robot yet. They just proved that the engine (the quantum learning) works, and they figured out how to remove the traffic jams (latency) so it can eventually drive faster.

Technical Summary: Towards Real-time Control of a CartPole System on a Quantum Computer

Problem Statement
The application of Quantum Reinforcement Learning (QRL) to real-time control systems faces significant hurdles regarding hardware latency, noise susceptibility, and learning convergence. While theoretical quantum machine learning (QML) research suggests potential advantages in sample efficiency and high-dimensional representation, practical deployment on Noisy Intermediate-Scale Quantum (NISQ) devices remains limited. Existing studies often rely on idealized simulations or fail to address the latency bottlenecks of standard cloud-based quantum execution, which make that execution path unsuitable for latency-sensitive, closed-loop control tasks. The specific challenge addressed in this work is the gap between simulation-only evaluations and the execution of a hybrid quantum-classical agent on a physical superconducting Quantum Processing Unit (QPU) under real-time constraints.

Methodology
The authors present an end-to-end investigation of a minimal hybrid quantum-classical agent applied to the CartPole benchmark.

Environment & State Encoding: The task involves stabilizing an inverted pendulum on a cart. The agent utilizes a reduced three-dimensional feature vector (cart velocity, pole angle, pole angular velocity) rather than the full four-dimensional state, motivated by the constraints of a single-qubit architecture.
Agent Architecture:
- Hybrid Model: The agent employs a single-qubit variational quantum circuit (VQC) connected to classical fully connected layers. The VQC applies a Hadamard gate followed by a three-rotation sequence ($R_z$, $R_y$, $R_z$) to encode the state onto the Bloch sphere, and then a trainable $R_x$ rotation. The expectation value of a Pauli-Z measurement is fed into classical actor and critic networks (each with 32 hidden neurons).
- Classical Baseline: A fully classical actor-critic network with identical hidden layer structures (128 and 256 units) serves as the baseline.
- Training: Both models use the Actor-Critic method with Policy Gradients. The hybrid agent is trained using the parameter-shift rule for gradient estimation on shot-based backends, as well as analytical gradients for comparison.
Experimental Categories:
1. Noiseless Benchmark: Comparison of convergence rates between classical and hybrid agents using Qiskit BasicSimulator.
2. Training-Inference Compatibility: A systematic study mapping the trade-off between control-loop rate (inference frequency) and measurement shot budget. Agents trained at various frequencies (20–100 Hz) were evaluated across different inference frequencies and shot counts (128–1024) on a noise-emulating backend (FakeAdonis).
3. Low-Latency Hardware Execution: Deployment of a trained policy on the VTT Q5 (a 5-qubit superconducting QPU). Crucially, the authors bypassed the standard high-level Qiskit/IQM software stack. Instead, they programmed the Zurich Instruments control and readout electronics (HDAWG and UHFQA) directly via command tables (CT), eliminating the overhead of code re-compilation and waveform upload for every parameter change.
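The single-qubit circuit described under Agent Architecture can be simulated exactly in a few lines of plain Python. The following is a minimal sketch under stated assumptions: this summary does not say how the three features are scaled or which feature drives which rotation, so the direct feature-to-angle mapping, the function names, and the gate conventions (the standard $R(\phi) = e^{-i\phi\sigma/2}$ definitions) are illustrative choices, not the authors' implementation.

```python
import cmath
import math

def matvec(m, v):
    """Apply a 2x2 matrix to a 2-component state vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

def hadamard():
    s = 1 / math.sqrt(2)
    return [[s, s], [s, -s]]

def rz(phi):
    return [[cmath.exp(-0.5j * phi), 0], [0, cmath.exp(0.5j * phi)]]

def ry(phi):
    c, s = math.cos(phi / 2), math.sin(phi / 2)
    return [[c, -s], [s, c]]

def rx(phi):
    c, s = math.cos(phi / 2), math.sin(phi / 2)
    return [[c, -1j * s], [-1j * s, c]]

def expectation_z(features, theta):
    """Exact <Z> for H, then Rz-Ry-Rz encoding, then a trainable Rx(theta).

    Assumption: the three features are used directly as rotation angles,
    in the order (cart velocity, pole angle, pole angular velocity).
    """
    a, b, c = features
    state = [1.0, 0.0]  # start in |0>
    for gate in (hadamard(), rz(a), ry(b), rz(c), rx(theta)):
        state = matvec(gate, state)
    return abs(state[0]) ** 2 - abs(state[1]) ** 2

# Hypothetical feature values, chosen only to exercise the function.
print(expectation_z((0.1, -0.05, 0.3), theta=0.7))
```

With all features at zero the qubit sits in the $|+\rangle$ state, which $R_x$ changes only by a phase, so the output is 0 for every value of the trainable angle. In this sketch it is the encoding rotations that move the state away from that axis and make the readout depend on the trainable angle.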

Key Contributions

Sample Efficiency of Minimal Hybrid Agents: The study demonstrates that a single-qubit hybrid agent can solve the CartPole environment in substantially fewer episodes (approx. 162 episodes) than a comparable classical actor-critic network (approx. 429 episodes), even when trained using the parameter-shift rule with finite-shot evaluations.
Inference-Time Trade-off Analysis: The authors provide performance matrices quantifying the relationship between inference control frequency and shot count. Results indicate that higher inference frequencies consistently improve balancing stability. Furthermore, increasing the shot budget lowers the minimum inference frequency required to achieve near-maximal balancing, so a deployment has to balance these two constraints against each other.
Latency Reduction via Low-Level Control: By bypassing the standard software stack and utilizing direct command table programming on the control electronics, the authors improved execution speed by more than an order of magnitude. On the VTT Q5 processor, the iteration rate increased from ~0.14 Hz (standard stack) to over 6.2 Hz (low-level path) for 128 shots, representing a speedup of over 40x.
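The arithmetic behind these rates is worth spelling out. The short sketch below converts the reported iteration rates into time per decision and compares them with control rates in the 20 to 100 Hz range used in the training study. The helper name and the choice of comparison rates are assumptions made for illustration.

```python
STANDARD_STACK_HZ = 0.14  # reported rate through the standard software stack
LOW_LEVEL_HZ = 6.2        # reported rate via command tables at 128 shots

def period_ms(rate_hz):
    """Time available per control decision, in milliseconds."""
    return 1000.0 / rate_hz

speedup = LOW_LEVEL_HZ / STANDARD_STACK_HZ
print(f"speedup: {speedup:.1f}x")
print(f"standard stack: {period_ms(STANDARD_STACK_HZ):.0f} ms per decision")
print(f"low-level path: {period_ms(LOW_LEVEL_HZ):.0f} ms per decision")

# Compare against illustrative target control rates.
for target_hz in (20, 50, 100):
    shortfall = target_hz / LOW_LEVEL_HZ
    print(f"{target_hz:3d} Hz target: {period_ms(target_hz):.0f} ms budget, "
          f"{shortfall:.1f}x faster than the low-level path")
```

Even the slowest of these target rates leaves a budget several times shorter than the low-level path currently needs per decision.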

Results

Learning Dynamics: In noiseless simulations, the hybrid agent converged significantly faster than the classical baseline. The use of parameter-shift gradients resulted in slightly slower convergence than analytical gradients but maintained a clear advantage over the classical model.
Deployment Constraints: The compatibility study revealed that inference-time constraints (frequency and shot count) are the primary determinants of stability, rather than the training frequency. A mismatch between training and inference frequencies had a secondary effect compared to the shot count and inference frequency.
Hardware Performance: On the VTT Q5, the low-level execution path enabled iteration rates of 6.23 Hz (128 shots) down to 2.71 Hz (1024 shots). While the absolute episode scores on hardware were conservative due to the lack of readout error mitigation and non-ideal inference conditions, the system successfully demonstrated closed-loop control. The results showed that with a sufficient shot budget (e.g., 1024 shots), the system could achieve near-perfect balancing scores (500) despite the hardware noise.
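The two reported hardware rates allow a rough decomposition of where the time goes. The sketch below assumes, purely for illustration, that the time per control iteration is a fixed overhead plus a constant cost per shot. This summary does not state such a model, and a two-point fit cannot test it, so the output is an estimate and not a measured result. The function names are hypothetical.

```python
def fit_linear_timing(rate_a_hz, shots_a, rate_b_hz, shots_b):
    """Fit time = overhead + per_shot * shots through two (shots, rate) points."""
    time_a = 1.0 / rate_a_hz
    time_b = 1.0 / rate_b_hz
    per_shot = (time_b - time_a) / (shots_b - shots_a)
    overhead = time_a - per_shot * shots_a
    return overhead, per_shot

# Reported rates on the VTT Q5: 6.23 Hz at 128 shots, 2.71 Hz at 1024 shots.
overhead_s, per_shot_s = fit_linear_timing(6.23, 128, 2.71, 1024)

def predicted_rate_hz(shots):
    """Iteration rate implied by the assumed linear model."""
    return 1.0 / (overhead_s + per_shot_s * shots)

print(f"fixed overhead: {overhead_s * 1000:.0f} ms per iteration")
print(f"cost per shot:  {per_shot_s * 1000:.3f} ms")
print(f"implied rate at 512 shots: {predicted_rate_hz(512):.2f} Hz")
```

Under this assumption the fixed overhead comes out at roughly 130 ms per iteration and the per-shot cost at about 0.23 ms, which would cap the loop below 8 Hz even with very few shots. That reading depends entirely on the assumed linear model.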

Significance and Claims
The paper claims to provide a foundational step toward achieving real-time closed-loop control feedback on quantum hardware. It does not claim a theoretical quantum speedup in the complexity-theoretic sense, given the low-dimensional nature of CartPole. Instead, the significance lies in:

Quantifying Boundaries: The work quantifies current boundaries of quantum-assisted control, specifically the trade-offs between shot count, control frequency, and latency.
Practical Roadmap: It outlines a practical pathway toward real-time demonstrations by showing that bypassing standard software stacks is a necessary step toward the tens-of-hertz throughput required for real-time feedback.
Feasibility of Minimal Models: It validates that minimal single-qubit models can act as effective learning agents in RL loops when paired with appropriate encoding and lightweight classical post-processing, even under realistic noise and finite-shot constraints.

The authors conclude that while current NISQ hardware iteration rates (a few hertz) have not yet reached the tens-of-hertz regime required for robust real-time control, the demonstrated low-latency pipeline provides a viable starting point for achieving such throughput in future iterations.
