Original authors: Mohammad Abrarul Hasanat, Jason Ludmir, Tirthak Patel, Rohan Basu Roy

Published 2026-05-13

Original paper licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). This is an AI-generated explanation of the paper below. It is not written or endorsed by the authors. For technical accuracy, refer to the original paper.

Imagine you are trying to send a delicate, complex message across a very noisy, bumpy road. The message is a quantum program (a set of instructions for a quantum computer), and the road is the quantum hardware.

The problem is that the road is full of potholes (errors) and the message degrades the longer it takes to get there. If you take a long, winding route, your message might arrive garbled. If you take a fast route but hit too many potholes, it also arrives garbled.

Currently, the "drivers" (the compilers) that send these messages use a fixed rulebook. They tell every single message to take the exact same route, regardless of whether the message is simple or complex, or whether the road is currently dry or muddy. Sometimes this works, but often it's inefficient, leading to slow delivery or a broken message.

TuniQ is a new, smart driver that changes the rules. Instead of following a fixed map, it uses Reinforcement Learning (a type of AI that learns by trial and error) to decide the best route for each individual message at the moment it is sent.

Here is how TuniQ works, broken down into simple concepts:

1. The "Fixed Rulebook" vs. The "Smart Driver"

Think of the current system (IBM Qiskit) as a GPS that forces every car to take the same highway, even if a shortcut exists for a specific car. It applies the same set of "optimization passes" (traffic rules) to every quantum circuit.

The Flaw: A shortcut that saves time for a small car might cause a traffic jam for a large truck. Similarly, a compiler setting that helps one quantum program might actually hurt another.
The TuniQ Solution: TuniQ is like a driver who looks at the specific cargo (the circuit), checks the current weather and road conditions (the hardware's noise levels), and then decides: "Do I need to take the scenic route to avoid a pothole? Or should I speed up because the road is clear?" It chooses which "traffic rules" to apply and which to skip for that specific trip.

2. The "Dual-Encoder" (The Driver's Two Sets of Eyes)

To make these decisions, TuniQ needs to see the world differently at different stages of the trip. The paper describes a Dual-Encoder system:

Before the Road (Logical View): At the start, the driver looks at the plan of the trip. It sees the logical connections between the passengers (qubits) without worrying about the specific potholes yet. It asks, "How do these people need to sit together?"
After the Road (Physical View): Once the car is on the road, the driver switches to a different set of eyes. Now, it looks at the actual car and the actual road conditions. It sees which specific tires (physical qubits) are wearing out and which parts of the road are bumpy.
Why it matters: This allows TuniQ to adapt. If the road gets muddier (noise increases), it can instantly switch strategies to a safer, slower route without needing to be retrained.

3. The "Shaped Rewards" (Learning from the Journey)

In the old way, the driver only got feedback at the very end: "Did you deliver the message?" If the message was broken, the driver didn't know which turn caused the problem.

TuniQ's Approach: TuniQ gets small "points" (rewards) along the way.
- "Good job avoiding that pothole!" (Intermediate reward).
- "Nice job keeping the car steady!" (Another intermediate reward).
- "You delivered the message perfectly!" (Final reward).
  This helps the driver learn that a specific turn early in the trip was crucial for the success of the whole journey, even if the result wasn't visible until the end.

4. The "Dynamic Mask" (The Safety Guard)

You can't just let a driver pick any road; some roads are dead ends or illegal.

TuniQ uses Dynamic Action Masking. Think of this as a guardrail that instantly blocks the driver from trying to take a turn that would break the car or violate traffic laws. It ensures that no matter what the AI decides, the final result is always a valid, drivable path.

The Results: Faster and Clearer

The paper tested TuniQ on real quantum computers from IBM. Here is what happened:

Better Quality: The messages arrived much clearer. On average, the "fidelity" (how much the message matched the original plan) improved by 20%.
Faster Delivery: The time it took to plan the route (compilation time) dropped by 34%. This is huge because many quantum algorithms have to plan their route thousands of times in a row.
No Retraining Needed: If you move the driver to a different city (a different quantum computer), TuniQ works immediately without needing to learn the new city from scratch.
Scaling Up: As the messages get bigger and more complex (utility-scale circuits), TuniQ gets even better compared to the old fixed rulebooks.

Summary

TuniQ is like upgrading from a rigid, one-size-fits-all GPS to a smart, adaptive co-pilot. It looks at the specific cargo, checks the real-time road conditions, and learns from every trip to choose the perfect mix of speed and safety. This makes quantum computing more reliable and faster, especially as we try to solve bigger problems in the future.

Technical Summary: TuniQ

Problem Statement

Quantum processors are increasingly integrated into High-Performance Computing (HPC) ecosystems as co-processors, where quantum circuits function as kernels dispatched from classical nodes. However, current quantum compilers, such as IBM's Qiskit transpiler, rely on a fixed sequence of compilation passes applied uniformly to all circuits. This "one-size-fits-all" approach fails to account for three critical variables:

Circuit Structure: Different algorithms (e.g., QPE, VQE, Grover) have distinct topologies and gate compositions that benefit from different optimization strategies.
Hardware Backends: Quantum devices vary in coupling topologies, native gate sets, and error profiles.
Noise Conditions: Calibration data (gate errors, coherence times $T_1$ and $T_2$) drifts over time on a single device.

A fixed pass sequence often applies unnecessary optimizations that increase circuit depth or gate count, thereby accumulating more noise and reducing output fidelity (measured by Total Variation Distance, TVD). Conversely, it may skip beneficial passes for specific circuit structures. Furthermore, exhaustive search over the millions of possible pass combinations is computationally intractable, and greedy per-stage optimization often leads to globally suboptimal results because early decisions constrain later stages.
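Since fidelity is reported here through Total Variation Distance, a minimal sketch of that metric may help. It uses the standard definition, half the sum of absolute differences between two outcome distributions. The function name and the sample counts are illustrative assumptions, not taken from the paper.

```python
def total_variation_distance(ideal: dict, measured: dict) -> float:
    """TVD between two outcome distributions given as counts or probabilities.

    0.0 means the distributions are identical; 1.0 means they do not overlap.
    """
    ideal_total = sum(ideal.values())
    measured_total = sum(measured.values())
    if ideal_total <= 0 or measured_total <= 0:
        raise ValueError("both distributions need a positive total")

    # Outcomes missing from one side count as probability zero there.
    outcomes = set(ideal) | set(measured)
    return 0.5 * sum(
        abs(ideal.get(o, 0) / ideal_total - measured.get(o, 0) / measured_total)
        for o in outcomes
    )


# Illustrative counts only: an ideal Bell-state histogram vs. a noisy one.
ideal_counts = {"00": 500, "11": 500}
noisy_counts = {"00": 430, "11": 420, "01": 80, "10": 70}
tvd = total_variation_distance(ideal_counts, noisy_counts)
```

A lower value means the measured output is closer to the ideal one, which is why a drop in TVD is read as a fidelity gain.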

Methodology: TuniQ

TuniQ is a Reinforcement Learning (RL)-based system designed to adaptively select compilation passes at each stage of the transpilation pipeline. It formulates pass selection as a Markov Decision Process (MDP) where an agent learns to maximize circuit fidelity while minimizing compilation time.

Core Components

Dual-Encoder Architecture:
- Pre-Layout Encoder: Encodes the logical circuit structure (spatio-temporal gate interactions) before hardware mapping.
- Post-Layout Encoder: Encodes the circuit bound to physical hardware, incorporating real-time noise characteristics (error rates, coherence times) from the backend calibration.
- This separation allows the agent to learn stage-specific strategies: layout/routing decisions based on logical structure, and optimization decisions based on physical noise profiles.
State Space:
- Includes a one-hot stage indicator (Init, Layout, Routing, Translate, Optimize, Cleanup).
- Circuit features represented as tensors (logical qubits pre-layout, physical qubits post-layout).
- Global features including gate counts, depth, and topology compatibility ratios.
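The global part of such a state can be sketched as a flat feature vector. The stage names come from the list above; everything else is an assumption made for illustration, in particular `compatibility_ratio`, which is one plausible reading of a "topology compatibility ratio" (the share of two-qubit interactions that already sit on a coupling edge) and not the paper's definition. The per-qubit tensor features are omitted.

```python
STAGES = ["Init", "Layout", "Routing", "Translate", "Optimize", "Cleanup"]


def one_hot_stage(stage: str) -> list:
    """Return a one-hot vector marking the current pipeline stage."""
    idx = STAGES.index(stage)  # raises ValueError for an unknown stage
    return [1.0 if i == idx else 0.0 for i in range(len(STAGES))]


def compatibility_ratio(two_qubit_pairs, coupling_edges) -> float:
    """Hypothetical feature: fraction of two-qubit gates on a coupled pair."""
    if not two_qubit_pairs:
        return 1.0
    edges = {frozenset(e) for e in coupling_edges}  # treat edges as undirected
    supported = sum(1 for p in two_qubit_pairs if frozenset(p) in edges)
    return supported / len(two_qubit_pairs)


def global_state(stage: str, gate_count: int, depth: int, compat: float) -> list:
    """Concatenate the stage indicator with scalar circuit features."""
    return one_hot_stage(stage) + [float(gate_count), float(depth), float(compat)]


# Illustrative values: four two-qubit gates on a three-qubit line 0-1-2.
ratio = compatibility_ratio([(0, 1), (1, 2), (0, 2), (2, 1)], [(0, 1), (1, 2)])
state = global_state("Routing", gate_count=42, depth=17, compat=ratio)
```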
Action Space & Dynamic Masking:
- The agent selects specific transpiler passes or a "skip" action at each stage.
- Dynamic Action Masking enforces valid compilation sequences. It prevents invalid transitions (e.g., running a routing pass before a layout has been selected) and ensures hardware constraints are met, guaranteeing that every completed episode produces an executable circuit.
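Action masking in general can be sketched as follows: invalid actions have their logits set to negative infinity before the softmax, so they receive exactly zero probability and can never be sampled. This is the generic technique, not TuniQ's code, and the action names, logits, and mask below are made up for illustration.

```python
import math


def masked_softmax(logits, mask):
    """Softmax over logits, with masked-out actions forced to probability 0."""
    if not any(mask):
        raise ValueError("at least one action must be valid")
    masked = [l if ok else -math.inf for l, ok in zip(logits, mask)]
    peak = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(l - peak) for l in masked]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical stage where "skip" is not allowed, so it is masked out
# even though the policy gave it the highest raw score.
actions = ["skip", "pass_a", "pass_b"]
logits = [2.0, 1.0, 0.5]
mask = [False, True, True]

probs = masked_softmax(logits, mask)
choice = actions[max(range(len(actions)), key=lambda i: probs[i])]
```

Because the mask is applied before sampling, validity is enforced by construction and does not have to be learned through penalties.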
Reward Structure:
- Shaped Rewards: To address the credit assignment problem across multiple stages, TuniQ uses intermediate rewards based on a Transpilation Quality (TQ) metric. This metric builds on the estimated success probability (ESP), computed from gate error rates and circuit depth, and adapts as the circuit moves from its logical to its physical representation.
- Final Reward: Upon completion, the agent receives a reward based on the log-ratio of the achieved ESP against a Qiskit Level 3 (Fidelity Optimized) baseline, combined with auxiliary terms for gate count and depth reduction.
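The shape of the final reward can be sketched under some loud assumptions. ESP is taken here in its common form, the product of per-gate success probabilities, and the depth-dependent part mentioned above is left out. The auxiliary weights `w_gate` and `w_depth` and all numeric values are hypothetical; the summary does not give them.

```python
import math


def estimated_success_probability(gate_errors) -> float:
    """Common ESP form: probability that every gate succeeds independently."""
    return math.prod(1.0 - e for e in gate_errors)


def final_reward(esp_agent, esp_baseline, gate_reduction, depth_reduction,
                 w_gate=0.1, w_depth=0.1) -> float:
    """Log-ratio of ESP against the baseline plus weighted auxiliary terms.

    gate_reduction and depth_reduction are fractional savings relative to the
    baseline (0.4 means 40% fewer). The weights are assumptions for this sketch.
    """
    log_ratio = math.log(esp_agent / esp_baseline)
    return log_ratio + w_gate * gate_reduction + w_depth * depth_reduction


# Illustrative circuits: the agent's output has fewer error-prone gates.
esp_agent = estimated_success_probability([0.004] * 30)
esp_baseline = estimated_success_probability([0.004] * 50)
reward = final_reward(esp_agent, esp_baseline, gate_reduction=0.4, depth_reduction=0.3)
```

The log-ratio is zero when the agent matches the baseline, positive when it beats it, and negative otherwise, which gives the agent a signed signal centered on the reference compilation.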
Training & Inference:
- Training: Uses Maskable PPO (Proximal Policy Optimization) on random circuits and perturbed backend noise profiles to ensure robustness.
- Inference: The policy is frozen. The system performs a single forward pass to select passes, adding negligible overhead (<1% of total compilation time). No reference compilation or reward calculation is performed during inference.
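Inference with a frozen policy can be pictured as a greedy walk over the stages. In this toy sketch a lookup table of scores stands in for the trained network, which in the real system would condition on the encoded circuit and noise state. The pass names are existing Qiskit transpiler passes used only as labels; whether TuniQ's action space contains them, and in which stages, is an assumption.

```python
# Candidate actions per stage (illustrative subset of the pipeline stages).
STAGE_ACTIONS = {
    "Layout": ["TrivialLayout", "SabreLayout"],
    "Routing": ["SabreSwap"],
    "Translate": ["BasisTranslator"],
    "Optimize": ["skip", "Optimize1qGatesDecomposition", "CommutativeCancellation"],
}


def select_passes(policy_scores: dict) -> list:
    """Pick the highest-scoring action per stage; no reward is computed.

    policy_scores maps (stage, action) to a score and is a stand-in for the
    frozen policy's output. Unlisted pairs default to 0.0.
    """
    chosen = []
    for stage, actions in STAGE_ACTIONS.items():
        best = max(actions, key=lambda a: policy_scores.get((stage, a), 0.0))
        if best != "skip":  # "skip" leaves the stage without an extra pass
            chosen.append(best)
    return chosen


# Hypothetical scores: prefer SabreLayout and skip further optimization.
scores = {
    ("Layout", "SabreLayout"): 0.9,
    ("Optimize", "skip"): 0.8,
    ("Optimize", "CommutativeCancellation"): 0.3,
}
pipeline = select_passes(scores)
```

Nothing in this loop compiles a reference circuit or evaluates a reward, which mirrors why selection adds so little to compilation time.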

Key Contributions

First Noise-Conditioned Cross-Stage Selector: TuniQ is the first system to formulate transpilation as a unified cross-stage pass selection problem conditioned on real-time noise profiles, jointly optimizing for fidelity and compilation time.
Novel RL Extensions: The paper introduces a dual-encoder for stage-aware representation, shaped rewards for cross-stage credit assignment, and dynamic action masking to guarantee valid compilation.
Scalability and Generalization: The system is trained on small circuit instances (5–10 qubits) but scales effectively to utility-scale circuits (up to 65 qubits) without retraining. It generalizes across different IBM Quantum backends (Torino, Fez, Kingston, Pittsburgh) in a zero-shot manner.
Open Source: The framework and implementation are open-sourced to facilitate community adoption.

Experimental Results

Evaluated on diverse workloads (MQTBench, QASMBench) across multiple IBM Quantum Cloud processors:

Fidelity Improvement: TuniQ improves output fidelity (reduces TVD) by an average of 20% compared to the state-of-the-art Qiskit (Fidelity Optimized) transpiler. For specific benchmarks like QPE, TVD was reduced from 0.76 to 0.50, significantly improving algorithmic success.
Compilation Time: TuniQ reduces compilation time by an average of 34%. This is critical for variational algorithms (e.g., VQE, QAOA) that recompile circuits thousands of times.
Scaling: As circuit size increases (up to 65 qubits), TuniQ's advantage grows, producing circuits with 40% fewer gates and 50% lower depth than the baseline.
Robustness: The system maintains effectiveness across varying noise levels (simulated by scaling error rates) and different hardware generations (Heron R1–R3), demonstrating resilience to calibration drift.
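Varying the noise level by scaling error rates can be sketched as below. The calibration keys and values are invented for illustration, and the uniform random perturbation is only one assumed way of generating perturbed noise profiles; the summary does not say which distribution is used.

```python
import random


def scale_noise(calibration: dict, factor: float) -> dict:
    """Multiply every error rate by a factor, clamping at 1.0."""
    return {name: min(1.0, err * factor) for name, err in calibration.items()}


def perturb_noise(calibration: dict, spread: float, rng: random.Random) -> dict:
    """Scale each error rate by its own random factor in [1 - spread, 1 + spread]."""
    return {
        name: min(1.0, err * rng.uniform(1.0 - spread, 1.0 + spread))
        for name, err in calibration.items()
    }


# Hypothetical calibration snapshot: one two-qubit gate and one readout error.
calibration = {"cz_0_1": 0.004, "readout_0": 0.02}

doubled = scale_noise(calibration, 2.0)  # uniformly noisier device
jittered = perturb_noise(calibration, spread=0.25, rng=random.Random(0))
```

Feeding such scaled profiles to a frozen policy is a cheap way to probe how its choices shift as a device drifts between calibrations.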

Significance and Claims

The paper claims that TuniQ addresses a fundamental limitation in current quantum compilation: the reliance on static, fixed pass sequences. By shifting to an adaptive, learned approach, TuniQ demonstrates that the best pass selection depends strongly on the circuit, the hardware, and the noise environment.

The authors emphasize that TuniQ does not merely improve a single metric but provides a better quality-time tradeoff. Unlike search-based methods (e.g., evolutionary algorithms), which incur high per-circuit overhead, TuniQ pays the search cost once during training and amortizes it over every later compilation, making it suitable for HPC workflows where throughput is essential. The work suggests that as quantum hardware evolves toward fault tolerance, adaptive compilation will remain a key performance lever, and TuniQ provides a scalable framework for realizing this potential.

TuniQ: Autotuning Compilation Passes for Quantum Workloads at Scale for Effectiveness and Efficiency