Separating Ansatz Discovery from Deployment on Larger Problems: Reinforcement Learning for Modular Circuit Design

Imagine you are trying to build a massive, incredibly complex Lego castle. The problem is that the instructions for the whole castle are so huge and complicated that no single person (or computer) can figure out the best way to build it all at once. If you try to design the entire thing from scratch while looking at 16 different Lego bricks, your brain (or the computer's processor) gets overwhelmed and crashes.

This is the exact problem scientists face with Quantum Computers. As they add more "qubits" (the quantum version of Lego bricks), the math needed to design the circuits becomes impossible for classical computers to handle.

This paper proposes a clever workaround: Don't try to design the whole castle at once. Instead, design one perfect little room, and then copy-paste it.

Here is the breakdown of their idea using simple analogies:

1. The Problem: The "Big Brain" Bottleneck

In the world of quantum computing, there is a task called Quantum Architecture Search (QAS). This is like asking a computer to invent the best possible circuit design to solve a specific math problem.

The Catch: When the problem is small (like 8 qubits), a classical computer can simulate it and learn the best design.
The Wall: As the problem grows (12, 16, and eventually 100 or more qubits), simulating the system on a classical computer gets exponentially more expensive, and past a certain size the computer can no longer simulate it at all to learn from it. It's like trying to solve a Sudoku puzzle while blindfolded because the board is too big.

2. The Solution: The "Modular Lego Block" Strategy

The authors, Gloria Turati and her team, suggest splitting the job into two distinct phases:

Phase 1: The Discovery (The Small Room)
They use a smart AI (called Reinforcement Learning) to figure out the perfect design for a tiny, 2-brick module. They do this on a small, manageable problem (8 qubits) where the computer can easily "see" what's happening.
- Analogy: Imagine an architect designing the perfect, most efficient kitchen layout for a tiny studio apartment. They test it, tweak it, and make it perfect.
Phase 2: The Deployment (The Whole Castle)
Once they have that perfect "kitchen module," they don't try to redesign the whole house. Instead, they take that single perfect module and copy and paste it over and over again to build the larger house (12 or 16 qubits).
- Analogy: Now, to build a massive mansion, you just repeat that perfect kitchen design in every wing of the house. You don't need to redesign the kitchen for the mansion; you just reuse the blueprint you already perfected.

3. The Magic Ingredient: The "Smart Agent"

To find that perfect little module, they used a technique called Reinforcement Learning (RL).

Think of the AI as a video game character trying to beat a level.
The "level" is building a quantum circuit.
The "score" is how well the circuit solves the math problem.
The AI tries adding different gates (like adding different Lego pieces). If the score goes up, it gets a "reward." If it goes down, it gets a "penalty."
Over time, the AI learns the best sequence of moves to build a tiny, super-efficient block.

4. What Did They Find?

They tested this on three types of puzzles (Maximum Cut, Maximum Clique, Minimum Vertex Cover) using different graph shapes.

Result A: The "Copy-Paste" works. The blocks they learned on the small 8-qubit problems worked just as well when they were copied onto the larger 12 and 16-qubit problems. The quality of the solution didn't drop.
Result B: It's actually better than free-for-all design. They compared their "Modular Block" method against a method where the AI was allowed to build any circuit it wanted (no rules). Surprisingly, the AI built better circuits when it was forced to stick to the modular block structure.
- Why? It's like telling a chef, "You can only use these 5 ingredients." Instead of getting confused by 50 options, the chef creates a masterpiece with the 5 they know best.
Result C: It uses fewer costly gates. The circuits they built used fewer "expensive" two-qubit quantum gates (which are prone to errors) compared to standard methods.

5. Why Does This Matter?

Currently, if you want to solve a big quantum problem, you have to wait until quantum computers are powerful enough to run the design process themselves, or you have to hope a human genius can guess the right structure.

This paper says: "We can use today's small, classical computers to learn the rules of the game, and then apply those rules to the big, future quantum computers."

It separates the learning (which happens on small, easy problems) from the doing (which happens on big, hard problems).

The Bottom Line

The authors didn't necessarily invent a new super-solver that beats all classical computers yet. Instead, they invented a new way of thinking about how to design quantum circuits.

They showed, in their experiments, that you can teach a computer to be a master builder on a small scale, and then let it build something much bigger by simply repeating its small blueprints. That makes automated circuit design usable for problems too large to search over directly.

1. Problem Statement

The paper addresses a critical bottleneck in Quantum Architecture Search (QAS): the difficulty of scaling automated circuit design to larger quantum systems.

The Scalability Gap: Current QAS methods (using Reinforcement Learning, evolutionary algorithms, or differentiable search) typically rely on classical simulations to evaluate circuit performance. However, as the number of qubits ( $n$ ) increases, the state space grows exponentially ( $2^n$ ), making classical simulation and the observation of quantum states computationally intractable.
The Optimization Burden: QAS involves two difficult optimization stages: (1) searching for the optimal circuit structure (ansatz) and (2) optimizing the parameters for a specific problem instance. Performing both simultaneously on large systems is often too costly.
The Goal: The authors aim to develop a methodology that allows for the discovery of effective ansatz structures on small, classically simulatable systems (e.g., $n=8$ ) and then reuses these structures on larger problems (e.g., $n=12, 16$ ), where simulation-based search is more costly, without re-running the expensive architecture search.
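
To make the $2^n$ growth concrete, here is a minimal sketch of the memory a dense state vector would need. It assumes one double-precision complex amplitude (16 bytes) per basis state; the function name and the chosen qubit counts are illustrative and not taken from the paper.

```python
def statevector_bytes(n_qubits: int, bytes_per_amplitude: int = 16) -> int:
    """Bytes needed to store 2**n_qubits amplitudes.

    Assumption: each amplitude is a complex128 value (16 bytes).
    """
    return (2 ** n_qubits) * bytes_per_amplitude


if __name__ == "__main__":
    # Illustrative sizes only; the experiments summarized here use n = 8, 12, 16.
    for n in (8, 12, 16, 30, 40, 50):
        gib = statevector_bytes(n) / 2 ** 30
        print(f"n={n:>2}: {gib:.6g} GiB")
```

Every added qubit doubles the requirement. Note that the same doubling applies to an observation made of one probability per computational basis state, which also has $2^n$ entries.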

2. Methodology

The authors propose a two-phase framework called RLVQC (Reinforcement Learning for Variational Quantum Circuits), which decouples structure discovery from deployment.

A. The Two-Phase Approach

Discovery Phase (Small Scale):
- An RL agent learns a modular two-qubit block on small instances ( $n=8$ ).
- The agent operates in a constrained environment where it sequentially adds gates to a single block.
- The learning is feasible because the system size is small enough for classical simulation.
Deployment Phase (Large Scale):
- The learned block is not re-optimized for the new problem size.
- Instead, it is composed into a full ansatz for larger instances ( $n=12, 16$ ) using an explicit rule based on the problem's interaction graph (derived from QUBO formulations).
- The block is applied to all interacting qubit pairs defined by the problem Hamiltonian.
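
The deployment rule above can be sketched as follows. This is a minimal illustration, not the authors' code: the gate representation, the helper names, and the contents of `example_block` are hypothetical placeholders, and the real block is whatever the RL agent learned in the discovery phase.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical gate representation: (gate name, qubit indices).
Gate = Tuple[str, Tuple[int, ...]]


def interacting_pairs(qubo: Sequence[Sequence[float]]) -> List[Tuple[int, int]]:
    """Return pairs (i, j) with i < j whose off-diagonal QUBO coefficient is non-zero."""
    n = len(qubo)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if qubo[i][j] != 0 or qubo[j][i] != 0
    ]


def example_block(i: int, j: int) -> List[Gate]:
    """Placeholder two-qubit block (an assumption, not the learned block)."""
    return [("cx", (i, j)), ("rz", (j,)), ("cx", (i, j)), ("rx", (i,)), ("rx", (j,))]


def compose_ansatz(
    qubo: Sequence[Sequence[float]],
    block: Callable[[int, int], List[Gate]],
) -> List[Gate]:
    """Apply the same block to every interacting qubit pair of the problem."""
    circuit: List[Gate] = []
    for i, j in interacting_pairs(qubo):
        circuit.extend(block(i, j))
    return circuit


if __name__ == "__main__":
    # Upper-triangular QUBO for a 3-node path graph 0-1-2 (illustrative values).
    q = [[-1, 2, 0], [0, -2, 2], [0, 0, -1]]
    for gate in compose_ansatz(q, example_block):
        print(gate)
```

Because the block is a function of the pair indices only, the same learned structure applies unchanged to a QUBO of any size; only the list of interacting pairs changes.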

B. The RL Framework (RLVQC)

Algorithm: The agent uses Proximal Policy Optimization (PPO) with an Actor-Critic architecture.
State/Observation: The agent observes a vector of empirical probabilities of measuring computational basis states (derived from $n_{runs}$ shots). This mimics real hardware constraints where full state vectors are unavailable.
Action Space:
- RLVQC Global: The agent places gates anywhere on any qubit pair (unconstrained).
- RLVQC Block (Proposed): The agent constructs a single two-qubit block. This block is then replicated across all interacting qubit pairs in the final circuit.
Reward Function: Designed to minimize the Hamiltonian expectation value (energy) while penalizing circuit depth ( $R_t = -\langle H \rangle^*_t - \beta d_t$ ).
Parameter Sharing Variants: To test scalability and efficiency, three block parameterization schemes are evaluated:
1. Agnostic: Independent parameters for every gate (high expressivity).
2. Weighted: Parameters scaled by the interaction coefficients ( $q_{ij}$ ) of the QUBO problem.
3. Tied: Parameters are shared across all instances of the block in the same layer (similar to standard QAOA), drastically reducing the number of trainable parameters.
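
The observation and reward described above can be sketched as follows. This is a hedged illustration under stated assumptions: the true outcome distribution is supplied directly as `probs` (on hardware it would come from measurements), `energy` stands for the value $\langle H \rangle^*_t$ however the paper computes it, and the function names and the `seed` argument are hypothetical.

```python
import random
from collections import Counter
from typing import List, Sequence


def empirical_observation(probs: Sequence[float], n_runs: int, seed: int = 0) -> List[float]:
    """Sample n_runs basis-state outcomes and return their relative frequencies.

    The result has one entry per computational basis state, i.e. len(probs) entries.
    """
    rng = random.Random(seed)
    outcomes = rng.choices(range(len(probs)), weights=probs, k=n_runs)
    counts = Counter(outcomes)
    return [counts[k] / n_runs for k in range(len(probs))]


def reward(energy: float, depth: int, beta: float) -> float:
    """R_t = -<H>*_t - beta * d_t: lower energy and shallower circuits score higher."""
    return -energy - beta * depth


if __name__ == "__main__":
    # Two qubits, illustrative distribution over |00>, |01>, |10>, |11>.
    obs = empirical_observation([0.5, 0.5, 0.0, 0.0], n_runs=100)
    print(obs)
    print(reward(energy=-3.0, depth=4, beta=0.1))
```

The depth penalty means that, between two circuits with the same energy, the agent is rewarded more for the shallower one.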

C. Target Problems

The methodology is tested on Quadratic Unconstrained Binary Optimization (QUBO) problems, specifically:

Maximum Cut
Maximum Clique
Minimum Vertex Cover
These are mapped to Ising Hamiltonians and solved using Variational Quantum Algorithms (VQAs).
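
As a worked example of one of these formulations, the sketch below builds a standard Max-Cut QUBO and checks it by brute force on a small graph. The encoding (each edge contributes $2x_i x_j - x_i - x_j$, so the objective equals minus the number of cut edges) is the textbook one and is an assumption here; the summary does not state which encoding the paper uses, and the function names are illustrative.

```python
from itertools import product
from typing import List, Sequence, Tuple


def maxcut_qubo(n: int, edges: Sequence[Tuple[int, int]]) -> List[List[float]]:
    """Upper-triangular Q such that sum_{i<=j} Q[i][j] x_i x_j = -(cut size) for binary x."""
    q = [[0.0] * n for _ in range(n)]
    for a, b in edges:
        i, j = min(a, b), max(a, b)
        q[i][i] -= 1.0  # -x_i (x_i**2 == x_i for binary variables)
        q[j][j] -= 1.0  # -x_j
        q[i][j] += 2.0  # +2 x_i x_j
    return q


def qubo_value(q: Sequence[Sequence[float]], x: Sequence[int]) -> float:
    """Evaluate the QUBO objective for a binary assignment x."""
    n = len(x)
    return sum(q[i][j] * x[i] * x[j] for i in range(n) for j in range(i, n))


def brute_force_min(q: Sequence[Sequence[float]]) -> Tuple[float, Tuple[int, ...]]:
    """Exhaustively minimize the QUBO; only feasible for small n."""
    return min((qubo_value(q, x), x) for x in product((0, 1), repeat=len(q)))


if __name__ == "__main__":
    square = [(0, 1), (1, 2), (2, 3), (3, 0)]  # 4-cycle
    value, assignment = brute_force_min(maxcut_qubo(4, square))
    print(value, assignment)
```

Substituting $x_i = (1 - z_i)/2$ with $z_i \in \{-1, +1\}$ turns such a QUBO into an Ising form, and the non-zero off-diagonal entries of Q identify the interacting qubit pairs.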

3. Key Contributions

Separation of Discovery and Deployment: The paper introduces a novel workflow where the complex task of finding a circuit structure is isolated to small, simulatable systems. The resulting modular structure is then compositionally extended to larger systems, bypassing the need for QAS on large qubit counts.
RLVQC Block: A specific RL variant that learns reusable two-qubit building blocks. The authors demonstrate that restricting the search space to modular blocks does not hinder performance; in fact, it often improves it compared to unconstrained search.
Scalability Validation: The study shows empirically that blocks learned on $n=8$ qubits remain effective when deployed on $n=12$ and $n=16$ qubits, maintaining solution quality without re-learning the architecture.
Parameter Efficiency: The "Tied" parameter-sharing scheme is shown to achieve high approximation ratios with significantly fewer trainable parameters and faster convergence compared to standard QAOA and ma-QAOA (multi-angle QAOA).
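
The effect of parameter sharing on the number of trainable parameters can be illustrated with simple arithmetic. The sketch below rests on assumptions not stated in this summary: the block contains a fixed number of parameterized gates, the ansatz has a fixed number of layers, and the block is applied once per interacting pair per layer. The Weighted scheme is left out because the summary does not say how many free parameters it keeps. All names and numbers are illustrative.

```python
def trainable_parameters(params_per_block: int, n_pairs: int, n_layers: int, scheme: str) -> int:
    """Count trainable parameters under two of the sharing schemes.

    agnostic: every gate in every block copy has its own parameter.
    tied:     all block copies within a layer share one parameter set.
    """
    if scheme == "agnostic":
        return params_per_block * n_pairs * n_layers
    if scheme == "tied":
        return params_per_block * n_layers
    raise ValueError(f"unknown scheme: {scheme!r}")


if __name__ == "__main__":
    # Hypothetical setting: 3 parameterized gates per block, 24 interacting pairs, 2 layers.
    for scheme in ("agnostic", "tied"):
        print(scheme, trainable_parameters(3, 24, 2, scheme))
```

Under these assumptions the tied count does not depend on the number of interacting pairs, so it stays constant as the problem graph grows.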

4. Experimental Results

The authors conducted extensive experiments on 24 graph topologies across three problem types.

Effectiveness of Modularity (Experiment 1):
- Performance: On $n=16$ instances, RLVQC Block consistently outperformed RLVQC Global and standard QAOA in terms of Approximation Ratio (A.R.), particularly on Maximum Cut and Minimum Vertex Cover.
- Circuit Efficiency: The RL-discovered circuits used significantly fewer CX (CNOT) gates than standard QAOA. Since 2-qubit gates are the primary source of noise and error on NISQ hardware, this is a critical advantage.
- Conclusion: Constraining the search to a modular block structure is not detrimental; it guides the agent toward more robust and hardware-efficient solutions.
Extensibility to Larger Problems (Experiment 2):
- Stability: Blocks learned on $n=8$ were deployed on $n=12$ and $n=16$. The solution quality (Approximation Ratio) remained stable and did not degrade as the problem size increased.
- Statistical Significance: Wilcoxon signed-rank tests confirmed that the improvements of RLVQC Block variants over QAOA and ma-QAOA are statistically significant across most configurations.
- Resource Trade-off: The Tied variant achieved high-quality solutions with a fraction of the parameters required by ma-QAOA. While ma-QAOA could reach slightly higher A.R. in some cases, it required saturating the optimization budget (1000 iterations), whereas Tied converged in very few iterations.
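
This summary does not define the Approximation Ratio, so the sketch below uses one common convention for Max-Cut as an assumption: the mean cut size over the measured bitstrings divided by the optimal cut size. The function names, the `counts` format (bitstring to number of shots), and the example data are hypothetical and are not results from the paper.

```python
from itertools import product
from typing import Dict, Sequence, Tuple


def cut_size(bits: Sequence[str], edges: Sequence[Tuple[int, int]]) -> int:
    """Number of edges whose endpoints receive different labels."""
    return sum(1 for i, j in edges if bits[i] != bits[j])


def approximation_ratio(counts: Dict[str, int], edges: Sequence[Tuple[int, int]], n: int) -> float:
    """Mean measured cut divided by the best cut (found by brute force, small n only)."""
    shots = sum(counts.values())
    mean_cut = sum(cut_size(b, edges) * c for b, c in counts.items()) / shots
    best_cut = max(cut_size(x, edges) for x in product("01", repeat=n))
    return mean_cut / best_cut


if __name__ == "__main__":
    square = [(0, 1), (1, 2), (2, 3), (3, 0)]  # 4-cycle
    made_up_counts = {"0101": 50, "0000": 50}  # invented data for illustration
    print(approximation_ratio(made_up_counts, square, n=4))
```

For minimization problems such as Minimum Vertex Cover the ratio is usually defined differently, so this sketch should not be read as the metric used for all three problem types.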

5. Significance and Implications

Bridging the Simulation Gap: This work provides a practical pathway to utilize classical machine learning for quantum circuit design in regimes where full classical simulation of the quantum system is impractical. By learning on small systems and composing for large ones, it circumvents the exponential scaling of classical simulation.
Hardware Awareness: The method naturally discovers circuits with fewer 2-qubit gates, making the resulting ansatzes more suitable for current and near-term noisy quantum hardware.
Modular Design Philosophy: It supports the hypothesis that quantum algorithms for combinatorial optimization can be built from reusable, problem-agnostic (or problem-adapted) modular blocks, rather than requiring unique, monolithic circuit designs for every problem size.
Future Directions: The approach opens avenues for applying modular ansatz discovery to other tasks like state preparation and chemistry, and suggests that future QAS research should focus on composition rules and observation representations that scale favorably.

In summary, the paper demonstrates that Reinforcement Learning can effectively discover modular quantum circuit structures on small scales, which can then be deployed on larger problems with stable solution quality, fewer two-qubit gates, and fewer trainable parameters.