
RELiQ: Scalable Entanglement Routing via Reinforcement Learning in Quantum Networks

This paper introduces RELiQ, a reinforcement learning framework that uses graph neural networks to route entanglement in quantum networks scalably and robustly while relying solely on local information. Without requiring global network knowledge, it outperforms existing heuristics and learning-based methods across diverse topologies.

Original authors: Tobias Meuser, Jannis Weil, Aninda Lahiri, Marius Paraschiv

Published 2026-04-13




Original paper licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). This is an AI-generated explanation of the paper below. It is not written or endorsed by the authors. For technical accuracy, refer to the original paper.

The Big Picture: The Quantum Internet's Traffic Jam

Imagine the future "Quantum Internet." Instead of sending regular emails, this network sends quantum information (like secret keys or super-computer data) between cities. To do this, the network needs to create invisible, magical ropes called entanglements that tie two distant computers together.

However, building these ropes is incredibly difficult:

They are fragile: Like a soap bubble, if they sit too long or get bumped, they pop (lose quality).
They are probabilistic: You can't just "send" a rope and guarantee it arrives. It's like trying to toss a ball through a hoop in a hurricane; sometimes it goes in, sometimes it misses.
They can't be copied: You can't make a backup copy of a quantum rope to fix a broken one.

The problem is: How do you route these ropes? If you try to plan the whole route from a central control tower (like a GPS server), the information gets old by the time it arrives because the "traffic" (the quantum links) changes so fast. If you try to use simple rules (like "always go left"), you often get stuck in dead ends or low-quality connections.

Enter RELiQ. It's a new "smart driver" system that uses Reinforcement Learning (AI that learns by trial and error) to find the best path using only what it can see right in front of it.

The Core Idea: The "Ant Colony" Analogy

Think of the quantum network as a giant forest, and the data packets as ants looking for food (the destination).

The Old Way (Global Heuristics): Imagine a single ant sitting in a tower with a map of the entire forest. It tries to plan the perfect path. But the forest is dynamic: trees fall, new paths open, and the map is always 5 minutes old. By the time the ant starts walking, the path it planned is gone.
The Simple Way (Local Heuristics): Imagine an ant that only looks at the ground immediately under its feet. It might avoid a puddle, but it doesn't know there's a cliff 10 steps ahead. It often takes inefficient, winding paths.
The RELiQ Way (The Smart Ant): RELiQ is like an ant that has a superpower. It doesn't need a map of the whole forest. Instead, it whispers to its immediate neighbors ("Hey, the path to the left is blocked!"). Those neighbors whisper to their neighbors.
- Through this chain of whispers, the ant builds a "mental picture" of the forest without ever seeing the whole thing.
- It learns through experience: "If I go this way, I get a high-quality rope. If I go that way, the rope breaks."
- Over time, it becomes a master navigator that adapts instantly to changes.

How RELiQ Works (The Magic Sauce)

The paper introduces three key ingredients to make this work:

1. The "Whisper Network" (Message Passing)

In traditional networks, every node (computer) talks to a central boss. In RELiQ, every node only talks to its direct neighbors.

Analogy: Imagine a game of "Telephone," but instead of distorting the message, the nodes are passing along a 3D hologram of the network's health.
Each node sends a small message to its neighbors saying, "I have 3 good ropes, but my neighbor to the right is tired."
By passing these messages back and forth, every node builds a local version of the "global map" without ever needing to see the whole map.
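The chain of whispers can be illustrated with a toy sketch. It is not RELiQ's learned GNN: it assumes each node holds a single number (its estimated hop distance to the destination) instead of a learned message vector, and the function and variable names are hypothetical. What it shows is that information from distant nodes reaches a node only through repeated one-hop exchanges.

```python
# Toy illustration of iterative one-hop message passing (an assumption-laden
# stand-in, not RELiQ's learned GNN). Each node keeps one number: its current
# estimate of the hop distance to the destination.
import math

def message_passing_rounds(neighbors, destination, rounds):
    """Return per-node distance estimates after `rounds` rounds of local exchange."""
    estimate = {node: math.inf for node in neighbors}
    estimate[destination] = 0
    for _ in range(rounds):
        # Synchronous round: every node first receives its neighbors' current estimates.
        inbox = {node: [estimate[n] for n in neighbors[node]] for node in neighbors}
        # Then every node updates using only what it received from direct neighbors.
        for node, received in inbox.items():
            if node != destination and received:
                estimate[node] = min(estimate[node], min(received) + 1)
    return estimate

# A 5-node line network: A - B - C - D - E, with E as the destination.
line = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
after_two = message_passing_rounds(line, "E", rounds=2)
after_four = message_passing_rounds(line, "E", rounds=4)
print(after_two)
print(after_four)
```

After two rounds only the nodes within two hops of E know how far away it is; after four rounds node A does as well, even though A never talked to anyone but B.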

2. The "Shape-Shifter" (Graph Neural Networks)

Most AI models are like a pair of shoes: they only fit one specific foot size (one specific network shape). If you change the network (add more cities), the AI breaks.

The Innovation: RELiQ uses a Graph Neural Network (GNN). Think of this as a pair of stretchy, magical shoes.
Whether the network has 10 nodes or 1,000 nodes, or whether the cities are arranged in a circle or a messy web, the same "shoes" stretch to fit. This means the AI doesn't need to be retrained every time the network changes.

3. The "Reward System" (Reinforcement Learning)

The AI learns by playing a game.

Goal: Connect two points (Source and Destination) with the highest quality rope possible, as fast as possible.
The Reward: If the AI finds a path and the rope is strong, it gets a "gold star" (positive reward). The longer the path, the smaller the star, and if the rope never reaches the destination, it gets no star.
The Result: The AI quickly learns to avoid "dead ends" and "weak links," optimizing its strategy to get the most gold stars.

Why is this a Big Deal?

The researchers tested RELiQ against six heuristic algorithms (some using global maps, some using simple local rules) and three other learning-based approaches, on both random networks and real-world maps (like the internet infrastructure in Germany or the UK).

The Results:

Speed: RELiQ often delivered entangled pairs at the highest rate.
Quality: The connections it made were stronger (higher fidelity) than those of almost every other method.
Adaptability: When the network changed (links broke, new ones formed), RELiQ didn't panic. It adjusted without being retrained.
Scalability: It worked just as well on a small network of 10 nodes as it did on a massive network of 1,000 nodes.

The Bottom Line

RELiQ is like a self-driving car for the Quantum Internet.

Instead of relying on a traffic control tower that is always a few minutes behind, or a driver who only looks at the bumper in front of them, RELiQ is a driver that listens to the radio chatter of nearby cars, learns from its own mistakes, and knows exactly how to navigate a chaotic, changing road to get you there safely and quickly.

It solves the biggest headache of quantum networking: How do you route fragile, disappearing connections in a world where nothing stays the same for more than a second? The answer is: Let the network talk to itself, and let AI learn the best way to listen.

1. Problem Statement

Quantum networks are essential for emerging applications like distributed quantum computing and federated machine learning, which require the exchange of quantum states via entanglement. However, routing entanglement faces significant challenges:

Dynamic Topology: Elementary quantum links are generated probabilistically and degrade over time due to decoherence in quantum memories.
No-Cloning Theorem: Unknown quantum states cannot be copied, so they cannot be amplified the way classical signals are, which rules out traditional signal boosting.
Information Latency: Optimal routing often requires global knowledge of the network topology, but collecting that knowledge takes time. By the time a central controller has a global view, the quantum topology may have changed (links degraded or failed), so routes may rely on links that no longer exist or follow suboptimal paths.
Limitations of Existing Heuristics: Hand-crafted heuristics often rely on either global information (suffering from latency) or local information (leading to suboptimal performance). Furthermore, existing learning-based approaches often fail to generalize across different network topologies or require fixed node degrees.

2. Methodology: The RELiQ Framework

The authors propose RELiQ, a Multi-Agent Reinforcement Learning (MARL) framework that relies solely on local information and iterative message passing to perform entanglement routing.

A. Network Model

Graph Representation: The network is modeled as a graph $G_Q = (V, E_Q)$ where nodes are quantum repeaters and edges are optical fiber links containing elementary links (Bell pairs).
Two-Phase Operation:
1. Phase 1 (Link Generation): Nodes attempt to establish as many elementary links with their neighbors as their available qubits allow.
2. Phase 2 (Routing): Agents traverse the network to plan paths for source-destination pairs. Agents reserve links iteratively. Once a path is planned, repeaters perform entanglement swapping to create end-to-end entanglement.
Noise & Decay: The model incorporates realistic noise, including depolarizing channels during transmission and fidelity decay in quantum memories (modeled via a stretched exponential function).
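To make the effect of memory decay and entanglement swapping on fidelity concrete, here is a minimal sketch under stated assumptions. The exact decay parameterization is not given in this summary, so the form used here (decay toward the fully mixed value of 1/4 with hypothetical parameters `t_mem` and `beta`) is an assumption. The swap formula is the standard result for two Werner states with noiseless swap operations, which may be simpler than the paper's model.

```python
# Sketch of two fidelity effects, under assumptions labeled in the comments.
import math

MIXED_FIDELITY = 0.25  # Bell-state fidelity of a fully depolarized two-qubit state

def decayed_fidelity(f0, t, t_mem, beta):
    """ASSUMED stretched-exponential decay from f0 toward the fully mixed value.

    t_mem (memory time scale) and beta (stretch exponent) are hypothetical names.
    """
    return MIXED_FIDELITY + (f0 - MIXED_FIDELITY) * math.exp(-((t / t_mem) ** beta))

def swap_fidelity(f1, f2):
    """Fidelity after swapping two Werner states, assuming noiseless gates."""
    return f1 * f2 + (1.0 - f1) * (1.0 - f2) / 3.0

def end_to_end_fidelity(link_fidelities):
    """Chain the swap formula along a path of elementary links."""
    result = link_fidelities[0]
    for f in link_fidelities[1:]:
        result = swap_fidelity(result, f)
    return result

# Three elementary links that each waited 0.2 time units in memory.
stored = [decayed_fidelity(0.95, t=0.2, t_mem=1.0, beta=0.5) for _ in range(3)]
print(stored[0], end_to_end_fidelity(stored))
```

Every extra swap and every extra moment of waiting lowers the end-to-end fidelity, which is why a routing policy has to weigh path length against link quality.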

B. Reinforcement Learning Architecture

Agent Definition: An agent represents a quantum repeater responsible for routing a specific source-destination pair.
Observation Space (Local + GNN):
- Instead of using unique node identifiers (which limits generalization), RELiQ uses content-based addressing. Agents observe local link properties (fidelity, availability) and target information (is this node the destination?).
- Graph Neural Network (GNN): Agents use a recurrent GNN to exchange messages with one-hop neighbors. Over repeated rounds, this lets nodes aggregate local observations into a representation of the wider network without a central controller. This addresses the "local view" limitation while avoiding the latency of global monitoring.
Action Space: Agents select the next hop in the path. An action mask prevents loops and invalid moves (e.g., selecting non-existent links).
Reward Function: Sparse rewards are used to incentivize:
1. Success: Reaching the destination.
2. Quality: Maximizing the end-to-end fidelity ($F_{E2E}$) of the established entanglement.
3. Efficiency: Minimizing resource consumption (shorter paths are favored via discounting).
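The action mask and the sparse, discounted reward described above can be sketched as follows. This is an illustration, not the authors' implementation: the reward value (end-to-end fidelity on success, zero otherwise), the discount factor, the random policy standing in for the learned one, and all names are assumptions.

```python
# Sketch of masked next-hop selection and a sparse discounted reward.
# Reward shape, gamma, and all names are ASSUMPTIONS for illustration.
import random

def valid_actions(neighbors, current, visited, free_links):
    """Action mask: keep neighbors that are unvisited (no loops) and
    connected to the current node by at least one free elementary link."""
    return [n for n in neighbors[current]
            if n not in visited and free_links.get(frozenset((current, n)), 0) > 0]

def random_rollout(neighbors, free_links, source, destination, rng, max_hops=10):
    """Walk hop by hop with a random policy, reserving one link per hop.

    Note: free_links is modified in place to reflect the reservations.
    """
    path, visited = [source], {source}
    while path[-1] != destination and len(path) - 1 < max_hops:
        options = valid_actions(neighbors, path[-1], visited, free_links)
        if not options:
            return path, False  # dead end: every move is masked out
        nxt = rng.choice(options)
        free_links[frozenset((path[-1], nxt))] -= 1
        visited.add(nxt)
        path.append(nxt)
    return path, path[-1] == destination

def episode_return(path_found, e2e_fidelity, num_hops, gamma=0.95):
    """Sparse reward paid on the final hop, discounted back to the first step."""
    if not path_found:
        return 0.0
    return (gamma ** (num_hops - 1)) * e2e_fidelity

# Square network A-B-D / A-C-D where the B-D fiber currently has no free link.
square = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
links = {frozenset(("A", "B")): 1, frozenset(("B", "D")): 0,
         frozenset(("A", "C")): 1, frozenset(("C", "D")): 1}
path, ok = random_rollout(square, dict(links), "A", "D", random.Random(0))
print(path, ok, episode_return(ok, 0.8, len(path) - 1))
```

Because the reward is discounted per hop, a two-hop path with a given fidelity earns more than a four-hop path with the same fidelity, which is how shorter paths end up favored without an explicit length penalty.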

C. Key Innovations over Previous Work

Generalization: Unlike prior MARL approaches (e.g., Weil et al.) that required fixed node degrees and identifier-based addressing, RELiQ handles varying node degrees and dynamic topologies without retraining.
Decentralization: It achieves global awareness through local message passing, eliminating the latency bottleneck of centralized controllers.
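One way to see why dropping node identifiers and fixed degrees helps generalization is the sketch below. It assumes a simple linear scoring function with hand-picked weights as a stand-in for the learned network, and the feature layout and names are hypothetical. The point is only that the same parameters score every outgoing link, so the policy's output grows and shrinks with the node's degree.

```python
# Sketch of degree-agnostic, identifier-free next-hop scoring.
# The linear scorer and its weights are ASSUMED stand-ins for a learned model.
import math

def link_features(fidelity, available, neighbor_is_destination):
    """Content-based description of one outgoing link (no node IDs)."""
    return [fidelity,
            1.0 if available else 0.0,
            1.0 if neighbor_is_destination else 0.0]

def next_hop_probabilities(links, weights):
    """Score each link with the same weights, then softmax over all links."""
    scores = [sum(w * x for w, x in zip(weights, feats)) for feats in links]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

weights = [2.0, 1.0, 5.0]  # hypothetical values, not trained

# A node with two neighbors and a node with five neighbors share the weights.
small = [link_features(0.90, True, False), link_features(0.80, True, True)]
large = [link_features(0.70 + 0.05 * i, True, False) for i in range(5)]
p_small = next_hop_probabilities(small, weights)
p_large = next_hop_probabilities(large, weights)
print(len(p_small), len(p_large))
```

A model that instead produced one output per node identifier would need retraining whenever nodes are added or the topology changes.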

3. Key Contributions

Novel Framework: Introduction of RELiQ, a MARL-based routing algorithm using GNNs that outperforms both local and global heuristics in random and real-world topologies.
Scalable Generalization: Extension of existing MARL frameworks to handle graphs with varying node counts and degrees, making it applicable to real-world communication networks.
Comprehensive Evaluation: A rigorous comparison against three learning-based approaches and six heuristic algorithms (including Q-PATH, Q-LEAP, GER, LBER) across diverse parameters (network size, attenuation, gate fidelity, memory decay).

4. Results

The evaluation was conducted on random graphs and real-world topologies (e.g., Germany, UK, US, Poland).

Performance vs. Heuristics:
- RELiQ consistently achieved the highest Entanglement Distribution Rate (EDR) and End-to-End Fidelity.
- It significantly outperformed global-information heuristics (Q-PATH, Q-LEAP) in large networks because those methods suffer from stale topology information.
- It outperformed local-information heuristics (GER, MGER) by effectively aggregating global context through the GNN.
Scalability:
- As network size increased (up to 1000 repeaters), RELiQ's performance remained stable or improved, whereas global heuristics degraded due to information latency.
- RELiQ maintained high performance even with high heterogeneity in gate fidelities and memory decay rates.
Robustness:
- RELiQ handled varying attenuation constants and initial fidelities better than baselines.
- It demonstrated superior adaptability to real-world topologies (e.g., SNDlib, Topology Zoo) without retraining.
Overhead:
- Communication: RELiQ distributes the message load evenly across the network. While the total message volume is comparable to global approaches, the load per node is lower and more resilient to failures.
- Computation: While inference time is slightly higher for small networks, RELiQ scales efficiently. In large networks (>100 nodes), its distributed nature makes it faster than centralized global approaches.

5. Significance

Practical Applicability: RELiQ addresses the critical gap between theoretical routing algorithms and the dynamic, noisy reality of quantum hardware. By relying only on local information, it is deployable in scenarios where global topology monitoring is infeasible or too slow.
Generalization: The ability to train on random graphs and apply the model to unseen, real-world topologies without retraining is a notable step forward for learning-based quantum networking.
Future-Proofing: The framework is designed to handle the inherent stochasticity and rapid changes of quantum networks, offering a robust solution for the next generation of quantum internet infrastructure.

In conclusion, RELiQ demonstrates that decentralized, learning-based routing using Graph Neural Networks can surpass both traditional heuristics and centralized learning approaches, providing a scalable and robust solution for entanglement distribution in quantum networks.