Attention-based optimizer for symmetry finding

Original authors: Shreya Banerjee, Vinodh Raj Rajagopal Muthu, Charlie Nation, Rick P. A. Simon, Francesco Martini, Alessandro Ricottone, Federico Cerisola, Luca Dellantonio

Published 2026-06-01

CC BY 4.0

Original paper licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). This is an AI-generated explanation of the paper below. It is not written or endorsed by the authors. For technical accuracy, refer to the original paper.

Imagine you are trying to solve a massive, incredibly complex puzzle. This puzzle represents a physical system, like a collection of atoms or particles interacting with each other. In the world of physics, these interactions are described by something called a "Hamiltonian."

Usually, to understand these systems, scientists look for symmetries. Think of a symmetry like a hidden rule or a pattern that stays the same no matter how you rearrange the pieces. If you find this rule, the puzzle becomes much easier to solve because you can ignore a lot of the confusing details.

For a long time, finding these hidden rules was like searching for a needle in a haystack using a very slow, methodical, and rigid process. If the haystack was huge (which it often is in quantum physics), this method took forever.

The New Approach: A "Smart" Search Engine

In this paper, the authors introduce a new tool that uses Artificial Intelligence (AI) to find these symmetries much faster. They call it an "Attention-based Optimizer."

Here is how it works, using some everyday analogies:

1. The Problem: A Crowd of Chattering People

Imagine the Hamiltonian is a room full of people (the "Pauli-Strings") all talking at once. You need to find one specific person (the "Symmetry") who can stand in the corner and listen to everyone without interrupting or getting confused. In physics terms, this person must "commute" with everyone else, meaning their presence doesn't change the conversation.

The old way of finding this person was to check every single person against every other person one by one. It was thorough but painfully slow.

2. The Solution: The "Set-Transformer" (The Super-Listener)

The authors built a machine learning model called a Set-Transformer. Think of this model as a super-intelligent listener who doesn't just hear words, but understands the relationships between them.

Self-Attention: Just like how you can listen to a group of friends and instantly notice who is agreeing with whom, or who is arguing, this AI uses "self-attention." It looks at all the "people" in the room simultaneously and figures out how they relate to each other.
No Order Matters: In a normal conversation, the order of words matters. But in this puzzle, the order in which the "people" (the Pauli-Strings) are listed doesn't matter. The AI is designed to understand that the group is the same whether you list the people from left to right or right to left. This is crucial for solving the physics puzzle correctly.

3. The Training: Learning by Trial and Error

The AI doesn't know the answer at the start. It makes a guess about who the "Symmetry" person is.

The Scorecard (Loss Function): The system checks the guess. If the guessed person interrupts the conversation (doesn't commute), the score is bad. The AI gets a "penalty" and tries again.
The Hurdles: The AI has to avoid two traps:
1. The "Do Nothing" Trap: It can't just guess that "silence" (the Identity) is the answer, because that's a boring, useless symmetry. The system forces it to find a real, active pattern.
2. The "Maybe" Trap: The AI initially gives vague answers (like "50% sure"). The system pushes it to make a firm decision (either "Yes, this is the symmetry" or "No").

4. The "Adaptive Context Expansion" (The Magic Boost)

Sometimes, the AI gets stuck. It's like a detective who has looked at all the clues in the room but can't solve the case because the clues are too sparse or confusing. The AI might get stuck in a "local minimum", a spot where it thinks it's doing okay but is actually far from the real answer.

To fix this, the authors added a feature called Adaptive Context Expansion (ACE).

The Analogy: Imagine the detective realizes, "I'm stuck. I need more clues." So, the system magically creates new clues by combining existing ones (mathematically multiplying two "people" to create a new "person").
The Result: This gives the AI a fresh perspective and a "kick" to jump out of the stuck spot and keep searching. It effectively expands the room so the AI can see more connections.

What Did They Find?

The authors tested this new AI detective on three types of puzzles:

Random Puzzles: They made up random, messy Hamiltonians. Here, the AI was fast, but it needed a lot of computer power (many "starts" or attempts) to succeed, especially when the puzzles were very complex. It was like searching for a needle in a haystack that was constantly changing shape.
Real-World Physics Puzzles (Ising Models & Toric Code): These are models that describe real magnetic materials and quantum error-correcting codes.
- The Big Win: For these real-world systems, the AI was dramatically faster than the old, rigid methods: hundreds to thousands of times faster for the 1-D Ising model, and by an even larger factor for the Ising ladder.
- Why? Real physical systems have structure. They aren't random chaos; they have repeating patterns (like a grid of magnets). The AI's "super-listening" ability is perfect for spotting these patterns immediately. It didn't even need to use the "Magic Boost" (ACE) very often because the clues were already very clear.

The Bottom Line

This paper presents a new way to use AI to find hidden rules in complex physical systems. Instead of checking every possibility one by one (which is slow), the AI looks at the whole picture at once, learns the relationships, and finds the answer much faster.

For random, messy problems: It works well but needs a lot of computing power.
For real-world physical problems: It is a game-changer, finding solutions almost instantly compared to traditional methods.

The authors suggest this is the first time machine learning has been used to directly find symmetries from a raw physical model, opening the door to solving even harder physics problems in the future.

Problem Statement

Finding symmetries in physical systems is fundamental to understanding and solving complex models, particularly in quantum many-body physics. While modern computational methods allow for direct study of complicated problems, many remain intractable for brute-force numerical implementations (e.g., exact diagonalization). Although approximation techniques like tensor networks exist, they often rely on specific structural assumptions that degrade when the physical system does not match those assumptions.

Existing algorithms for finding symmetries, such as the deterministic method presented in Refs. [38–40], can taper off qubits by finding a reference frame in which the symmetries are stabilized. These deterministic approaches are classically efficient (cubic runtime) and guarantee finding all generators of the symmetries, but their runtime still becomes long for systems with a large number of qubits. There is a need for a method that can rapidly identify Pauli symmetries directly from an input Hamiltonian without prior knowledge of the system's structure, particularly for physical systems where symmetries are not immediately obvious.

Methodology

The authors propose a machine-learning-based optimization framework that merges automated symmetry finding with deep learning. The core of the framework is a Set-Transformer architecture, chosen because the problem of finding a Pauli symmetry is intrinsically permutation-invariant (the order of Pauli-Strings in a Hamiltonian does not matter).
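To make the permutation-invariance requirement concrete, here is a minimal NumPy sketch of attention-based pooling without position embeddings. It is not the authors' model: the single-query pooling, the weight matrices `W_k` and `W_v`, and the `seed` vector are hypothetical stand-ins chosen only to show that shuffling the input rows leaves the pooled output unchanged.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(rows, W_k, W_v, seed):
    """Pool a set of row vectors into one vector using a single attention query.

    No position information is added to the rows, so the result depends only
    on which rows are present, not on the order in which they are listed.
    """
    keys = rows @ W_k                              # (m, d)
    values = rows @ W_v                            # (m, d)
    scores = keys @ seed / np.sqrt(seed.shape[0])  # (m,)
    weights = softmax(scores)                      # (m,), sums to 1
    return weights @ values                        # (d,)

rng = np.random.default_rng(0)
m, width, d = 6, 8, 4                              # 6 rows of 8 bits, latent size 4
rows = rng.integers(0, 2, size=(m, width)).astype(float)
W_k = rng.normal(size=(width, d))                  # hypothetical projection weights
W_v = rng.normal(size=(width, d))
seed = rng.normal(size=d)                          # hypothetical learned query

pooled = attention_pool(rows, W_k, W_v, seed)
pooled_shuffled = attention_pool(rows[rng.permutation(m)], W_k, W_v, seed)
print(np.allclose(pooled, pooled_shuffled))
```

Adding a position embedding to each row before pooling would break this equality, which is the reason such embeddings are left out of a set-based architecture.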

1. Input Representation:
The input Hamiltonian $H = \sum P_i$ is represented as a tableau $H_t$, a binary matrix where each row corresponds to a Pauli-String encoded as a $2n_q$-dimensional binary vector (using the symplectic formalism). This representation preserves the permutation-invariance of the input.
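As an illustration of the tableau encoding, the sketch below converts Pauli-Strings into $2n_q$-dimensional binary vectors. The function names and the (x | z) ordering of the two halves are assumptions made for this example; the paper may use a different convention. Coefficients are dropped, since only the Pauli-Strings enter the tableau.

```python
import numpy as np

def pauli_to_symplectic(pauli: str) -> np.ndarray:
    """Encode a Pauli-String such as "XIZY" as a binary vector (x | z).

    Assumed convention: I -> (0, 0), X -> (1, 0), Z -> (0, 1), Y -> (1, 1).
    """
    n_q = len(pauli)
    vec = np.zeros(2 * n_q, dtype=np.uint8)
    for q, p in enumerate(pauli):
        if p not in "IXYZ":
            raise ValueError(f"unknown Pauli label: {p!r}")
        if p in "XY":
            vec[q] = 1            # X component on qubit q
        if p in "ZY":
            vec[n_q + q] = 1      # Z component on qubit q
    return vec

def hamiltonian_to_tableau(terms) -> np.ndarray:
    """Stack the encoded Pauli-Strings into a binary tableau, one row per term."""
    return np.stack([pauli_to_symplectic(t) for t in terms])

# Periodic 3-qubit transverse-field Ising chain (coefficients omitted).
terms = ["ZZI", "IZZ", "ZIZ", "XII", "IXI", "IIX"]
H_t = hamiltonian_to_tableau(terms)
print(H_t.shape)   # (6, 6): six Pauli-Strings, 2 * n_q = 6 columns
```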

2. Architecture:
The model consists of three main components:

Input Embedding and Projection: The binary tableau rows are projected into a continuous, learnable latent space using a linear layer. Position embeddings are avoided to maintain permutation invariance.
Set-Transformer (Encoder-Decoder):
- Encoder: Uses stacked Set Attention Blocks (SAB) containing Multi-Head Attention (MHA) and row-wise Feed-Forward (rFF) layers. The self-attention mechanism encodes pairwise and higher-order correlations among the Pauli-Strings.
- Decoder: Projects the learned correlations into a single candidate symmetry vector. It includes a Pooling Multi-head Attention (PMA) layer, a SAB, layer normalization, and a linear layer to map the latent dimension back to $2n_q$.
- Activation: A Sin layer followed by a learnable Sigmoid layer maps the continuous output to approximate binary values (0 and 1), representing the candidate Pauli symmetry $S_p$.
Adaptive Context Expansion (ACE): To address the issue of exponentially many non-solutions compared to solutions (especially in random Hamiltonians), the framework includes an ACE module. If the optimizer appears stuck in a local minimum (detected via oscillating loss), ACE synthetically expands the context by adding products of existing Pauli-Strings ($P_i P_j$) to the Hamiltonian. This provides new information to help the optimizer escape local minima.
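The context-expansion step can be sketched in the binary picture, where the product $P_i P_j$ corresponds (up to a phase) to the bitwise XOR of the two rows. The code below is an assumption-laden illustration, not the authors' ACE module: the random choice of pairs, the function names, and the (x | z) encoding are all hypothetical. It relies on one fact that holds in general: any operator that commutes with both $P_i$ and $P_j$ also commutes with their product, so the added rows do not change which symmetries are valid.

```python
import numpy as np

def encode(pauli: str) -> np.ndarray:
    """Binary (x | z) encoding of a Pauli-String (assumed convention)."""
    return np.array([c in "XY" for c in pauli] + [c in "ZY" for c in pauli],
                    dtype=np.uint8)

def commutes(a: np.ndarray, b: np.ndarray) -> bool:
    """Two Pauli-Strings commute iff their symplectic product is 0 mod 2."""
    a, b = a.astype(int), b.astype(int)
    n_q = len(a) // 2
    return (a[:n_q] @ b[n_q:] + a[n_q:] @ b[:n_q]) % 2 == 0

def expand_context(tableau: np.ndarray, n_new: int, rng) -> np.ndarray:
    """Append up to n_new products P_i P_j (row XORs) not already present."""
    seen = {row.tobytes() for row in tableau}
    rows = list(tableau)
    m, added, attempts = len(tableau), 0, 0
    while added < n_new and attempts < 100 * n_new:
        attempts += 1
        i, j = rng.choice(m, size=2, replace=False)   # hypothetical pair choice
        prod = np.bitwise_xor(tableau[i], tableau[j])
        if prod.tobytes() in seen or not prod.any():
            continue                                   # skip duplicates and identity
        seen.add(prod.tobytes())
        rows.append(prod)
        added += 1
    return np.stack(rows)

H_t = np.stack([encode(t) for t in ["ZZI", "IZZ", "ZIZ", "XII", "IXI", "IIX"]])
expanded = expand_context(H_t, n_new=4, rng=np.random.default_rng(0))

sym = encode("XXX")   # a known symmetry of this Ising chain
print(all(commutes(sym, row) for row in expanded))
```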

3. Optimization Objective:
The framework minimizes a custom loss function $C$ composed of four terms:

Commutation Loss ($C_{com}$): The primary objective, ensuring the candidate $S(\theta)$ commutes with all terms in $H$. It uses a differentiable proxy, $\sin^2(\frac{\pi}{2} x)$, for the modulo-2 commutator condition.
Zero-Penalty ($C_{zp}$): Prevents the trivial solution (the Identity operator) by penalizing outputs where all elements are zero.
Binary Penalty ($C_{bin}$): Encourages the continuous output values to converge to binary values (0 or 1).
Linearity Regularizer ($C_{lin}$): Assists early optimization by favoring candidates that anti-commute a limited number of times, mitigating the multi-modal nature of the commutation loss landscape.

The optimization is performed using the AdamW optimizer with early stopping conditions that verify if a valid symmetry has been found.
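The loss terms can be illustrated with a small NumPy sketch. Only the $\sin^2(\frac{\pi}{2} x)$ proxy is taken from the summary above; the specific forms of `zero_penalty` and `binary_penalty`, the weights `w_zp` and `w_bin`, and the (x | z) encoding are assumptions made for illustration. The linearity regularizer is left out because its form is not described here.

```python
import numpy as np

def encode(pauli: str) -> np.ndarray:
    """Binary (x | z) encoding of a Pauli-String (assumed convention)."""
    return np.array([c in "XY" for c in pauli] + [c in "ZY" for c in pauli],
                    dtype=float)

def commutation_loss(s: np.ndarray, tableau: np.ndarray) -> float:
    """Mean of sin^2(pi/2 * x), with x the symplectic product of s and each row.

    For binary s, x is an integer and sin^2(pi/2 * x) equals x mod 2, so the
    loss is the fraction of Pauli-Strings that anti-commute with s.
    """
    n_q = tableau.shape[1] // 2
    x = tableau[:, n_q:] @ s[:n_q] + tableau[:, :n_q] @ s[n_q:]
    return float(np.mean(np.sin(0.5 * np.pi * x) ** 2))

def zero_penalty(s: np.ndarray) -> float:
    """Assumed form: 1 for the all-zero (Identity) vector, 0 once any entry is 1."""
    return float(np.prod(1.0 - s))

def binary_penalty(s: np.ndarray) -> float:
    """Assumed form: vanishes only when every entry is exactly 0 or 1."""
    return float(np.mean(s * (1.0 - s)))

def total_loss(s, tableau, w_zp=1.0, w_bin=1.0):
    """Weighted sum of the three sketched terms (weights are hypothetical)."""
    return (commutation_loss(s, tableau)
            + w_zp * zero_penalty(s)
            + w_bin * binary_penalty(s))

H_t = np.stack([encode(t) for t in ["ZZI", "IZZ", "ZIZ", "XII", "IXI", "IIX"]])

for label in ["XXX", "ZII", "III"]:
    print(label, round(total_loss(encode(label), H_t), 4))
```

In this toy example the true symmetry `XXX` scores zero, `ZII` is penalized for anti-commuting with one of the six terms, and the Identity `III` commutes with everything but is caught by the zero-penalty.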

Key Contributions

First ML-based Symmetry Finder: To the authors' knowledge, this is the first work to use machine learning and artificial intelligence to find symmetries directly from an input Hamiltonian without prior knowledge of the system or the symmetry.
Set-Transformer Architecture: The application of Set-Transformers to encode correlations among Pauli-Strings, treating them analogously to tokens in natural language processing, to extract global relations.
Adaptive Context Expansion: A novel module that dynamically increases the input context to help the optimizer navigate complex loss landscapes where solutions are sparse.
Probabilistic Speedup: The framework offers a probabilistic approach that finds symmetries significantly faster than deterministic alternatives for specific physical systems, trading deterministic guarantees for speed.

Results

The framework was benchmarked on three categories of Hamiltonians:

1. Random Pauli Hamiltonians:

Tested on 10-qubit systems with varying ranks ($R$).
The attention-based optimizer found symmetries faster than the deterministic algorithm for ranks $R=4$ to $16$.
For higher ranks, the time complexity scales as $O(2^{0.705R})$ for the minimal time, compared to $O(2^R)$ for the deterministic algorithm up to $R=8$.
Success probability decreases with rank, requiring more parallel starts (and thus more GPUs) to achieve a 90% success rate. For $R=18$, 32 parallel starts were estimated to be necessary.
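The link between per-start success probability and the number of parallel starts can be worked out under a simple assumption: if each start succeeds independently with probability $p$, at least one of $N$ starts succeeds with probability $1 - (1-p)^N$. The sketch below uses that assumption, which may not match how the paper produced its estimate, and the per-start probability it back-calculates is illustrative rather than a reported result.

```python
import math

def starts_needed(p_single: float, target: float = 0.9) -> int:
    """Smallest N with 1 - (1 - p_single)**N >= target, for independent starts."""
    if not (0.0 < p_single < 1.0 and 0.0 < target < 1.0):
        raise ValueError("probabilities must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p_single))

def implied_single_start_success(n_starts: int, target: float = 0.9) -> float:
    """Per-start success probability for which n_starts just reach the target."""
    return 1.0 - (1.0 - target) ** (1.0 / n_starts)

# 32 starts for a 90% success rate corresponds to roughly 7% success per start.
print(round(implied_single_start_success(32), 4))
print(starts_needed(0.07))
```

Under this model the required number of starts grows roughly as $1/p$ for small $p$, which is why a falling success probability translates directly into more GPUs.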

2. Periodic 1-D Transverse-Field Ising Model:

Tested on systems with $n_q$ ranging from 10 to 1400.
The GPU implementation of the framework found symmetries approximately 225 times faster than the deterministic approach, while the CPU implementation was 1500 times faster.
The number of iterations required by the optimizer remained approximately constant (saturating around 35–40) as system size increased, whereas the number of Clifford gates for the deterministic algorithm grew polynomially.
Failure probability was low (average $p_f \approx 0.033$).

3. 2-D Ising Ladder and Toric Code:

Applied to 2-D Ising ladders and Toric codes (with and without magnetic fields) up to $n_q = 1000$.
The framework showed a substantial advantage over the deterministic algorithm, with the GPU implementation being roughly $10^5$ times faster for the Ising ladder.
For Toric codes, the advantage increased with system size. The deterministic algorithm's scaling was observed to be worse than the expected $O(n_q^3)$, likely due to the moderate number of Pauli-Strings.
The optimizer achieved high success rates with low failure probabilities across all tested geometries.

Observation on Physical vs. Random Systems:
The paper notes that the framework performs exceptionally well on physical Hamiltonians (Ising, Toric) because their tableau representations encode ordered, local, and repetitive physical interactions. This structure makes the context immediately informative, allowing the optimizer to navigate the loss landscape easily. In contrast, random Hamiltonians lack this regularity, requiring more computational resources (context expansion and parallel starts) to find symmetries.

Significance and Claims

The authors claim that this work represents an important step toward extending machine learning to find other classes of symmetries where no optimal or deterministic strategies are known. By merging machine learning with automated symmetry finding, the framework offers a "substantial advantage" in speed for physical Hamiltonians compared to state-of-the-art deterministic strategies.

The paper modestly frames its contribution as a proof-of-concept for using attention mechanisms to solve algebraic problems in quantum physics. It highlights that while the method is probabilistic and requires parallelization for random systems, it is highly effective for physical models where systemic interactions are embedded in the Hamiltonian. The authors plan to extend this approach to find other symmetry classes, such as Clifford symmetries, in future work.