Absence of poor local minima in matrix product states

Original authors: Hao-Kai Zhang, Chenghong Zhu, Shuo Liu, Shi-Xin Zhang, Tao Xiang

Published 2026-06-10

Original paper licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). This is an AI-generated explanation of the paper below. It is not written or endorsed by the authors. For technical accuracy, refer to the original paper.

The Big Problem: Getting Stuck in the Mud

Imagine you are trying to find the lowest point in a massive, foggy mountain range. This is what scientists do when they try to train quantum computers to solve problems. They use an algorithm called "gradient descent," which is like a hiker blindly feeling their way downhill, step by step, hoping to reach the very bottom (the best solution).

In many shallow quantum circuits (specifically ones called "brickwork circuits"), this hiker often gets stuck in a poor local minimum.

The Analogy: Imagine the hiker is walking down a mountain but gets trapped in a small, deep valley surrounded by high walls. They think they are at the bottom because they can't go any lower, but in reality, there is a much deeper valley (the true solution) just over the next ridge.
The Result: The quantum computer gets stuck, thinks it has found the answer, but the answer is actually terrible. This is a major reason why training quantum computers is so difficult.

The Mystery: Why Do MPS Work So Well?

For decades, scientists have used a different method called Matrix Product States (MPS) to solve quantum problems. It's like a very successful, old-school hiking technique that has worked remarkably well in practice the whole time.

The Paradox: MPS can be built using the exact same type of "steps" (quantum circuits) as the brickwork circuits that get stuck. Yet, MPS almost never gets stuck in those bad valleys. It reliably reaches the true bottom, or lands very close to it.
The Question: Why does this specific arrangement of steps work so reliably, while others fail?

The Discovery: The "Magic Compass" (Gauge Freedom)

The authors of this paper solved the mystery. They found that MPS has a special hidden feature called gauge freedom.

The Analogy: Imagine you are navigating a maze. In a standard maze (brickwork circuits), the walls are fixed. If you hit a dead end, you are stuck.
In an MPS maze, the walls are made of sliding glass panels. You can slide these panels left or right without changing the actual path you need to take to get to the exit. This is the "gauge freedom."
The Insight: Because you can slide these panels, you can always rearrange the maze so that the part of the path you are currently looking at is over-parameterized.
- Over-parameterization is like having 100 different keys for a single lock. Even if you pick the wrong key, you have so many other options nearby that you can easily wiggle your way out of a bad spot.
- In MPS, the ability to slide the "orthogonality center" (the part of the calculation you are focusing on) means that no matter where you are, you can always rearrange the view so that you have too many keys for the lock, provided the MPS is large enough (its "bond dimension" exceeds a critical value). This creates a "safe zone" where the local landscape is smooth and convex, so there is no bad valley to get stuck in.

The Proof: It's All About the View

The paper proves two main things mathematically:

The View Doesn't Matter: Whether you look at the MPS from the left, the right, or the middle (moving the orthogonality center), the statistical "map" of the landscape looks exactly the same. The bad valleys don't appear just because you changed your perspective.
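The "Good" Valleys: Because of this sliding ability, and given a large enough bond dimension, the local minima are mathematically forced to concentrate right next to the "true bottom" (the global minimum). Valleys that would otherwise be "bad" (poor local minima) end up almost as deep as the best one.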
- The Analogy: In a bad circuit, the bad valleys are scattered everywhere like landmines. In an MPS circuit, the bad valleys are all clustered together right next to the treasure chest. So, even if you think you found a "bad" spot, you are actually standing right next to the solution.

The Experiment: The Race

To prove this, the authors ran a race between three types of circuits:

Sequential Circuits (MPS): The "sliding panel" method.
Brickwork Circuits: The standard, rigid method.
Sloping Brickwork Circuits: A hybrid version.

They gave them all a random, difficult mountain range to descend (random Hamiltonians).

The Result: The Sequential (MPS) circuits reliably reached the bottom or came very close to it. The Brickwork and Sloping Brickwork circuits frequently got stuck in the shallow, bad valleys, especially as the mountains got bigger.

The Takeaway

The paper concludes that the secret to making quantum algorithms trainable isn't just making the circuits bigger or deeper. It's about structure.

By using a structure (MPS) that allows for "sliding panels" (gauge freedom), you create a situation where the computer is effectively "over-equipped" with options at every single step. This keeps the optimization from getting trapped far from the best solution, making it a much more reliable tool for solving quantum problems.

In short: MPS works because its gauge freedom lets it re-center its description wherever the optimization is currently happening, so there are always enough free parameters nearby to keep moving downhill toward the best solution.

Problem Statement
Variational Quantum Algorithms (VQAs) face severe trainability issues, particularly the prevalence of "poor local minima" in the energy landscapes of parametrized quantum circuits. While deep circuits suffer from barren plateaus (vanishing gradients), shallow circuits are often plagued by local minima with high loss values that prevent convergence to the ground state. This is notably observed in brickwork circuits and quantum convolutional neural networks. Conversely, Matrix Product States (MPS), which can be prepared by sequential quantum circuits, have demonstrated remarkable trainability in practice for decades (e.g., via Density Matrix Renormalization Group, DMRG), routinely converging to ground states even from random initialization. This creates an apparent paradox: why are sequential circuits (MPS) free from poor local minima while structurally similar shallow circuits (brickwork) are not?

Methodology
The authors resolve this paradox by analyzing the geometric and statistical properties of the MPS energy landscape. Their approach combines rigorous theoretical proofs with numerical experiments:

Gauge Freedom and Causal Structure: The authors leverage the gauge freedom inherent in MPS representations. Inserting an invertible matrix and its inverse between adjacent tensors leaves the physical state unchanged, but allows the "orthogonality center" to be moved. This movement alters the causal structure of the corresponding sequential circuit without changing the physical state.
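The gauge freedom described above can be checked numerically on a small example. The following is a minimal sketch, assuming a three-site open-boundary MPS with random real tensors; the sizes `d` and `D`, the helper `contract`, and all variable names are illustrative choices, not taken from the paper. It shows that inserting a matrix and its inverse on a bond leaves the state vector unchanged, and that a QR decomposition is one specific gauge choice of this kind (the basic step used when shifting an orthogonality center to the right).

```python
import numpy as np

rng = np.random.default_rng(0)
d, D = 2, 3  # physical and bond dimensions (illustrative values)

# Three-site open-boundary MPS; each tensor has shape (left bond, physical, right bond).
A1 = rng.normal(size=(1, d, D))
A2 = rng.normal(size=(D, d, D))
A3 = rng.normal(size=(D, d, 1))

def contract(a, b, c):
    """Contract three MPS tensors into the full state vector of length d**3."""
    return np.einsum("aib,bjc,ckd->ijk", a, b, c).reshape(-1)

psi = contract(A1, A2, A3)

# Generic gauge move: insert X and its inverse on the bond between sites 1 and 2.
X = rng.normal(size=(D, D))  # a random real matrix is invertible with probability 1
A1_gauged = np.einsum("aib,bc->aic", A1, X)
A2_gauged = np.einsum("ab,bjc->ajc", np.linalg.inv(X), A2)
psi_gauged = contract(A1_gauged, A2_gauged, A3)

# A specific gauge choice: QR-decompose site 1 into an isometry Q and a remainder R,
# then absorb R into site 2. The bond dimension may shrink to the rank of site 1.
Q, R = np.linalg.qr(A1.reshape(1 * d, D))
A1_iso = Q.reshape(1, d, Q.shape[1])
A2_center = np.einsum("ab,bjc->ajc", R, A2)
psi_shifted = contract(A1_iso, A2_center, A3)

print(np.allclose(psi, psi_gauged))   # checks the state after the X, X^{-1} insertion
print(np.allclose(psi, psi_shifted))  # checks the state after the QR-based gauge choice
```

The tensors change under both moves, but the contracted state does not, which is the sense in which the gauge is a redundancy of the description rather than of the physics.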
Ensemble Equivalence (Theorem 1): Using Weingarten calculus, the authors prove that random MPS ensembles generated by sequential circuits with orthogonality centers at different sites are statistically identical. This implies that the induced probability distribution over physical states is independent of the orthogonality center's position.
Invariance of Local Minimum Distribution (Theorem 2): The authors define a "local minimum distribution" based on the probability of converging to specific minima via optimization dynamics (specifically the Time-Dependent Variational Principle, TDVP, equivalent to Quantum Natural Gradient Descent). They prove that for any given Hamiltonian, the distribution of local minimum energy values is invariant under moves of the orthogonality center.
Effective Local Overparametrization: By choosing a convenient gauge (moving the orthogonality center near the support of a local Hamiltonian term), the authors demonstrate that the backward causal cone of the circuit becomes effectively overparametrized. When the bond dimension $D$ exceeds a critical value $D_c$, the number of independent parameters in the causal cone exceeds the Hilbert space dimension of the subsystem. This transforms the local optimization problem into a convex one (optimization over a Bloch ball), where all local minima are global minima.
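A toy analogue may help make the "all local minima are global" statement concrete. The sketch below is not the paper's construction: it simply assumes the extreme case in which the parameters cover the entire state space of a small subsystem, so that the energy is a Rayleigh quotient on the unit sphere, whose only local minima are ground states. The Hamiltonian, the dimension `dim`, the step size, and the function names are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 8  # Hilbert-space dimension of a hypothetical 3-qubit subsystem

# Toy Hamiltonian with known spectrum 0, 1, ..., 7, rotated into a random orthonormal basis.
basis, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
H = basis @ np.diag(np.arange(dim, dtype=float)) @ basis.T

def energy(psi):
    """Energy <psi|H|psi> of a normalized real state vector."""
    return float(psi @ H @ psi)

def descend(psi, lr=0.05, steps=2000):
    """Gradient descent on the unit sphere: tangent-space step, then renormalize."""
    for _ in range(steps):
        grad = 2.0 * (H @ psi - energy(psi) * psi)  # gradient projected onto the tangent space
        psi = psi - lr * grad
        psi = psi / np.linalg.norm(psi)
    return psi

# Run the same descent from 20 independent random starting states.
final_energies = []
for _ in range(20):
    start = rng.normal(size=dim)
    start = start / np.linalg.norm(start)
    final_energies.append(energy(descend(start)))

ground_energy = float(np.linalg.eigvalsh(H)[0])
worst_gap = max(abs(e - ground_energy) for e in final_energies)
print(worst_gap)  # largest distance from the ground energy over all runs
```

In this fully parametrized setting every random start ends at the ground energy. The paper's claim is that a suitable gauge choice puts the relevant causal cone of an MPS into a comparably benign regime once $D$ exceeds $D_c$.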
Compatible Gradient Condition: For generic Hamiltonians with multiple local terms, the authors propose a "compatible gradient condition." If the gradients of individual sub-terms do not strongly frustrate each other (i.e., their inner products are non-negative or weakly negative), the total energy landscape inherits the benign properties of the individual terms, ensuring local minima concentrate near the global minimum.
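One way such a condition could be probed numerically is to compare the gradients of the individual sub-terms pairwise. The sketch below is a hypothetical check, not the paper's definition: the cosine-similarity measure, the tolerance `tol`, the function names, and the toy quadratic losses are all assumptions made for illustration.

```python
import numpy as np

def min_pairwise_cosine(grads):
    """Smallest cosine similarity between any two gradient vectors (all assumed nonzero)."""
    unit = [g / np.linalg.norm(g) for g in grads]
    return min(float(unit[i] @ unit[j])
               for i in range(len(unit)) for j in range(i + 1, len(unit)))

def is_compatible(grads, tol=0.1):
    """Hypothetical criterion: no pair of sub-term gradients is more opposed than -tol."""
    return min_pairwise_cosine(grads) >= -tol

# Toy sub-term losses f_i(theta) = 0.5 * |theta - c_i|^2, whose gradients are theta - c_i.
theta = np.array([2.0, 2.0])
centers_aligned = [np.array([0.0, 0.0]), np.array([0.5, 0.0]), np.array([0.0, 0.5])]
centers_frustrated = [np.array([0.0, 0.0]), np.array([4.0, 4.0])]

grads_aligned = [theta - c for c in centers_aligned]        # all point roughly the same way
grads_frustrated = [theta - c for c in centers_frustrated]  # point in opposite directions

print(is_compatible(grads_aligned))
print(is_compatible(grads_frustrated))
```

When the sub-term gradients mostly agree, a step that lowers one term also tends to lower the others, which is the intuition behind the total landscape inheriting the benign properties of the individual terms.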
Numerical Validation: The authors simulate random "backward-evolved" Hamiltonians using sequential, brickwork, and sloping brickwork circuits. They compare the convergence of these architectures using the Adam optimizer.
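To make the architectural difference concrete, the two basic layouts can be written down as lists of gate positions. This is a minimal sketch assuming two-qubit gates on nearest neighbors in a 1D chain; the function names and the choice of six qubits are illustrative, and the sloping brickwork variant is omitted because this summary does not specify its layout.

```python
def sequential_layout(n_qubits):
    """Staircase: one two-qubit gate per neighboring pair, applied one after another."""
    return [(q, q + 1) for q in range(n_qubits - 1)]

def brickwork_layout(n_qubits, n_layers):
    """Alternating layers of gates on even-start and odd-start neighboring pairs."""
    layers = []
    for layer in range(n_layers):
        start = layer % 2  # even layers start at qubit 0, odd layers at qubit 1
        layers.append([(q, q + 1) for q in range(start, n_qubits - 1, 2)])
    return layers

seq = sequential_layout(6)
brick = brickwork_layout(6, 2)
n_seq_gates = len(seq)
n_brick_gates = sum(len(layer) for layer in brick)

print(seq)    # gates are applied strictly in order, each overlapping the previous one
print(brick)  # gates within a layer act on disjoint pairs and can run in parallel
```

For six qubits, the staircase and a two-layer brickwork use the same number of two-qubit gates, so the two layouts differ in how the gates are ordered and connected rather than in how many there are.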

Key Results

Theoretical Proof: The paper rigorously proves that the local minimum distribution of MPS energy landscapes is invariant under the movement of the orthogonality center.
Absence of Poor Local Minima: Under the condition of effective local overparametrization (sufficient bond dimension), the optimization of sequential circuits (MPS) is shown to be free from poor local minima. The local minima concentrate near the global minimum.
Contrast with Brickwork Circuits: Numerical experiments confirm that while sequential circuits reliably converge to near-optimal solutions for random Hamiltonians, brickwork and sloping brickwork circuits of comparable parameter counts frequently get trapped in poor local minima, with errors increasing as system size grows.
Gradient Compatibility: Numerical tests on XXZ and Heisenberg Hamiltonians show that the "compatible gradient condition" is typically satisfied, supporting the theoretical prediction that local minima concentrate near the ground state energy.

Significance
The paper identifies "effective local overparametrization," arising from the gauge structure of MPS, as the pivotal factor determining trainability. This provides a theoretical explanation for the empirical success of MPS-based algorithms like DMRG, distinguishing them from other shallow quantum circuits that suffer from poor local minima.

The findings suggest that to enhance the trainability of variational quantum algorithms, a preferable strategy is to increase the size of universal unitary blocks (increasing local overparametrization) rather than simply stacking layers to increase depth. Furthermore, the techniques and conclusions are posited to generalize to other tensor network states with movable orthogonality centers, such as tree tensor networks, offering a guide for overcoming trainability bottlenecks in variational quantum algorithms beyond the scope of global overparametrization.