Expressibility of neural quantum states: a Walsh-complexity perspective

This paper introduces Walsh complexity as a new measure of expressibility for neural quantum states, showing that shallow additive networks cannot represent certain short-range entangled states with maximally spread Walsh spectra unless their depth grows logarithmically with system size or they exploit activation saturation.

Taige Wang

Published 2026-04-07

Imagine you are trying to teach a computer to mimic the behavior of a complex quantum system, like a swarm of interacting particles. In the world of physics, these systems are described by something called a "wavefunction." To do this, scientists use Neural Quantum States (NQS)—basically, artificial neural networks designed to guess what these particles are doing.

The big question this paper asks is: "How good are these neural networks at guessing?"

Specifically, the authors want to know: Can a simple, shallow neural network describe a complex quantum state, or do we need a very deep, complicated one?

Here is the breakdown of their discovery, using simple analogies.

1. The Problem: The "Hidden Complexity" Trap

Usually, physicists judge how hard a quantum state is to describe by looking at entanglement (how much the particles are "tangled" together).

  • The Old Belief: If particles are only tangled with their immediate neighbors (short-range entanglement), the state should be easy to describe.
  • The Surprise: The authors found a specific quantum state (a "dimerized state") that looks very simple. The particles are only tangled with their neighbors, and it can be described by a very short, simple formula.
  • The Twist: Despite looking simple, this state is a nightmare for certain types of neural networks (called "additive" networks). It's like a puzzle that looks like a picture of a cat, but to solve it, you have to rearrange every single piece in a way that defies logic.

2. The New Tool: "Walsh Complexity" (The Flavor Spectrum)

To measure this hidden difficulty, the authors invented a new ruler called Walsh Complexity.

The Analogy: The Cocktail Party
Imagine a huge party where everyone is shouting a different combination of words.

  • Simple State: Everyone is shouting the same phrase, or just a few distinct phrases. You can easily predict what the crowd sounds like.
  • Complex State (The Target): The crowd is shouting a chaotic mix of every possible combination of words, with equal volume. It's a "flat spectrum" of noise.

Walsh Complexity measures how "spread out" the noise is.

  • If the noise is concentrated in a few patterns, the complexity is low.
  • If the noise is spread evenly across all possible patterns (like our chaotic party), the complexity is maximal.

The authors found that their "simple" dimerized state has Maximal Walsh Complexity. It's a "flat spectrum" state.
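The "flat spectrum" idea can be made concrete with a small numerical sketch. This is our own toy construction (not code from the paper): we take the sign structure of a dimer-like state, whose amplitude is a product of two-site factors over neighboring pairs, compute its Walsh spectrum by brute force, and measure spread with an inverse-participation-ratio count of active Walsh modes.

```python
import numpy as np

n = 6                      # number of spins/bits
N = 2 ** n                 # 64 basis configurations
bits = (np.arange(N)[:, None] >> np.arange(n)) & 1   # (N, n) bit table

# Walsh characters: chi[m, s] = (-1)^(m . s)
chi = (-1.0) ** (bits @ bits.T)

def walsh_spectrum(f):
    """Walsh coefficients of f, computed by brute force over all masks."""
    return chi @ f / N

def spread(W):
    """Inverse participation ratio: effective number of active Walsh modes."""
    p = W ** 2
    return p.sum() ** 2 / (p ** 2).sum()

# A trivial sign structure: the amplitude depends on one spin only,
# so a single Walsh mode carries all the weight.
f_simple = (-1.0) ** bits[:, 0]

# A dimer-like sign structure: a product of two-site factors over the
# pairs (0,1), (2,3), (4,5).  Entanglement is confined within pairs,
# yet the Walsh spectrum is perfectly flat (this is the classic
# "bent" inner-product function from Boolean analysis).
pairs = bits[:, 0] * bits[:, 1] + bits[:, 2] * bits[:, 3] + bits[:, 4] * bits[:, 5]
f_dimer = (-1.0) ** pairs

print(spread(walsh_spectrum(f_simple)))  # 1.0  -> one active Walsh mode
print(spread(walsh_spectrum(f_dimer)))   # 64.0 -> all 64 modes equally active
```

Note how "looks simple" and "is spectrally flat" coexist: the dimer signs factorize over neighboring pairs, yet every one of the 64 Walsh modes is excited with equal strength.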

3. The Limitation: The "Shallow Network" Bottleneck

The paper tests two types of neural networks:

  1. Multiplicative Networks: These build the answer by multiplying factors together (like stacking Lego bricks).
  2. Additive Networks: These build the answer by adding up signals (like mixing ingredients in a bowl). This is the standard way modern AI (like Transformers) works.

The Finding:

  • Multiplicative networks are great at handling this "flat spectrum" chaos. They can describe the simple dimer state easily.
  • Additive networks (the standard ones) struggle immensely.
    • If the network is shallow (few layers) and uses standard math functions (like polynomials), it physically cannot generate enough "Walsh Complexity" to match the target. It's like trying to paint a masterpiece using only a single drop of paint; you simply don't have enough "ink" to cover the canvas.
    • The network needs to get deeper (more layers) to build up enough complexity. The paper proves that for these networks to succeed, the depth must grow logarithmically with the size of the system (roughly, if you double the particles, you need a few more layers).
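The contrast above can be checked numerically with a toy sketch (again our own construction, not the paper's proof). A one-hidden-layer additive network with a quadratic activation is, viewed as a function of the spins, a degree-2 polynomial, so its Walsh spectrum is confined to masks touching at most two spins no matter how wide the layer is. A multiplicative ansatz with one local factor per dimer pair, by contrast, reproduces the flat-spectrum dimer signs exactly:

```python
import numpy as np

n, N = 6, 2 ** 6
bits = (np.arange(N)[:, None] >> np.arange(n)) & 1   # bit table
chi = (-1.0) ** (bits @ bits.T)                      # Walsh characters

rng = np.random.default_rng(0)
width = 200                                          # extra width will not help
w = rng.normal(size=(width, n))
b = rng.normal(size=width)
c = rng.normal(size=width)

# Shallow additive network, quadratic activation:
#   f(s) = sum_j c_j * (w_j . s + b_j)^2   -- a degree-2 polynomial in the spins.
f_add = ((bits @ w.T + b) ** 2) @ c

W = chi @ f_add / N                                  # its Walsh spectrum
mask_weight = bits.sum(axis=1)                       # spins touched by each mask
print(np.abs(W[mask_weight > 2]).max())              # numerically zero: no
                                                     # high-order Walsh content

# A multiplicative ansatz (one two-site factor per dimer pair) hits the
# flat-spectrum dimer signs exactly with only n/2 local factors.
f_mult = np.prod([(-1.0) ** (bits[:, 2 * i] * bits[:, 2 * i + 1])
                  for i in range(n // 2)], axis=0)
print(np.ptp(np.abs(chi @ f_mult / N)))              # 0.0 -> perfectly flat
```

The additive network's dead zone on high-weight masks is exactly the "not enough ink" bottleneck: with a fixed low polynomial degree, no amount of width puts weight where the flat-spectrum target needs it, which is why depth has to do the work.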

4. The "Saturation" Switch

The authors also looked at what happens when the neural network uses "saturated" activation functions (like the tanh function, which squashes numbers between -1 and 1).

  • The Analogy: Imagine a light switch. In the "tame" regime, the switch is dim and adjustable. In the "saturated" regime, the switch is either fully ON or fully OFF.
  • The Result: Once the network pushes its internal signals into this "ON/OFF" (threshold) mode, the rules change. The network suddenly becomes much more powerful, almost like a super-computer that can solve hard logic puzzles instantly.
  • The Catch: This power comes at a price for theorists: once the network operates in the saturated regime, it becomes very hard to prove limits, that is, to show mathematically that there is anything it cannot do. It's like a magician who can pull a rabbit out of a hat: you know the trick is possible, but you can't easily explain how it works or prove a bound on what else might be possible.
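The switch can be seen in a one-neuron toy example (our construction, not the paper's): the same pre-activation is passed through tanh at small versus large gain, and we track how much of the output's Walsh energy sits on masks involving more than one spin.

```python
import numpy as np

n, N = 6, 2 ** 6
bits = (np.arange(N)[:, None] >> np.arange(n)) & 1
chi = (-1.0) ** (bits @ bits.T)                      # Walsh characters

def high_order_energy(f, cutoff=1):
    """Fraction of Walsh energy on masks involving more than `cutoff` spins."""
    p = (chi @ f / N) ** 2
    return p[bits.sum(axis=1) > cutoff].sum() / p.sum()

pre = bits.sum(axis=1) - 2.5            # same pre-activation in both regimes

# Tame regime: tiny gain, tanh is essentially linear, so almost all Walsh
# energy sits on zero- and single-spin masks.
print(high_order_energy(np.tanh(0.01 * pre)))   # ~0: essentially linear

# Saturated regime: large gain, the same neuron acts as a hard threshold
# (a majority vote) and now carries substantial many-spin Walsh energy.
print(high_order_energy(np.tanh(25.0 * pre)))   # a sizable fraction
```

A single saturated neuron already generates the high-order Walsh content that whole shallow polynomial networks cannot, which is why the lower-bound arguments of the tame regime stop applying once saturation kicks in.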

Summary: What Does This Mean?

  1. Entanglement isn't everything: A quantum state can look simple (low entanglement) but be incredibly hard for standard AI to learn because of its "hidden spectrum" (Walsh complexity).
  2. Depth is a resource: For standard additive neural networks, depth (number of layers) is the key to unlocking complex states. You can't just throw more "width" (more neurons) at the problem; you need more layers to build up the necessary complexity.
  3. The "Tame" vs. "Wild" regimes:
    • In the "Tame" regime (standard math, no saturation), we can mathematically prove exactly what these networks cannot do.
    • In the "Wild" regime (saturated, threshold-like behavior), the networks become so expressive that it's very hard to predict their limits, which may help explain why modern AI handles hard problems so well.

The Bottom Line:
This paper gives physicists a new "ruler" (Walsh Complexity) to measure how hard a quantum state is for AI to learn. It warns us that just because a state looks simple, it might be a "trick question" for standard neural networks, requiring deeper architectures to solve.
