Original authors: Nabil Iqbal, T. Anderson Keller, Yue Song, Takeru Miyato, Max Welling
Original paper licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). This is an AI-generated explanation of the paper below. It is not written or endorsed by the authors. For technical accuracy, refer to the original paper.
Technical Summary: Spontaneous Symmetry Breaking and Goldstone Modes for Deep Information Propagation
Problem Statement
Propagating information stably through the layers of deep neural networks (DNNs) and across the timesteps of recurrent networks is a fundamental challenge in deep learning. In standard architectures, this propagation is often unstable. Networks either collapse onto a single attractor, erasing information about the input, or behave chaotically, decorrelating outputs from inputs. Residual connections, normalization (e.g., LayerNorm), and gating mechanisms (e.g., in GRUs/LSTMs) mitigate these issues. However, they are architectural heuristics rather than solutions derived from first principles of information stability.
This paper investigates whether principles from statistical physics, specifically spontaneous symmetry breaking (SSB) and the resulting Goldstone modes, can provide a mechanism for stable, coherent information propagation across deep layers and recurrent iterations without relying on these standard stabilizers.
Methodology
Theoretical Framework
The authors propose a framework where the internal layers of a neural network are constructed to be equivariant under a continuous symmetry group G (specifically U(1) and O(k)).
- Equivariant Layers: A layer $f_l$ acting on activations $x_l$ is equivariant if $\rho_g f_l(x_l) = f_l(\rho_g x_l)$ for all $g \in G$, where $\rho_g$ is the linear map representing the group element $g$.
- Input/Output: The input and output layers are fully general and break the equivariance, while the "bulk" of the network preserves it.
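- Non-linearity: The activation functions are chosen to be equivariant. For U(1), an example is the radial non-linearity $\phi(z) = \tanh(|z|)\, z/|z|$, which squashes the magnitude of each complex unit while leaving its phase unchanged.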
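The equivariance condition can be checked numerically. The sketch below is our own illustration, not the authors' code. It builds one bulk layer for each group and verifies $\rho_g f(x) = f(\rho_g x)$, assuming a bias-free linear map followed by the radial non-linearity:

- For U(1), units are complex and $g = e^{i\theta}$ multiplies every unit.
- For O(k), each unit is a real k-vector, and a k×k orthogonal matrix acts on all of them.

The names `radial_tanh`, `W_c` and `W_o` are hypothetical.

```python
import numpy as np

def radial_tanh(z, axis=None, eps=1e-12):
    """Radial non-linearity tanh(|z|) * z / |z|.

    For complex z (U(1)), |z| is the elementwise modulus (axis=None).
    For real z of shape (N, k) (O(k)), |z| is the norm of each k-vector (axis=-1).
    """
    if axis is None:
        mag = np.abs(z)
    else:
        mag = np.linalg.norm(z, axis=axis, keepdims=True)
    return np.tanh(mag) * z / np.maximum(mag, eps)

rng = np.random.default_rng(0)
N, k = 64, 4

# U(1): complex units; the group element g = exp(i*theta) multiplies every unit.
# No bias: a bias b inside the non-linearity would break equivariance, since g*b != b.
W_c = (rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))) / np.sqrt(2 * N)
x_c = rng.normal(size=N) + 1j * rng.normal(size=N)
g = np.exp(0.7j)
u1_ok = np.allclose(g * radial_tanh(W_c @ x_c), radial_tanh(W_c @ (g * x_c)))

# O(k): each unit is a real k-vector (a row of x_o); an orthogonal Q_rot acts on every row.
W_o = rng.normal(size=(N, N)) / np.sqrt(N)
x_o = rng.normal(size=(N, k))
Q_rot, _ = np.linalg.qr(rng.normal(size=(k, k)))  # random orthogonal k x k matrix
ok_equiv = np.allclose(radial_tanh(W_o @ x_o, axis=-1) @ Q_rot,
                       radial_tanh(W_o @ (x_o @ Q_rot), axis=-1))
print(u1_ok, ok_equiv)  # True True
```

The linear map acts on the unit index while the group acts on the "internal" index (the complex phase, or the k components), so the two commute. The radial non-linearity depends only on the invariant norm, so it commutes with the group action too.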
Analytical Approach
Using tools from mean-field theory and stochastic path integrals (extending the work of [9–12]), the authors analyze the network dynamics at initialization in the large-N limit (where N is the width of the network).
- Order Parameter: They define an order parameter $c_l$ representing the mean magnitude of activations at layer $l$.
- Phase Transition: They identify two phases, controlled by the weight initialization scale $\sigma_W$:
- Unbroken Symmetry Phase ($\sigma_W < 1$): Activations collapse to zero with depth ($c_l \to 0$), and information is lost.
- Spontaneously Broken Symmetry (SSB) Phase ($\sigma_W > 1$): Activations settle at a non-zero magnitude ($c_l > 0$).
- Goldstone Modes: In the SSB phase, the network possesses a degree of freedom analogous to a Goldstone mode: the phase of the complex representation (or the orientation in O(k) space) is preserved across layers. The authors derive that the phase $\phi_l$ of the covariance between two inputs stays constant from layer to layer ($\phi_{l+1} = \phi_l$), so it does not decay with depth.
- Jacobian Protection: They show that a specific component of the input-output Jacobian, related to the symmetry transformation, remains O(1) in the SSB phase. This contrasts with vanilla networks where Jacobians typically vanish or explode exponentially with depth.
Empirical Approach
The authors validate these theoretical claims through experiments on:
- Feedforward Networks: Training deep Multi-Layer Perceptrons (MLPs) on Fashion-MNIST and MNIST with varying depths (up to 100 layers) and symmetry groups (U(1), O(4)).
- Recurrent Networks: Implementing U(1) and O(k) equivariant RNNs and GRUs.
- Tasks:
- Variable-Delay Copy Task: A synthetic task requiring the network to store a sequence and reproduce it after a variable delay T.
- Permuted Sequential MNIST (psMNIST): A pixel-by-pixel classification task with shuffled pixel order to eliminate short-range spatial correlations, forcing reliance on long-range memory.
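The sketch below generates inputs for both benchmarks. The copy-task format is our assumption, since the exact encoding used in the paper is not given here. It uses token 0 as blank, a "go" marker token, a delay T drawn uniformly from 1 to $T_{\max}$, and padding to the maximum delay. The psMNIST helper relies only on the standard construction: flatten each 28x28 image into 784 pixels and apply one fixed random permutation to every image. Function names are hypothetical, and random arrays stand in for real MNIST digits.

```python
import numpy as np

BLANK = 0  # hypothetical token convention: 0 = blank, 1..n_symbols = data, n_symbols+1 = go

def variable_delay_copy_batch(batch, seq_len, t_max, n_symbols, rng):
    """Sketch of a variable-delay copy task (assumed format, not the authors' exact spec).

    Input: seq_len data symbols, then T blanks with T uniform in [1, t_max], then a go
    marker; the target is the original symbols, emitted right after the marker.
    All sequences are padded to the length needed for the maximum delay.
    """
    go = n_symbols + 1
    total_len = seq_len + t_max + 1 + seq_len
    inputs = np.full((batch, total_len), BLANK, dtype=np.int64)
    targets = np.full((batch, total_len), BLANK, dtype=np.int64)
    delays = rng.integers(1, t_max + 1, size=batch)
    for b in range(batch):
        seq = rng.integers(1, n_symbols + 1, size=seq_len)
        go_pos = seq_len + delays[b]  # the marker arrives after T blanks
        inputs[b, :seq_len] = seq
        inputs[b, go_pos] = go
        targets[b, go_pos + 1:go_pos + 1 + seq_len] = seq
    return inputs, targets, delays

def permuted_sequential(images, perm):
    """Flatten (batch, 28, 28) images into (batch, 784) pixel sequences in a fixed shuffled order."""
    return images.reshape(images.shape[0], -1)[:, perm]

rng = np.random.default_rng(0)
x, y, delays = variable_delay_copy_batch(batch=4, seq_len=10, t_max=100, n_symbols=8, rng=rng)
perm = np.random.default_rng(1234).permutation(28 * 28)  # one permutation, reused for all images
fake_images = rng.random((2, 28, 28))  # stand-in for MNIST digits
seqs = permuted_sequential(fake_images, perm)
print(x.shape, seqs.shape)  # (4, 121) (2, 784)
```

Using the same permutation for every image is what makes the task learnable while destroying local pixel correlations. The network must integrate evidence spread across the whole 784-step sequence.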
Key Contributions
- Identification of Goldstone-like Modes in DNNs: The paper demonstrates that neural networks with internal equivariant layers support degrees of freedom (specifically phase/orientation) that propagate coherently across depth, analogous to Goldstone modes in physics.
- Stable Information Propagation without Heuristics: The authors show that in the SSB phase, deep networks can be trained effectively without architectural stabilizers such as skip connections, LayerNorm, or BatchNorm. The symmetry itself provides a "protected channel" for information flow.
- Analytical Characterization of the SSB Phase: They provide a mean-field derivation showing that the transition to the SSB phase occurs at a critical weight initialization variance ($\sigma_W = 1$) and that this phase supports non-vanishing Jacobian components and sustained correlations.
- Performance Gains in Recurrent Settings: The mechanism is shown to significantly improve the performance of RNNs and GRUs on long-sequence modeling tasks, outperforming non-equivariant baselines even when the baselines have more trainable parameters.
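A simple mean-field recursion shows why the critical point sits at $\sigma_W = 1$. This is our own derivation under stated assumptions, and the paper's exact equations may differ:

- bias-free U(1) layers with the radial tanh;
- circular complex Gaussian pre-activations $h$ with $E|h|^2 = \sigma_W^2 q_l$, where $q_l$ is the mean squared activation magnitude.

The recursion is $q_{l+1} = E[\tanh^2(|h|)]$, with $c_{l+1} = E[\tanh(|h|)]$. Because $\tanh^2(y) \le y^2$, with equality to leading order near zero, small $q_l$ gives $q_{l+1} \approx \sigma_W^2 q_l$. The collapsed fixed point $q = 0$ is therefore stable for $\sigma_W < 1$. For $\sigma_W > 1$ it is unstable, and a non-zero fixed point (hence $c^* > 0$) appears. The expectation is computed with Gauss-Laguerre quadrature, since $|h|^2$ is exponentially distributed.

```python
import numpy as np

# Gauss-Laguerre rule: sum(w * f(x)) approximates the integral of exp(-x) f(x) on [0, inf),
# i.e. E[f(E)] for E ~ Exponential(1). For a circular complex Gaussian h with E|h|^2 = s,
# |h|^2 is exponential with mean s, so |h| = sqrt(s * E).
NODES, WEIGHTS = np.polynomial.laguerre.laggauss(64)

def mean_field_c_star(sigma_w, n_iter=5000):
    """Iterate the assumed mean-field map for bias-free radial-tanh U(1) layers.

    q = E|x|^2 per unit; pre-activations h ~ CN(0, sigma_w**2 * q).
    Returns c* = E|x| (mean activation magnitude) at the fixed point reached.
    """
    q = 1.0
    for _ in range(n_iter):
        mag = np.sqrt(sigma_w**2 * q * NODES)
        q = np.sum(WEIGHTS * np.tanh(mag) ** 2)
    return np.sum(WEIGHTS * np.tanh(np.sqrt(sigma_w**2 * q * NODES)))

for s in (0.5, 0.9, 1.1, 1.5, 2.0):
    print(f"sigma_W={s}: c*={mean_field_c_star(s):.3f}")
```

Near $\sigma_W = 1$ the approach to the fixed point slows down, which is the usual signature of a continuous transition. This is why the sketch iterates many times rather than a number of steps equal to a realistic network depth.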
Results
- Phase Transition: Empirical results on MLPs confirm the theoretical phase transition at $\sigma_W = 1$. Training performance improves dramatically only once the network enters the SSB phase ($\sigma_W > 1$), which is signalled by a non-zero value of the order parameter $c^*$.
- Depth Scalability: Equivariant networks maintain high test accuracy on Fashion-MNIST as depth increases to 100 layers, whereas generic (non-equivariant) networks with the same non-linearity and no stabilizers fail to train.
- Jacobian Stability: In the SSB phase, the "protected" component of the Jacobian remains O(1) throughout training, whereas the full Jacobian of generic networks collapses.
- Recurrent Memory:
- On the variable-delay copy task ($T_{\max} = 100$), U(1)-equivariant GRUs significantly outperform non-equivariant GRUs, achieving lower loss with fewer real parameters (6k vs. 15k).
- On psMNIST, equivariant RNNs and GRUs consistently outperform generic counterparts across all parameter ranges. Notably, an O(4)-equivariant simple RNN (without gating) achieves performance comparable to gated GRUs.
- Topological Defects: In 2D convolutional RNN experiments, the authors observe the emergence of long-lived vortices (topological defects) in the hidden state phase, suggesting a potential secondary mechanism for memory storage, though this is presented as preliminary.
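A standard way to locate vortices in a 2D phase field is to sum the wrapped phase differences around each elementary plaquette of the grid. This is shown here as a sketch, not necessarily the authors' procedure. The sum equals $2\pi$ times an integer, the winding number: +1 for a vortex, -1 for an antivortex and 0 elsewhere. For a complex hidden state `h` of shape (H, W), one would take `theta = np.angle(h)`. Here a synthetic single-vortex field is used instead, and the function names are hypothetical.

```python
import numpy as np

def wrap(a):
    """Map angle differences into [-pi, pi)."""
    return (a + np.pi) % (2 * np.pi) - np.pi

def plaquette_winding(theta):
    """Integer winding number around each 2x2 plaquette of a phase field theta[H, W].

    The loop visits (i, j) -> (i, j+1) -> (i+1, j+1) -> (i+1, j) -> (i, j).
    +1 marks a vortex, -1 an antivortex, 0 no defect.
    """
    d1 = wrap(theta[:-1, 1:] - theta[:-1, :-1])
    d2 = wrap(theta[1:, 1:] - theta[:-1, 1:])
    d3 = wrap(theta[1:, :-1] - theta[1:, 1:])
    d4 = wrap(theta[:-1, :-1] - theta[1:, :-1])
    return np.rint((d1 + d2 + d3 + d4) / (2 * np.pi)).astype(int)

# Synthetic check: a single vortex centred inside plaquette (10, 10) of a 21x21 grid.
i, j = np.mgrid[0:21, 0:21]
theta = np.arctan2(i - 10.5, j - 10.5)
w = plaquette_winding(theta)
print(w.sum(), w[10, 10])  # 1 1
```

Applying `plaquette_winding` to the hidden-state phase at successive recurrent steps would show whether individual defects persist over time.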
Significance and Claims
The paper claims that spontaneous symmetry breaking offers a new, principled mechanism for deep information propagation. By enforcing equivariance in internal layers, the network naturally supports Goldstone-like modes that carry information coherently over long distances (depth) and times (recurrent steps).
The significance lies in:
- Reducing Architectural Complexity: It suggests that very deep networks can be trained without the complex suite of normalization and residual connections currently standard in the field, provided the symmetry-breaking condition is met.
- Bridging Physics and Deep Learning: It establishes a concrete link between the physics of broken continuous symmetries and the trainability of deep neural networks, moving beyond the "edge of chaos" paradigm.
- Enhanced Long-Range Memory: The mechanism provides a robust solution for long-term memory in recurrent networks, addressing a known weakness of standard RNNs.
The authors are measured in their claims. They note that their experiments are so far limited to simple benchmarks and that the precise role of topological defects requires further study. They frame the work as demonstrating a new use of equivariance: not to encode symmetries of the task, but as an architectural tool for information propagation.