Imagine you are trying to build a digital model of a molecule, like a tiny, complex Lego structure made of atoms. To make this model useful for scientists, it needs to follow the strict laws of physics. One of the most important is rotational symmetry: if you pick up the molecule and spin it around, the physics shouldn't change. Just as a spinning top behaves the same no matter which way the table beneath it is facing, the forces holding a molecule together shouldn't suddenly break just because you turned it.
In the world of Artificial Intelligence, we use special neural networks called SO(3)-Equivariant GNNs to learn these rules. They are like master chefs who know exactly how to cook a dish regardless of which way the kitchen is facing.
However, there's a big problem: these "master chef" networks are incredibly heavy and slow. They require massive amounts of computer memory and power, making them too expensive to run for long simulations (like watching a molecule dance for a full nanosecond, which is a long time at this scale).
To fix this, engineers usually compress the model. This is called quantization. It's like taking a high-definition photo and shrinking it to a low-resolution thumbnail: you save space, but you also throw away detail.
The Problem with "Naive" Compression
The paper explains that if you just shrink these physics models using standard methods (like "Naive Quantization"), you break the laws of physics.
Imagine describing a wind direction using a map with a rigid grid of North, South, East, and West. If the wind is blowing slightly Northeast, the grid might force you to round it to either North or East. If you then spin the map, your rounded answer changes unpredictably. In a molecule, this rounding error creates "ghost forces" that push the atoms in the wrong direction. Over time, the molecule might heat up, explode, or drift apart, because the computer acts as if the laws of physics changed simply because the molecule was rotated.
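You can see this breakage in a few lines of code. The sketch below (an illustration, not the paper's actual scheme) rounds each Cartesian component of a vector to a fixed grid, then checks whether rotating first and quantizing second gives the same answer as quantizing first and rotating second. For an equivariant operation, the two orders must agree; for naive rounding, they don't, and the mismatch is exactly the "ghost force":

```python
import numpy as np

def naive_quantize(v, step=0.25):
    """Round each Cartesian component to a fixed grid (naive scheme)."""
    return np.round(v / step) * step

def rotation_z(theta):
    """Rotation matrix about the z-axis by angle theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

v = np.array([0.6, 0.1, 0.0])      # a force vector pointing slightly "northeast"
R = rotation_z(np.pi / 4)          # rotate the whole system by 45 degrees

quantize_then_rotate = R @ naive_quantize(v)
rotate_then_quantize = naive_quantize(R @ v)

# Equivariance would require these two to match; with per-component
# rounding they generally do not. The mismatch is the "ghost force".
ghost_force = rotate_then_quantize - quantize_then_rotate
```

Here `ghost_force` comes out clearly nonzero, even though nothing physical changed between the two orderings.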
The Solution: Geometric-Aware Quantization (GAQ)
The authors propose a new method called Geometric-Aware Quantization (GAQ). Instead of forcing the data into a rigid, square grid, they respect the natural shape of the data.
Here is how they do it, using three simple analogies:
1. The Compass and the Ruler (Magnitude-Direction Decoupling)
In a standard model, a vector (a direction with a strength) is treated as three numbers (X, Y, Z). If you compress X, Y, and Z separately, you distort the shape.
The authors say: "Let's separate the Ruler from the Compass."
- The Ruler (Magnitude): How strong is the force? This is just a number. We can compress this easily, like rounding a price tag.
- The Compass (Direction): Which way is it pointing? This is a point on a sphere (like the surface of a ball).
- The Trick: Instead of compressing X, Y, and Z, they compress the strength with the ruler and snap the direction to a pre-made, perfectly symmetrical map of the sphere (a "codebook"). No matter how you spin the molecule, the direction lands on the nearest valid spot on the sphere, so the symmetry survives.
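The ruler-and-compass idea can be sketched in code. Everything below is a simplified illustration, not the paper's implementation: the codebook here is just the six vertices of an octahedron, whereas the actual method presumably uses a denser, carefully constructed spherical codebook.

```python
import numpy as np

# Hypothetical spherical codebook: vertices of an octahedron, a small
# symmetric point set on the unit sphere (the paper's codebook may differ).
CODEBOOK = np.array([
    [ 1, 0, 0], [-1, 0, 0],
    [ 0, 1, 0], [ 0, -1, 0],
    [ 0, 0, 1], [ 0, 0, -1],
], dtype=float)

def decoupled_quantize(v, step=0.25):
    """Magnitude-direction decoupling: the 'ruler' rounds the norm,
    the 'compass' snaps the unit direction to the nearest codeword."""
    norm = np.linalg.norm(v)
    if norm == 0.0:
        return np.zeros(3)
    q_norm = np.round(norm / step) * step                 # ruler: scalar rounding
    direction = v / norm
    codeword = CODEBOOK[np.argmax(CODEBOOK @ direction)]  # compass: nearest point on sphere
    return q_norm * codeword

v = np.array([0.6, 0.1, 0.0])
q = decoupled_quantize(v)
```

The key property is that the output is always a physically valid vector: a rounded magnitude times an exact unit direction, never an arbitrary off-grid point.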
2. The Specialized Training Camp (Branch-Separated Training)
Imagine a gym with two types of athletes: Distance Runners (who deal with simple numbers) and Gymnasts (who deal with complex 3D spins).
- Standard training treats everyone the same.
- The authors' method puts them in separate training camps. The "Distance Runners" get a standard, aggressive compression. The "Gymnasts" get a special, gentle training routine that respects their need to spin perfectly. This prevents the gymnasts from tripping over the compression errors.
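The two training camps can be sketched as two different quantizers, one per feature type. This is a loose illustration under assumed settings (4-bit uniform quantization for scalars, norm-only rounding for vectors); the paper's branch-separated training certainly involves more than this.

```python
import numpy as np

def fake_quant_scalar(x, bits=4):
    """Aggressive uniform quantization for the scalar (invariant) branch."""
    levels = 2 ** bits - 1
    lo, hi = x.min(), x.max()
    scale = (hi - lo) / levels if hi > lo else 1.0
    return np.round((x - lo) / scale) * scale + lo

def quantize_vector_branch(vecs, step=0.25):
    """Gentler treatment for the equivariant branch: round only the
    rotation-invariant norms and leave the directions untouched
    (a simplification of the geometric scheme)."""
    norms = np.linalg.norm(vecs, axis=-1, keepdims=True)
    q_norms = np.round(norms / step) * step
    safe = np.where(norms > 0, norms, 1.0)    # avoid division by zero
    return vecs / safe * q_norms

rng = np.random.default_rng(0)
scalars = np.linspace(-1.0, 1.0, 8)       # invariant features ("distance runners")
vectors = rng.normal(size=(5, 3))         # equivariant features ("gymnasts")

q_scalars = fake_quant_scalar(scalars)
q_vectors = quantize_vector_branch(vectors)
```

The point of the split: the scalar branch can tolerate coarse rounding, while the vector branch only ever sees changes to lengths, never to directions, so spinning the molecule commutes with the quantizer.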
3. The Stabilized Spotlight (Robust Attention)
In these networks, different parts of the molecule talk to each other using "attention" (like a spotlight shining on the most important atoms).
- When you compress data, the "spotlight" can flicker or get too bright/dim, causing the whole system to go haywire.
- The authors added a "dimmer switch" and a "stabilizer" to the spotlight. They normalize the brightness so that even with low-quality data, the spotlight stays steady and points exactly where it should, preventing the simulation from crashing.
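One simple way to build such a "dimmer switch" is to normalize queries and keys before comparing them, so the attention logits are bounded cosine similarities that quantization noise cannot blow up. This is a generic stabilization sketch, not necessarily the exact normalization the authors use:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def robust_attention(q, k, v, eps=1e-6):
    """Stabilized 'spotlight': unit-normalize queries and keys so the
    logits stay in [-1, 1] no matter how badly quantization rescales them."""
    q = q / (np.linalg.norm(q, axis=-1, keepdims=True) + eps)
    k = k / (np.linalg.norm(k, axis=-1, keepdims=True) + eps)
    logits = q @ k.T                          # bounded cosine similarities
    weights = softmax(logits, axis=-1)
    return weights @ v

rng = np.random.default_rng(0)
q_feat = rng.normal(size=(4, 8))
k_feat = rng.normal(size=(6, 8))
v_feat = rng.normal(size=(6, 8))

out = robust_attention(q_feat, k_feat, v_feat)
out_blown_up = robust_attention(q_feat * 1e6, k_feat, v_feat)  # simulated overflow
```

Because the normalization strips away magnitude, a query that quantization has inflated a million-fold produces the same attention pattern as the original, so the spotlight never flickers.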
The Results: Magic on a Budget
The results are impressive:
- Physics Preserved: Unlike the "naive" method which caused molecules to explode in simulation, their method kept the molecules stable for a full nanosecond (a long time in physics terms) without breaking any laws of physics.
- Better Accuracy: Surprisingly, their compressed model was actually more accurate than the full, heavy version. It's like how a sketch artist sometimes captures the "soul" of a face better than a hyper-realistic photo because they ignore the distracting noise. The compression acted as a filter, removing the "static" from the data.
- Speed & Size: They made the model 4 times smaller and 2.4 times faster. This means scientists can now run these complex simulations on regular computers (like a gaming PC) instead of needing a supercomputer.
In Summary
This paper is about teaching computers to "squish" complex 3D physics models without crushing the laws of nature inside them. By respecting the geometry of the data (treating directions like points on a sphere rather than numbers on a grid), they managed to make powerful AI models smaller, faster, and more accurate, unlocking the ability to simulate the microscopic world on everyday hardware.