Rethinking Diffusion Models with Symmetries through… — Plain-Language Explanation

Original authors: Cai Zhou, Zijie Chen, Zian Li, Jike Wang, Kaiyi Jiang, Pan Li, Rose Yu, Muhan Zhang, Stephen Bates, Tommi Jaakkola

Published 2026-02-17





Original paper licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). ⚕️ This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

Imagine you are trying to teach a robot to draw molecules. A molecule is just a cluster of atoms connected by bonds. But here's the tricky part: a molecule doesn't care about its name tag or which way it's facing.

If you rotate a molecule 90 degrees, or if you swap the names of two identical carbon atoms, it's still the exact same molecule. In math and physics, this is called symmetry.

The Old Way: The "Strict Teacher"

For a long time, scientists tried to teach the robot by forcing it to be "symmetry-aware" from the start. They built the robot's brain (the neural network) with special rules that said, "No matter how you turn this, you must treat it the same."

The Problem: This is like asking a student to solve a math problem while insisting that they look at the numbers from every angle at once.

It's computationally heavy (the robot gets tired).
It's confusing. When the robot sees a noisy, blurry version of a molecule, it cannot tell which rotation or which atom order is the "correct" one to start fixing, because all of them look equally plausible. Instead of committing to one, it gets stuck averaging over all these duplicate versions at once.

The New Way: The "Canonicalization" Strategy

This paper proposes a clever shortcut. Instead of forcing the robot to be a symmetry expert, the authors first put every molecule into a standardized pose.

Think of it like this:

The Problem: Imagine you have a pile of 1,000 photos of the same person, but some are upside down, some are sideways, and some have the person's left and right arms swapped. If you try to teach an AI to recognize "John" from this messy pile, it's hard.
The Solution (Canonicalization): Before you show the photos to the AI, you run them through a "photo editor" that automatically:
- Rotates every photo so the person is standing upright.
- Labels the left arm "Left" and the right arm "Right" for everyone.
After this editing, every photo of John is in the same standard pose.
The Training: You teach the AI to fix blurry photos using this standardized pile. Because every photo is in the same orientation, the AI learns much faster and more accurately. It doesn't have to waste brainpower figuring out "is this upside down?"
The Result: Once the AI is trained, you can generate a new, perfect photo of John. But since the real world allows people to stand in any direction, you take the AI's output and randomly rotate it to make it look natural again.

The "Molecular" Analogy

In the world of molecules, the authors do the same thing:

The "Photo Editor": They use a mathematical trick (based on the molecule's shape and connections) to pick one specific "canonical" order for the atoms and one specific "canonical" direction for the molecule.
The Training: They train the diffusion model (the AI that generates molecules) only on these standardized versions.
The Magic: Because the AI isn't confused by symmetry anymore, it learns the "shape" of the molecule much faster. It can generate high-quality 3D molecules in fewer steps and with less computing power.

Why is this a big deal?

Speed: The AI learns faster because it's not fighting against the confusion of symmetry.
Simplicity: You don't need to build a super-complex, "symmetry-hardwired" robot. You can use a standard, powerful robot and just give it standardized inputs.
Quality: The paper shows that this method creates better, more stable molecules than the old "strict teacher" methods, especially when you need to generate them quickly (in just a few steps).

The "Canon" Architecture

The authors also built a new network architecture called Canon, along with CanonFlow, the generative model that uses it. Think of Canon as a specialized workshop where the robot not only sees the standardized molecule but also has a "name tag" for every atom that tells it exactly where it belongs in the lineup. This extra hint helps the robot make even fewer mistakes.

Summary

The paper argues that instead of forcing AI to be a symmetry expert, we should just standardize the data first, let the AI learn the easy version, and then randomize the result at the end. It's like teaching someone to drive by starting in an empty, straight parking lot (standardized), and then letting them drive on the winding, chaotic roads of the real world (randomized) once they've mastered the basics.

Result: Faster training, better molecules, and less computing power wasted on confusion.

1. Problem Statement

Generative modeling in chemistry and other sciences often deals with data distributions that are invariant under group symmetries, specifically permutations ($S_N$) of atoms and Euclidean motions ($SE(3)$, i.e., rotations and translations).

The Challenge: Traditional approaches enforce these symmetries by building equivariant architectures (e.g., E(3)-equivariant neural networks) and invariant priors. While principled, this imposes significant architectural constraints and computational overhead.
The Core Issue: Symmetry creates a latent "gauge" ambiguity. An intermediate noisy state of a diffusion process is consistent with multiple equivalent clean configurations (different rotations or atom orderings of the same molecule). The result is a mixture-like distribution, so the learned score function or velocity field must average over these possibilities.
- This leads to "trajectory crossing" and conflicting gradients.
- It inflates the conditional variance in flow matching, making the learning dynamics complex and requiring more steps to converge.
- It forces models to learn complex averaging mechanisms rather than direct transport.
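The averaging effect can be seen in a one-dimensional toy problem. This is our own illustration, not an experiment from the paper. Assume the data point x1 is uniform on {-a, +a} (a mirror symmetry), the prior is x0 ~ N(0, 1), and the flow-matching path is the linear interpolation x_t = (1 - t) x0 + t x1 with regression target u = x1 - x0. The optimal velocity is E[u | x_t], and Var[u | x_t] is the error no network can remove. Canonicalization maps every sample to +a.

```python
import math


def symmetric_target(x_t: float, t: float, a: float) -> tuple[float, float]:
    """Mean and variance of the target u = x1 - x0 given x_t, for data x1
    uniform on {-a, +a}, prior x0 ~ N(0, 1) and x_t = (1 - t) x0 + t x1."""
    s = 1.0 - t  # x_t ~ N(t * x1, s^2) given x1
    # Posterior over the two mirror images gives E[x1 | x_t] = a * tanh(a t x_t / s^2).
    mean_x1 = a * math.tanh(a * t * x_t / s**2)
    var_x1 = a**2 - mean_x1**2
    # u = (x1 - x_t) / s, so its moments follow from those of x1.
    return (mean_x1 - x_t) / s, var_x1 / s**2


def canonical_target(x_t: float, t: float, a: float) -> tuple[float, float]:
    """Same quantities after canonicalization, where every sample is mapped to x1 = +a."""
    s = 1.0 - t
    return (a - x_t) / s, 0.0  # x1 is known, so the target has zero variance


for t in (0.1, 0.5, 0.9):
    print(t, symmetric_target(0.0, t, 1.0), canonical_target(0.0, t, 1.0))
```

At x_t = 0 and t = 0.5 with a = 1, the symmetric target has mean 0 and variance 4: the two mirror images pull in opposite directions and cancel. The canonical target is exactly 2 with variance 0.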

2. Methodology: Canonical Diffusion

The authors propose a paradigm shift: instead of enforcing equivariance in the model architecture, they break symmetry during training via canonicalization and restore invariance only at generation time.

A. Theoretical Framework

The paper establishes a formal theory based on quotient spaces:

Canonicalization Map ($\Psi$): Maps every molecule in an orbit (the set of all its symmetric variants) to a single canonical representative (a specific pose and atom ordering). The set of these representatives forms a "slice" of the space.
Training on the Slice: The diffusion or flow model is trained on this canonical slice using unconstrained (non-equivariant) backbones (e.g., standard Transformers or GNNs).
- Variance Decomposition: The authors prove that the irreducible error in flow matching decomposes into two terms:
  - Within-slice difficulty: The actual transport difficulty on the canonical slice.
  - Symmetry ambiguity: Variance caused by not knowing which group element generated the data.
- Key Insight: Canonicalization eliminates the symmetry ambiguity term, significantly reducing the conditional variance and simplifying the learning task.
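One way to read this decomposition is through the law of total variance, conditioning on the group element $g$ that generated the training sample. This is our gloss, and the paper's exact statement and notation may differ. Here $u$ is the flow-matching target, $x_t$ the noisy state, and $g$ ranges over the symmetry group:

```latex
\operatorname{Var}[u \mid x_t]
  = \underbrace{\mathbb{E}_{g \mid x_t}\big[\operatorname{Var}[u \mid x_t, g]\big]}_{\text{within-slice difficulty}}
  + \underbrace{\operatorname{Var}_{g \mid x_t}\big[\mathbb{E}[u \mid x_t, g]\big]}_{\text{symmetry ambiguity}}
```

Training only on the canonical slice fixes $g$ for every sample, so the variance over $g$ in the second term is zero and only the first term remains.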
Invariance Recovery: At inference, the model generates a sample on the canonical slice, and a random symmetry transform (sampled from the Haar measure) is applied to the output to restore the invariant distribution.
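The randomization step can be sketched as follows. This is a minimal sketch under our own assumptions: a sample is an N x 3 coordinate array, translation is handled by fixing the center of mass at the origin, and a Haar-random rotation is drawn with the standard QR-of-a-Gaussian-matrix construction. The function names are hypothetical.

```python
import numpy as np


def haar_rotation(rng: np.random.Generator) -> np.ndarray:
    """Sample a 3x3 rotation from the Haar (uniform) measure on SO(3) via QR."""
    q, r = np.linalg.qr(rng.standard_normal((3, 3)))
    q = q * np.sign(np.diag(r))  # fix column signs so q is Haar-distributed on O(3)
    if np.linalg.det(q) < 0:  # flip one axis to map the reflection coset onto SO(3)
        q[:, 0] = -q[:, 0]
    return q


def randomize(x: np.ndarray, rng: np.random.Generator) -> tuple[np.ndarray, np.ndarray]:
    """Map a canonical sample (N x 3) to a random member of its S_N x SE(3) orbit.
    Returns the new coordinates and the permutation, so atom types and bonds
    can be reordered with the same permutation."""
    perm = rng.permutation(len(x))
    centered = x[perm] - x.mean(axis=0)  # zero center of mass fixes the translation
    return centered @ haar_rotation(rng).T, perm
```

Because a uniformly random group element is applied to every generated sample, the output distribution is invariant even though the generator itself only ever produced canonical poses.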

B. Practical Implementation for Molecules

The framework is instantiated for molecular graph generation under $S_N \times SE(3)$ symmetries:

Geometric Spectral Canonicalization:
- Permutation ($S_N$): Uses the Fiedler vector (the eigenvector associated with the second-smallest eigenvalue) of a geometric Laplacian constructed from 3D coordinates and bond distances. This provides a stable, geometry-aware atom ordering (core-to-periphery).
- Rotation ($SO(3)$): Defines a canonical frame using anchor atoms (e.g., extremes of the Fiedler ordering) to align the molecule to a fixed coordinate system.
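A simplified version of this two-stage canonicalization is sketched below. The details are our assumptions, not the paper's construction. The Laplacian uses a dense Gaussian kernel on pairwise distances (hypothetical parameter `sigma`) instead of the paper's bond-aware geometric Laplacian. The Fiedler sign is fixed by a sum-of-cubes rule. Atoms are sorted by signed Fiedler value, which may not match the paper's core-to-periphery rule. The first and last atoms in that order serve as the two anchors.

```python
import numpy as np


def fiedler_order(x: np.ndarray, sigma: float = 1.5) -> np.ndarray:
    """Atom ordering from the Fiedler vector of a distance-weighted Laplacian.
    ASSUMPTION: dense Gaussian weights w_ij = exp(-d_ij^2 / sigma^2)."""
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / sigma**2)
    np.fill_diagonal(w, 0.0)
    lap = np.diag(w.sum(axis=1)) - w
    _, vecs = np.linalg.eigh(lap)  # eigenvalues in ascending order
    v = vecs[:, 1]  # eigenvector of the second-smallest eigenvalue
    # Eigenvectors are only defined up to sign; fix it with a permutation-invariant rule.
    if (v**3).sum() < 0:
        v = -v
    return np.argsort(v, kind="stable")


def canonical_frame(x: np.ndarray) -> np.ndarray:
    """Express ordered, centered coordinates in a frame fixed by two anchor atoms
    (here the first and last atoms of the Fiedler ordering)."""
    e1 = x[0] / np.linalg.norm(x[0])  # x-axis points at the first anchor
    b = x[-1] - (x[-1] @ e1) * e1  # Gram-Schmidt step with the second anchor
    e2 = b / np.linalg.norm(b)
    e3 = np.cross(e1, e2)  # right-handed frame, so this is a proper rotation
    return x @ np.stack([e1, e2, e3], axis=1)


def canonicalize(x: np.ndarray) -> np.ndarray:
    """Hypothetical Psi: one representative of the S_N x SE(3) orbit of x (N x 3)."""
    xc = x - x.mean(axis=0)  # remove translation
    return canonical_frame(xc[fiedler_order(xc)])
```

For generic geometries (a simple second eigenvalue, no ties in the Fiedler vector, non-collinear anchors), any permuted, rotated, and translated copy of a molecule maps to the same canonical coordinates. Degenerate cases such as symmetric molecules need extra tie-breaking that this sketch omits.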
Architecture (Canon & CanonFlow):
- Canon Architecture: A novel architecture that explicitly incorporates canonical rank as a learnable hidden state interacting with atom features in every layer. This allows the model to leverage the fixed ordering without needing equivariant layers.
- Positional Encodings: Uses normalized canonical ranks as positional encodings to break permutation equivariance.
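The rank-based positional encoding can be sketched as below. We assume atoms are already in canonical order, so atom i has normalized rank r_i = i / (N - 1). The sinusoidal expansion and the `dim` parameter are our illustrative choices. This sketch covers only the input encoding, not Canon's learnable rank state that is updated in every layer.

```python
import numpy as np


def rank_encoding(n_atoms: int, dim: int = 8) -> np.ndarray:
    """Normalized canonical ranks in [0, 1], plus dim // 2 sine and cosine features.
    ASSUMPTION: the sinusoidal form and frequencies are illustrative."""
    r = np.arange(n_atoms) / max(n_atoms - 1, 1)
    freqs = np.pi * 2.0 ** np.arange(dim // 2)
    angles = r[:, None] * freqs[None, :]
    return np.concatenate([r[:, None], np.sin(angles), np.cos(angles)], axis=1)


def add_rank_features(atom_feats: np.ndarray) -> np.ndarray:
    """Append rank encodings to per-atom features given in canonical order."""
    return np.concatenate([atom_feats, rank_encoding(len(atom_feats))], axis=1)
```

Because each position receives a distinct encoding, a standard backbone can distinguish atoms by their canonical rank, which is exactly what breaks permutation equivariance.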
Training Enhancements:
- Aligned Priors: Instead of an isotropic Gaussian prior, they use a prior aligned with the canonical slice geometry (e.g., moment-matched Gaussian) to reduce within-slice transport difficulty.
- Optimal Transport (OT) Annealing: Uses OT coupling early in training to straighten trajectories but anneals it to allow generalization.
- Projected Canonical Sampling (PCS): During inference, intermediate states can be re-projected to the canonical slice to maintain consistency with the rank conditions.
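The first two enhancements can be illustrated with the sketch below. Both choices are our assumptions. The aligned prior is a diagonal Gaussian whose per-atom, per-axis mean and standard deviation are moment-matched to canonicalized training coordinates (the paper may use a different covariance). The OT coupling is used with a probability that decays linearly to zero over `anneal_steps`, a hypothetical schedule. Exact minibatch OT is computed by brute force, which only works for tiny batches. Projected Canonical Sampling is not shown.

```python
import itertools

import numpy as np


def fit_aligned_prior(canon_data: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Moment-match a diagonal Gaussian to canonical coordinates of shape (M, N, 3)."""
    return canon_data.mean(axis=0), canon_data.std(axis=0) + 1e-6


def sample_prior(mu: np.ndarray, std: np.ndarray, batch: int,
                 rng: np.random.Generator) -> np.ndarray:
    """Draw a batch of noise samples of shape (batch, N, 3) from the aligned prior."""
    return mu + std * rng.standard_normal((batch, *mu.shape))


def ot_probability(step: int, anneal_steps: int) -> float:
    """ASSUMPTION: linear annealing from always-OT at step 0 to never-OT."""
    return max(0.0, 1.0 - step / anneal_steps)


def couple(x0: np.ndarray, x1: np.ndarray, step: int, anneal_steps: int,
           rng: np.random.Generator) -> np.ndarray:
    """Reorder noise x0 to pair with data x1: exact minibatch OT with probability
    ot_probability(step), otherwise keep the independent (random) pairing."""
    if rng.random() >= ot_probability(step, anneal_steps):
        return x0
    b = len(x0)
    # cost[i, j] = squared distance between noise sample i and data sample j
    cost = ((x0[:, None] - x1[None, :]) ** 2).reshape(b, b, -1).sum(-1)
    cols = np.arange(b)
    best = min(itertools.permutations(range(b)),
               key=lambda p: cost[list(p), cols].sum())
    return x0[list(best)]  # row j is now paired with x1[j]
```

On the canonical slice, the OT problem is posed between fixed-pose, fixed-order samples, which is one way to see why the text calls OT and canonicalization complementary.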

3. Key Contributions

Theoretical Proof of Superiority: Proves that canonicalized generative models built on non-equivariant backbones are universal for invariant targets and more expressive than equivariant models. It also shows that canonicalization removes the "symmetry ambiguity" term from the lower bound on conditional variance, which accelerates training.
Novel Framework: Introduces Canonical Diffusion, a framework that replaces complex equivariant architectures with simple non-equivariant models trained on a canonical slice, followed by randomization.
Complementarity of OT and Canonicalization: Shows that while Optimal Transport reduces trajectory intersections, it is non-unique under symmetry. Canonicalization stabilizes OT by fixing a gauge, making the two techniques complementary.
New Architecture (Canon): Designs a specific architecture that integrates canonical rank information directly into the message-passing layers, enabling high-performance generation without equivariant constraints.

4. Experimental Results

The method was evaluated on QM9 and GEOM-DRUG (a challenging dataset of drug-like molecules).

State-of-the-Art Performance:
- CanonFlow (using the new architecture) achieved SOTA on Molecule Stability (98.4%) and Validity (95.9%) on GEOM-DRUG, outperforming strong baselines like SemlaFlow and EQGAT-diff by large margins.
- On QM9, it achieved the lowest Opt-RMSD (0.17 Å), indicating generated geometries are closer to energy-minimized conformations.
Few-Step Generation:
- Canonicalized models significantly outperform baselines in few-step sampling (e.g., 50 steps). They maintain high stability and validity with far fewer network function evaluations (NFE), which reflects the "straighter" transport dynamics they learn.
Efficiency:
- Training convergence is faster due to reduced conditional variance.
- Its computational overhead is negligible compared with that of equivariant baselines, because it uses standard Transformer/GNN backbones rather than expensive tensor algebra.
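The Opt-RMSD figure reported above compares a generated geometry with its energy-minimized counterpart. A common way to compute such an RMSD is after optimal rigid superposition with the Kabsch algorithm, sketched below. We assume the two structures are in one-to-one atom correspondence. We do not know whether the paper's evaluation also handles hydrogens or symmetry-equivalent atoms specially, so treat this as illustrative. The energy minimization step itself is out of scope here.

```python
import numpy as np


def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMSD between matched coordinates p and q (N x 3) after the optimal
    rotation and translation of p onto q (Kabsch algorithm)."""
    p = p - p.mean(axis=0)  # optimal translation aligns the centroids
    q = q - q.mean(axis=0)
    h = p.T @ q  # 3x3 cross-covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # avoid returning a reflection
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p @ rot.T - q
    return float(np.sqrt((diff**2).sum() / len(p)))
```

A rigidly moved copy of a structure has an RMSD of zero under this metric, so it measures only shape differences, which matches the pose-free way the results are interpreted.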

5. Significance

This work fundamentally challenges the dogma that generative models for symmetric data must be equivariant.

Paradigm Shift: It demonstrates that breaking symmetry (via canonicalization) during training is often more efficient and expressive than enforcing it.
Scalability: By allowing the use of generic, powerful non-equivariant architectures (like Transformers), this approach scales better to large molecular datasets than specialized equivariant layers.
General Applicability: While focused on molecules, the theory of canonical diffusion applies to any domain with group symmetries (e.g., point clouds, sets), offering a new theoretical lens for understanding and improving diffusion/flow models.

In summary, the paper provides a rigorous theoretical justification and a practical, high-performing framework that simplifies the learning of symmetric distributions by mapping them to a canonical space, thereby removing the "noise" of symmetry ambiguity from the learning process.

Rethinking Diffusion Models with Symmetries through Canonicalization with Applications to Molecular Graph Generation