DreamSAC: Learning Hamiltonian World Models via Symmetry Exploration

Imagine you are trying to teach a robot how to play pool.

The Old Way (Passive Learning):
Most current AI robots learn by watching thousands of hours of pool videos. They are like a student who memorizes every single shot they've ever seen. If the robot sees a red ball hit a blue ball, it knows exactly what happens because it's seen that specific scene a million times.

But here's the problem: If you put a heavier blue ball on the table, or change the friction of the felt, or move the camera to a weird angle, the robot gets confused. It fails. Why? Because it didn't learn the rules of physics (like momentum or gravity); it just learned to recognize patterns in the pixels. It's like memorizing the answer key to a test without understanding the math.

The New Way (DreamSAC):
The paper introduces DreamSAC, a robot that doesn't just watch; it plays. It treats the world like a playground and uses a special "curiosity" to figure out the laws of physics.

Here is how it works, broken down into three simple steps:

1. The "Physics Detective" (Symmetry Exploration)

Instead of waiting for data, the robot actively tries to break things.

The Analogy: Imagine a child in a dark room. A passive learner sits still and waits for a light to turn on. A DreamSAC robot is like a child who starts throwing balls at the walls, jumping on furniture, and banging pots.
Why? The robot has a special "curiosity bonus." It gets a reward for doing things that cause a big change in energy. It wants to find out: "If I push this heavy box, how much harder do I have to push compared to a light box?"
The Result: By actively "breaking symmetry" (doing things that change the system), it gathers the exact kind of data needed to understand the underlying rules, not just the surface appearance.

2. The "Invisible Backpack" (Hamiltonian World Model)

Once the robot gathers this data, it builds a mental model of the world. But this isn't a normal model; it's built on Hamiltonian Physics.

The Analogy: Think of a normal AI model as a video game character that just remembers the map. DreamSAC's model is like a character who carries an invisible backpack filled with the laws of physics (conservation of energy, momentum, etc.).
The Magic: Even if the robot looks at the pool table from a weird angle (a new camera view), the "backpack" tells it, "Hey, the ball still has mass, and gravity still pulls down." It separates the visual noise (is the camera tilted? is it sunny?) from the physical truth (how heavy is the ball?).
The Contrastive Trick: To make sure the robot ignores the camera angle, the researchers use a "spot the difference" game. They show the robot two pictures of the same scene from different angles and say, "Ignore the angle; tell me what's the same about the physics." This forces the robot to learn the invariant (unchanging) laws.

3. The "Dreaming" Phase (Imagination)

The robot doesn't just learn from real life; it dreams.

The Analogy: After playing in the real world, the robot goes to sleep and runs simulations in its head. It imagines, "What if I hit the ball with double the force?" or "What if gravity were 50% stronger?"
The Benefit: Because its "backpack" contains the actual laws of physics, these dreams are accurate. It can predict what happens in a world it has never visited before.

Why This Matters

The paper shows that DreamSAC is a chameleon of physics.

If you change the gravity, it adapts quickly, with only a little retuning.
If you change the camera angle, it doesn't get confused.
If you change the number of objects on the table, it still copes better than the old approach.

In a nutshell:
Old AI is like a parrot that repeats what it hears. DreamSAC is like a curious scientist who pokes, prods, and experiments until it figures out why things happen. It doesn't just memorize the world; it learns the code that runs the universe, allowing it to handle situations it has never seen before.

Here is a detailed technical summary of the paper "DreamSAC: Learning Hamiltonian World Models via Symmetry Exploration."

1. Problem Statement

Current state-of-the-art world models (e.g., DreamerV3) excel at interpolative generalization: they predict outcomes for scenarios similar to the training data by learning statistical correlations in pixel sequences. However, they fail at extrapolative generalization, where agents encounter novel physical parameters (e.g., different gravity, friction, mass ratios) or unseen viewpoints.

The core limitation is that these models learn spurious statistical correlations rather than the underlying generative rules (physical laws) of the environment. They treat pixels as static patterns without understanding concepts like force, momentum, or energy conservation. Consequently, they cannot robustly adapt to Out-of-Distribution (OOD) physical conditions.

2. Methodology: DreamSAC Framework

The authors propose DreamSAC (Dream with Symmetry-Aware Curiosity), a framework that shifts learning from passive statistical fitting to active physical discovery. It consists of two tightly coupled components:

A. Hamiltonian World Model ( $H_\phi$ )

Instead of a standard black-box dynamics predictor (like an RSSM), DreamSAC employs a physics-grounded model based on Controlled Hamiltonian Dynamics.
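For orientation, controlled Hamiltonian dynamics are often written in the textbook form below. This is a sketch under assumptions: the input matrix $G(q)$ and control $u$ are notation introduced here for illustration, and the paper's exact formulation may differ. Here $q$ denotes generalized coordinates and $p$ canonical momenta.

$$
\dot{q} = \frac{\partial H_\phi}{\partial p}, \qquad
\dot{p} = -\frac{\partial H_\phi}{\partial q} + G(q)\,u
$$

$$
\frac{dH_\phi}{dt}
= \frac{\partial H_\phi}{\partial q}^{\top}\dot{q} + \frac{\partial H_\phi}{\partial p}^{\top}\dot{p}
= \frac{\partial H_\phi}{\partial p}^{\top} G(q)\,u
$$

Under this form the energy is constant when $u = 0$, and any change in $H_\phi$ equals the work done by the control input.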

Latent Representation: The model uses an object-centric encoder (based on SAVi) to map raw pixels to latent object slots $Z_t = \{z_t^i\}$ . Each slot is structurally decomposed into generalized coordinates ( $q_t$ ) and canonical momenta ( $p_t$ ).
Symmetry Constraints: The internal dynamics are governed by a Hamiltonian function $H_\phi(Z_t)$ that is constrained to be invariant under the relevant physical symmetry group (e.g., $SE(3)$ for 3D rigid bodies). This ensures the learned energy function depends only on physical state, not camera viewpoint.
Viewpoint Robustness: To resolve the conflict between reconstruction (which needs viewpoint details) and physical invariance (which needs to discard them), DreamSAC introduces a Viewpoint-Robustness Loss ( $L_{vr}$ ). This is a self-supervised contrastive loss that forces the encoder to produce identical latent representations for different augmented views of the same physical state, effectively "cleaning" the latent space of nuisance variables.
Dynamics Integration: The model uses a symplectic integrator (leapfrog) during inference, which keeps the energy error bounded over long rollouts, while using a standard Euler integrator during training for gradient stability.
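The difference between the two integrators can be illustrated with a minimal sketch. It assumes a one-dimensional harmonic oscillator with an analytic Hamiltonian in place of the learned $H_\phi$, and no control input. The function names and constants are illustrative and do not come from the paper.

```python
def hamiltonian(q, p, m=1.0, k=1.0):
    """Toy energy: kinetic p^2/(2m) plus spring potential k*q^2/2 (assumed, not learned)."""
    return p * p / (2.0 * m) + 0.5 * k * q * q


def euler_step(q, p, dt, m=1.0, k=1.0):
    """Explicit Euler: both updates use the state at the start of the step."""
    dq = p / m        # dq/dt =  dH/dp
    dp = -k * q       # dp/dt = -dH/dq
    return q + dt * dq, p + dt * dp


def leapfrog_step(q, p, dt, m=1.0, k=1.0):
    """Leapfrog (kick-drift-kick): half momentum update, full position update, half momentum update."""
    p_half = p - 0.5 * dt * k * q
    q_new = q + dt * p_half / m
    p_new = p_half - 0.5 * dt * k * q_new
    return q_new, p_new


def energy_drift(step_fn, steps=1000, dt=0.1):
    """Absolute change in energy after a zero-action rollout from (q, p) = (1, 0)."""
    q, p = 1.0, 0.0
    h0 = hamiltonian(q, p)
    for _ in range(steps):
        q, p = step_fn(q, p, dt)
    return abs(hamiltonian(q, p) - h0)


if __name__ == "__main__":
    print("Euler drift:   ", energy_drift(euler_step))
    print("Leapfrog drift:", energy_drift(leapfrog_step))
```

With these settings the Euler rollout's energy error grows step after step, while the leapfrog error stays small and bounded, which is the property that matters for long imagined rollouts.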

B. Symmetry Exploration Policy

To learn the Hamiltonian $H_\phi$ effectively, the agent cannot passively observe; it must actively probe the system's symmetries.

Intrinsic Reward (Symmetry-Aware Curiosity): The agent is motivated by a reward $r_{sym} \approx |\Delta H_\phi| = |H_\phi(Z_{t+1}) - H_\phi(Z_t)|$ .
- Theoretical Basis: According to Noether's theorem, continuous symmetries imply conservation laws; time-translation symmetry in particular gives conservation of energy, so in a closed system $\Delta H \approx 0$ . To learn the structure of $H$ , the agent must perform actions that do work on the system, breaking the conservation temporarily to reveal the system's response (stiffness, mass, potential barriers).
- Annealing Strategy: Since $H_\phi$ is untrained initially, $r_{sym}$ is noisy. The authors use a linear annealing schedule, starting with a standard novelty bonus (RND) to bootstrap exploration, then gradually shifting to the physics-based $r_{sym}$ as the model improves.
Process: The agent trains entirely in "imagination" (using the world model) to maximize this curiosity reward, then executes the policy in the real environment to collect physically informative data, which refines the world model.
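The annealed intrinsic reward can be sketched as follows. The summary states that the schedule is linear but does not give the mixing rule, so the convex combination below is an assumption. `novelty_bonus` stands in for the output of an RND module, and all names are hypothetical.

```python
def symmetry_reward(h_t, h_next):
    """r_sym = |H(Z_{t+1}) - H(Z_t)|, the magnitude of the energy change over one step."""
    return abs(h_next - h_t)


def annealing_weight(step, anneal_steps):
    """Linear ramp from 0 to 1 over anneal_steps, clamped to [0, 1]."""
    if anneal_steps <= 0:
        return 1.0
    return min(1.0, max(0.0, step / anneal_steps))


def intrinsic_reward(h_t, h_next, novelty_bonus, step, anneal_steps):
    """Blend a novelty bonus with r_sym (assumed convex combination).

    Early in training the weight is near 0, so the novelty bonus dominates;
    late in training the weight is 1, so only the energy-change term remains.
    """
    w = annealing_weight(step, anneal_steps)
    return (1.0 - w) * novelty_bonus + w * symmetry_reward(h_t, h_next)


if __name__ == "__main__":
    for step in (0, 50, 100):
        print(step, intrinsic_reward(1.0, 3.0, novelty_bonus=0.5, step=step, anneal_steps=100))
```

The ramp keeps the untrained, noisy energy estimate from steering exploration at the start, then hands control to the physics-based signal once the model has had time to improve.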

3. Key Contributions

Symmetry Exploration: A novel unsupervised exploration strategy that uses a Hamiltonian-based curiosity signal to actively seek interactions that maximize work (energy change), thereby collecting data specifically designed to reveal physical invariances.
Hamiltonian World Model with Contrastive Learning: A world model that enforces physical symmetries via a Lie Transformer architecture and uses self-supervised contrastive learning to disentangle viewpoint-dependent visual features from viewpoint-independent physical states.
Differentiated Fine-Tuning: A rapid adaptation mechanism where, for new tasks, the viewpoint-robust encoder is frozen, and only the Hamiltonian parameters are fine-tuned. This allows for fast system identification of new physical parameters (e.g., friction, gravity) without relearning the underlying physics.
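The idea behind differentiated fine-tuning can be illustrated with a toy sketch. It assumes a scalar "encoder" that is left untouched and a Hamiltonian $H = p^2/(2m) + m g q$ whose only trainable parameter is the gravity $g$. The setup, names, and hyperparameters are illustrative assumptions, not the paper's implementation.

```python
def encode(obs, encoder_scale):
    """Frozen stand-in encoder: maps a scalar observation to a momentum p."""
    return encoder_scale * obs


def predict_next_p(p, gravity, mass=1.0, dt=0.1):
    """One Euler step of dp/dt = -dH/dq = -m*g for H = p^2/(2m) + m*g*q."""
    return p - dt * mass * gravity


def make_pairs(gravity, encoder_scale, n=20, mass=1.0, dt=0.1):
    """Synthetic (obs_t, obs_next) pairs from a falling mass under the given gravity."""
    pairs, p = [], 0.0
    for _ in range(n):
        p_next = p - dt * mass * gravity
        pairs.append((p / encoder_scale, p_next / encoder_scale))
        p = p_next
    return pairs


def fine_tune_gravity(obs_pairs, params, lr=25.0, epochs=50, mass=1.0, dt=0.1):
    """Gradient descent on one-step squared error, updating only the Hamiltonian parameter."""
    gravity = params["gravity"]
    scale = params["encoder_scale"]  # read but never updated: the encoder stays frozen
    for _ in range(epochs):
        grad = 0.0
        for obs_t, obs_next in obs_pairs:
            p_t, p_next = encode(obs_t, scale), encode(obs_next, scale)
            err = predict_next_p(p_t, gravity, mass, dt) - p_next
            grad += 2.0 * err * (-dt * mass)  # d(err^2)/d(gravity)
        gravity -= lr * grad / len(obs_pairs)
    return {"encoder_scale": scale, "gravity": gravity}


if __name__ == "__main__":
    pretrained = {"encoder_scale": 0.5, "gravity": 9.81}
    new_env = make_pairs(gravity=1.5 * 9.81, encoder_scale=0.5)
    print(fine_tune_gravity(new_env, pretrained))
```

Because the encoder is fixed, adaptation reduces to a small system-identification problem over the physical parameters, which is why a few interactions can suffice in this toy setting.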

4. Experimental Results

The framework was evaluated on 3D physics benchmarks (DeepMind Control Suite and GymFetch) against strong baselines like DreamerV3 and RND.

Predictive Accuracy: DreamSAC achieved significantly lower Mean Squared Error (MSE) in image prediction compared to baselines across various rollout horizons, demonstrating a more accurate internal dynamics model.
Extrapolative Generalization (OOD):
- Structural OOD: DreamSAC outperformed baselines on tasks with unseen viewpoints, object counts, and goal positions.
- Parametric OOD: The model showed superior adaptation to unseen physical parameters (e.g., 1.5x gravity, 2.0x friction). In "Unseen Distribution" tasks (domain shift), DreamSAC significantly outperformed domain-randomized baselines, suggesting that it learned the functional form of the dynamics rather than just memorizing the training distribution.
Ablation Studies: Removing the Viewpoint-Robustness Loss ( $L_{vr}$ ) or the Hamiltonian prior ( $H_\phi$ ) caused significant performance drops, confirming that both components are essential for robust extrapolation.
Qualitative Analysis: Visualizations confirmed that the learned Hamiltonian remains constant during zero-action rollouts (validating energy conservation) and that latent states for novel physical parameters form distinct clusters separate from training data.

5. Significance

DreamSAC represents a paradigm shift from passive statistical learning to active physical discovery.

Robustness: It addresses the critical "noisy TV" problem of curiosity-based RL by focusing on physically informative novelty (energy changes) rather than just statistical unpredictability.
Sample Efficiency: By learning the underlying generative rules (symmetries), the agent requires fewer interactions to adapt to new physical environments compared to models that must relearn correlations from scratch.
Generalization: It demonstrates that embedding physical priors (Hamiltonian mechanics) and enforcing invariance through contrastive learning enables agents to generalize to scenarios they have never seen, a crucial step toward deploying RL agents in the unpredictable real world.

In summary, DreamSAC provides evidence that explicitly modeling physical invariances, and actively exploring to challenge them, leads to world models that are not just descriptive but grounded in physical reality, enabling robust extrapolation.
