Learning Permutation-invariant Macroscopic Dynamics

The Big Problem: The "Unordered Crowd"

Imagine you are trying to understand the mood of a massive crowd of people at a concert. You want to predict how the crowd will move or react over time (the macroscopic dynamics).

Usually, scientists try to do this by taking a snapshot of every single person, listing them in a specific order (Person 1, Person 2, Person 3...), and feeding that list into a computer model. This works fine if the people are sitting in numbered seats.

But in many real-world systems—like gas molecules bouncing around, or particles in a fluid—there are no seats. The particles are a jumbled, unordered set. If you swap Person 1 and Person 2 in your list, the physical reality hasn't changed at all. However, traditional computer models get confused by this. They think, "Oh, the list changed, so the crowd must be different!" This causes them to fail when the order of the data changes.

The Old Solution vs. The New Idea

The Old Way (The "Point-by-Point" Approach):
Imagine trying to describe a crowd by saying, "Person 1 is at the left, Person 2 is at the right." If you shuffle the crowd, you have to rewrite the whole description. If you try to teach a computer to learn from this, it struggles because it doesn't know which "Person 1" in the new photo matches "Person 1" in the old photo. It's like trying to match socks from two different piles without looking at the patterns, just the order they were picked up.

The New Way (The "Cloud" Approach):
This paper proposes a clever shortcut. Instead of trying to match every single person (or particle) one by one, the authors suggest looking at the shape of the crowd.

Imagine the crowd isn't a list of people, but a fog or a cloud of dust.

Where there are many people, the fog is thick.
Where there are few people, the fog is thin.

If you shuffle the order of the people in your list, the fog does not change at all, because the fog only records how many people are in each spot, not which name is attached to each one. You don't need to know who is who; you just need to know where the density is.

How Their Method Works

The authors built a special "Autoencoder" (a type of AI that compresses information and then tries to rebuild it) that works with this "fog" idea.

The Encoder (The Photographer):
Instead of taking a photo of individual people, the encoder looks at the whole unordered set of particles and creates a single, compact summary (a "latent variable"). Crucially, this summary is permutation-invariant. It doesn't matter if you shuffle the input; the summary stays the same because it only cares about the overall distribution, not the order.
The Decoder (The Fog Maker):
This is the tricky part. Usually, an AI tries to rebuild the exact list of people. But since the particles have no meaningful order, there is no single correct list to rebuild.
Instead, this decoder tries to rebuild the fog. It takes the summary and generates a smooth density map (a "cloud") that looks like the original particle distribution. It asks, "If I spread this summary out, does it look like the original cloud of particles?"
Learning the Future:
Once the AI learns to compress the crowd into a summary and rebuild the cloud, it also learns how that summary changes over time. It predicts how the "fog" will evolve, allowing scientists to predict the future behavior of the system without tracking every single particle.

Why This Matters (The Results)

The paper tested this method on three different scenarios:

Interacting Particles: They simulated particles pushing and pulling each other. The new method predicted the system's energy changes much better than old methods, even when they changed the number of particles or shuffled their starting positions.
Mixing Fluids: They simulated two types of fluids (like oil and water) mixing together. The method accurately predicted how fast they would mix, even when the starting boundary was in a different place than what it saw during training.
Polymer Videos: They even applied this to video data of long chain molecules (polymers) stretching. They treated each non-white pixel in the video as a "particle." The method learned how the chains would stretch at fast and medium stretching rates, showing that it can work even when the "particles" are just pixels in an image.

The Bottom Line

This paper solves a headache for scientists: How do you model a system where the parts have no names or numbers?

By stopping the attempt to match individual parts and instead focusing on matching the overall shape and density of the system, they created a robust tool. It's like learning to predict the weather by looking at the pressure map (the cloud) rather than trying to track every single water molecule. This allows for accurate predictions of complex systems, regardless of how the data is ordered or how many particles are involved.

Technical Summary: Learning Permutation-invariant Macroscopic Dynamics

1. Problem Statement

Accurately modeling the macroscopic dynamics of high-dimensional microscopic systems is a central challenge in multiscale science. Many physical systems, such as interacting particle systems or fluids, consist of microscopic degrees of freedom (e.g., particle positions) that are inherently unordered. Existing data-driven approaches for closure modeling—which aim to learn low-dimensional latent variables (closure variables) that encode microscopic information to predict macroscopic evolution—typically rely on autoencoders trained with pointwise reconstruction losses.

These standard methods assume a fixed ordering of input data (represented as vectors or tensors), utilizing architectures like Multilayer Perceptrons (MLPs) or Convolutional Neural Networks (CNNs). However, this assumption fails for unordered sets where the physical state is invariant to particle permutation. Applying ordered models to unordered data requires artificial canonical ordering or permutation augmentation, which can be computationally prohibitive or lead to optimization instability. Furthermore, reconstructing unordered sets via pointwise losses (e.g., Mean Squared Error) requires an explicit matching between input and output elements. The number of candidate permutations grows factorially ( $N!$ ), so in practice this calls for expensive combinatorial matching or for permutation-invariant distance metrics (e.g., Chamfer distance, Earth Mover's distance).
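The ordering problem described above can be seen in a few lines. The following is a minimal NumPy sketch, not code from the paper: `mse` and `chamfer` are illustrative helper names, and the random point set is made up. It compares a pointwise loss with a permutation-invariant one on two orderings of the same physical state.

```python
import numpy as np

def mse(a, b):
    """Pointwise mean squared error: compares row i of `a` with row i of `b`."""
    return float(np.mean((a - b) ** 2))

def chamfer(a, b):
    """Symmetric Chamfer distance between two point sets (order-independent)."""
    # Pairwise squared distances, shape (len(a), len(b)).
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    # Each point is matched to its nearest neighbour in the other set.
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))          # 50 particles in 2D (illustrative data)
X_shuffled = X[rng.permutation(50)]   # same physical state, different order

mse_same_state = mse(X, X_shuffled)          # nonzero: the loss sees a "different" input
chamfer_same_state = chamfer(X, X_shuffled)  # zero: the sets are identical
```

Note that this Chamfer implementation builds the full pairwise distance matrix, so it costs O(N^2) time and memory, which is one reason set-matching losses become expensive for large particle counts.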

2. Methodology

The authors propose a novel autoencoder framework designed to learn permutation-invariant latent representations without requiring explicit point-to-point alignment. The core innovation lies in shifting the reconstruction objective from individual particles to the distribution of particles.

Architecture Overview:

Encoder ( $\hat{\phi}$ ): A permutation-invariant set encoder maps the unordered microstate $X = \{x_1, \dots, x_n\}$ to a low-dimensional latent vector $\hat{z}$ . The authors instantiate this using DeepSet, which aggregates particle features via a symmetric function (e.g., sum or mean pooling), ensuring $\hat{\phi}(\sigma X) = \hat{\phi}(X)$ for any permutation $\sigma$ .
Target Distribution Induction: Instead of treating the input as a vector, the method induces a continuous target density $q_X(x)$ over the input space. This density is a mixture of isotropic Gaussian kernels centered at the observed particle positions:
$q_X(x) = \frac{1}{|X|} \sum_{x_i \in X} \delta_\epsilon(x - x_i)$
where $\delta_\epsilon$ denotes an isotropic Gaussian kernel with bandwidth $\epsilon$ . The bandwidth acts as a smoothing parameter that controls the resolution of the representation.
Decoder ( $\psi$ ): The decoder is a conditional density model (implemented as a conditional normalizing flow) that generates a probability density $p_\theta(x|\hat{z})$ conditioned on the latent variable $\hat{z}$ .
Training Objective: The model is trained to minimize the Kullback-Leibler (KL) divergence between the target density and the generated density:
$\mathcal{L}_{rec} = \mathbb{E}_X [\text{KL}(q_X(x) \parallel p_\theta(x|\hat{z}))]$
This objective is inherently permutation-invariant: $q_X$ is a sum over the particles in $X$ , and a sum does not depend on the order of its terms, so the KL divergence does not depend on how the particles are listed.
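The pieces above can be sketched together in NumPy. This is an illustration under stated assumptions, not the authors' implementation: the weights `W_phi` and `W_rho` are random placeholders for trained networks, and `decoder_log_prob` is a simple Gaussian standing in for the conditional normalizing flow. The sketch relies on the fact that the entropy of $q_X$ does not depend on the decoder parameters, so minimizing the KL divergence over $\theta$ is equivalent to maximizing the expected log-likelihood $\mathbb{E}_{x \sim q_X}[\log p_\theta(x|\hat{z})]$ , which can be estimated by sampling from $q_X$ .

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical random weights standing in for trained networks.
W_phi = rng.normal(size=(2, 16))   # per-particle feature map
W_rho = rng.normal(size=(16, 4))   # post-pooling map to the latent z_hat

def encode(X):
    """DeepSet-style encoder: per-particle features, mean pooling, readout."""
    h = np.tanh(X @ W_phi)          # (n, 16), each particle processed on its own
    pooled = h.mean(axis=0)         # symmetric aggregation over particles
    return np.tanh(pooled @ W_rho)  # latent vector z_hat, shape (4,)

def sample_target(X, eps, m, rng):
    """Draw m samples from q_X: pick a particle uniformly, add N(0, eps^2 I) noise."""
    idx = rng.integers(0, len(X), size=m)
    return X[idx] + eps * rng.normal(size=(m, X.shape[1]))

def decoder_log_prob(x, z_hat):
    """Placeholder for log p_theta(x | z_hat): a diagonal Gaussian read from z_hat.
    The source describes a conditional normalizing flow here instead."""
    mean, log_std = z_hat[:2], z_hat[2:]
    var = np.exp(2.0 * log_std)
    return np.sum(-0.5 * (x - mean) ** 2 / var - log_std - 0.5 * np.log(2 * np.pi), axis=-1)

def reconstruction_loss(X, eps=0.1, m=256, seed=1):
    """Monte Carlo estimate of -E_{x ~ q_X}[log p_theta(x | z_hat)]."""
    z_hat = encode(X)
    samples = sample_target(X, eps, m, np.random.default_rng(seed))
    return float(-decoder_log_prob(samples, z_hat).mean())

X = rng.normal(size=(300, 2))             # 300 particles in 2D (illustrative data)
X_perm = X[rng.permutation(len(X))]       # same set, shuffled order
z, z_perm = encode(X), encode(X_perm)     # equal up to floating-point rounding
loss = reconstruction_loss(X)
```

Because the encoder pools with a mean, the same function also accepts a set with a different number of particles, and the cost of the loss depends on the number of samples `m` rather than on the particle count.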

Macroscopic Dynamics Modeling:
The learned latent variable $\hat{z}$ is concatenated with pre-defined macroscopic observables $\bar{z}$ (e.g., system energy) to form an augmented state $z_t = [\bar{z}_t, \hat{z}_t]$ . A dynamical model (parameterized by MLPs) is then trained to predict the evolution of $z_t$ using an Euler–Maruyama discretization of a Stochastic Differential Equation (SDE) or Ordinary Differential Equation (ODE), minimizing the negative log-likelihood of one-step transitions.
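For the stochastic case, the one-step objective can be written out concretely. Under an Euler-Maruyama step of size $\Delta t$ , the transition is Gaussian with mean $z_t + f(z_t)\Delta t$ and covariance $g(z_t)^2 \Delta t$ . The sketch below assumes a diagonal diffusion and uses a linear drift and a constant diffusion as stand-ins for the MLPs; the names `drift`, `diffusion`, and `one_step_nll` are illustrative and do not come from the paper.

```python
import numpy as np

def drift(z, A):
    """Hypothetical drift f(z); a linear map stands in for the MLP."""
    return z @ A.T

def diffusion(z, log_sigma):
    """Hypothetical diagonal diffusion g(z) > 0; constant in z for simplicity."""
    return np.exp(log_sigma) * np.ones_like(z)

def one_step_nll(z_t, z_next, dt, A, log_sigma):
    """Mean negative log-likelihood of z_next under the Euler-Maruyama transition
    z_next ~ N(z_t + f(z_t) dt, diag(g(z_t)^2) dt)."""
    mean = z_t + drift(z_t, A) * dt
    var = diffusion(z_t, log_sigma) ** 2 * dt
    nll = 0.5 * np.sum((z_next - mean) ** 2 / var + np.log(2 * np.pi * var), axis=-1)
    return float(nll.mean())

rng = np.random.default_rng(0)
d, dt = 3, 0.01
A_true = -np.eye(d)                         # simple stable linear drift
log_sigma_true = np.log(0.5) * np.ones(d)

# Simulate one-step pairs (z_t, z_{t+1}) with Euler-Maruyama.
z_t = rng.normal(size=(2000, d))
noise = rng.normal(size=z_t.shape)
z_next = (z_t + drift(z_t, A_true) * dt
          + diffusion(z_t, log_sigma_true) * np.sqrt(dt) * noise)

nll_true = one_step_nll(z_t, z_next, dt, A_true, log_sigma_true)
nll_wrong = one_step_nll(z_t, z_next, dt, 5.0 * np.eye(d), log_sigma_true)
```

On this synthetic data the generating drift scores a lower negative log-likelihood than a mismatched drift, which is the signal a gradient-based trainer would follow. For deterministic (ODE) dynamics the noise term is dropped and the step reduces to a plain Euler update.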

3. Key Contributions

Distributional Reconstruction Strategy: The paper introduces a reconstruction objective that learns closure variables by matching probability densities rather than pointwise coordinates. This eliminates the need for explicit set matching and naturally enforces permutation invariance.
Variable-Size Input Handling: The architecture supports inputs with varying particle counts ( $n$ ). The encoder applies the same map to every particle before pooling, and the decoder's loss is evaluated on Monte Carlo samples drawn from the induced density rather than on a fixed-length list of particles.
Computational Efficiency: Unlike pointwise matching methods that scale poorly with $N$ , the proposed method scales linearly with the number of particles for the encoder ( $O(N)$ ) and is independent of $N$ for the decoder's reconstruction loss evaluation (dependent only on the number of Monte Carlo samples).
Joint Learning Framework: The method jointly learns the permutation-invariant latent states and the macroscopic dynamics, demonstrating that reconstruction-based objectives effectively regularize the latent space for dynamical prediction.

4. Experimental Results

The authors evaluate the method across three distinct microscopic settings:

Interacting Particle Systems (Deterministic Energy Dynamics):
- Task: Predicting the normalized pairwise interaction energy of 2D particles evolving under a step-force law.
- Results: The proposed method achieved the lowest Mean Relative Error (MRE) in in-distribution tests and demonstrated superior generalization to different initial patterns and varying particle counts (400 particles vs. 300 in training). Baselines using standard autoencoders with permutation augmentation (AE-Aug) failed to maintain permutation invariance, producing different predictions for the same physical state under different orderings.
Binary Particle Mixing (Stochastic Lennard-Jones Fluids):
- Task: Predicting the mixing ratio (short-range order) of two particle species in a 2D domain.
- Results: Evaluated using Maximum Mean Discrepancy (MMD) for stochastic dynamics. The proposed method outperformed all baselines (including those using Chamfer distance) across in-distribution, different initial separations, and reduced system sizes. The study highlighted that direct training of dynamics without reconstruction (InvE) led to representation collapse and poor performance, validating the necessity of the reconstruction objective.
Polymer Extension (Video/Image Data):
- Task: Modeling the stretching dynamics of polymer chains from video data, treating non-white pixels as particles.
- Results: The method successfully captured stretching dynamics for fast and medium extension rates. It showed comparable performance to state-of-the-art image models (CNNs, Vision Transformers) but struggled with slow extension rates where initial configurations were visually similar to fast cases, suggesting limitations in distinguishing microstates with subtle differences.
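The two evaluation metrics named above can be sketched as follows. The exact definitions used in the paper are not given in this summary, so both functions are assumptions: `mean_relative_error` uses one common definition, and `mmd2_rbf` is the standard biased estimator of squared MMD with a Gaussian kernel and an arbitrarily chosen bandwidth. The sample data is synthetic.

```python
import numpy as np

def mean_relative_error(pred, true, tiny=1e-12):
    """Mean of |pred - true| / |true|; one common definition, assumed here."""
    return float(np.mean(np.abs(pred - true) / (np.abs(true) + tiny)))

def mmd2_rbf(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD with a Gaussian RBF kernel."""
    def gram(a, b):
        d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
        return np.exp(-d2 / (2.0 * bandwidth ** 2))
    return float(gram(x, x).mean() + gram(y, y).mean() - 2.0 * gram(x, y).mean())

rng = np.random.default_rng(0)
ref = rng.normal(0.0, 1.0, size=(500, 1))    # reference samples of an observable
good = rng.normal(0.0, 1.0, size=(500, 1))   # predictions from the same distribution
bad = rng.normal(1.5, 1.0, size=(500, 1))    # predictions from a shifted distribution

mmd_good = mmd2_rbf(ref, good)
mmd_bad = mmd2_rbf(ref, bad)
mre = mean_relative_error(np.array([1.1, 1.8]), np.array([1.0, 2.0]))
```

MMD compares whole sample distributions, which is why it suits stochastic dynamics where a single predicted trajectory cannot be matched point by point against a single reference trajectory.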

5. Significance and Claims

The paper claims that the proposed framework addresses a fundamental gap in closure modeling for unordered physical systems. By reconstructing the distributional information rather than individual points, the method achieves true permutation invariance and handles variable system sizes without the computational overhead of combinatorial matching.

The authors position this work as a robust alternative to existing autoencoder-based closure modeling, particularly for particle-based systems where canonical ordering is absent. They note that while the method is effective for systems where macroscopic evolution corresponds to significant changes in microscopic configurations, it may face challenges in "stiff" systems where small microscopic perturbations yield large macroscopic changes, or where microstate distributions are nearly indistinguishable. The paper concludes that this approach offers a promising path for improving scientific surrogate models and accelerating exploratory simulations in multiscale domains.