Learning of Population Dynamics: Inverse Optimization Meets JKO Scheme

Imagine you are a detective trying to figure out how a crowd of people moves through a city, but you have a major problem: you can't watch them move.

You only have a series of "snapshots" (photos) taken at different times.

Photo 1: People are gathered in a park.
Photo 2: People are scattered near a coffee shop.
Photo 3: People are clustered around a bus stop.

You don't know who went where. You don't know if the person in the red shirt in Photo 1 is the same person in the red shirt in Photo 3. You just see the shape of the crowd changing over time.

Your goal is to figure out the invisible rules (the "energy") that are pushing and pulling these people. Is there a magnet pulling them to the coffee shop? Is there a wind blowing them toward the bus? Is there a natural tendency for them to spread out?

This is the problem of Learning Population Dynamics.

The Old Way: The "Perfect Map" Problem

Previous methods tried to solve this by assuming they could draw a perfect, continuous map connecting every single person from Photo 1 to Photo 2.

The Flaw: In the real world (like studying cells in a lab or stock prices), you often destroy the sample to measure it, or you can't track individuals. You can't draw that perfect map.
The Result: Old methods were either too slow, required impossible assumptions (like "everyone must move in a straight line"), or needed you to pre-calculate the map before you could even start learning the rules.

The New Way: iJKOnet (The "Smart Guessing Game")

The authors of this paper introduce a new method called iJKOnet. Think of it as a high-stakes game of "Hot and Cold" played between two AI agents.

The Players

The Rule-Maker (The Energy Function): This AI tries to invent a set of invisible rules (like "gravity" or "attraction") that explain why the crowd moves the way it does.
The Map-Maker (The Transport Map): This AI tries to figure out the most efficient way to move the crowd from the "Park" shape to the "Coffee Shop" shape according to the rules invented by the Rule-Maker.

The Game Loop

The Setup: The Rule-Maker says, "I think the crowd moves because they are attracted to coffee."
The Test: The Map-Maker tries to move the crowd based on that rule. It pushes the people from the Park toward the Coffee Shop.
The Reality Check: The system compares the Map-Maker's result with the actual Photo 2 (the real Coffee Shop crowd).
- If they match: Great! The Rule-Maker's guess was good.
- If they don't match: The system says, "Nope, your rules are wrong. The crowd didn't move like that."
The Twist (Inverse Optimization): Here is the clever part. Instead of just fixing the map, the system punishes the Rule-Maker. It forces the Rule-Maker to change its rules until the "cost" of moving the crowd matches the reality perfectly.

It's like trying to guess the recipe for a soup.

You taste the soup (the real data).
You guess the ingredients (the rules).
You cook a batch based on your guess.
If the taste is wrong, you don't just tweak the cooking; you realize your recipe was wrong and rewrite it entirely.

Why is this a Big Deal?

1. No "Input-Convex" Headaches
Old methods forced the AI to use very specific, rigid types of neural networks (like forcing a chef to only use square pans). This paper says, "Nope, use whatever tools you want." This makes the AI much more flexible and powerful.

2. It Works Without Tracking
Because it looks at the shapes of the crowds rather than individual people, it is well suited to situations where you can't track individuals, like:

Biology: Watching how stem cells turn into different tissues (you kill the cell to look at it, so you can't watch it grow).
Finance: Predicting how stock prices move based on the distribution of prices at different times.
Traffic: Figuring out how pedestrians flow through a station without tracking every single person.

3. It's End-to-End
Old methods were like building a car engine, then building a transmission, then trying to bolt them together. If they didn't fit, you had to start over. This new method builds the engine and transmission at the same time, learning from each other instantly.

The Bottom Line

iJKOnet is a new detective tool. It looks at a series of blurry, disconnected photos of a moving crowd and figures out the invisible laws of physics that are driving them. It does this by playing a smart guessing game where it learns from its mistakes, without needing to know the identity of every single person in the crowd.

This allows scientists to understand complex systems—from how diseases spread to how cells develop—using data that was previously too messy or incomplete to analyze.

1. Problem Statement

The paper addresses the challenge of learning population dynamics from discrete, temporally separated snapshots of population distributions.

Context: In many domains (biology, ecology, finance, epidemiology), individual particle trajectories are unobservable due to destructive sampling (e.g., single-cell genomics) or data limitations. Instead, only marginal distributions $\{\rho_k\}_{k=0}^K$ at discrete time points are available.
Goal: Recover the underlying energy functional $J^*$ that governs the evolution of the system. The system is modeled as a Wasserstein Gradient Flow (WGF), where the continuous evolution is approximated by the Jordan-Kinderlehrer-Otto (JKO) scheme.
The Inverse Problem: Given observed distributions $\rho_k$ and $\rho_{k+1}$, find the energy functional $J$ such that $\rho_{k+1} \approx \text{JKO}_\tau^J(\rho_k)$.

2. Methodology: iJKOnet

The authors propose iJKOnet, a novel framework that combines the JKO scheme with inverse optimization techniques.

Core Concept: Inverse Optimization Gap

Instead of solving the forward JKO optimization problem to predict the next state (as in standard generative modeling), iJKOnet treats the recovery of the energy functional as an inverse problem.

The JKO Step: For a true energy $J^*$, the next distribution $\rho_{k+1}$ minimizes the JKO functional:
$\rho_{k+1} = \arg\min_{\rho} \left( J^*(\rho) + \frac{1}{2\tau} d_{W_2}^2(\rho, \rho_k) \right)$
The Inequality: For any candidate functional $J$, the ground truth $\rho_{k+1}$ is a feasible point of the JKO problem, so the value of the functional there can never fall below the minimum:
$\min_{\rho} \left( J(\rho) + \frac{1}{2\tau} d_{W_2}^2(\rho, \rho_k) \right) \leq J(\rho_{k+1}) + \frac{1}{2\tau} d_{W_2}^2(\rho_{k+1}, \rho_k)$
The Objective: The inverse optimization "gap" is the right-hand side (value at the ground truth) minus the left-hand side (optimal value for candidate $J$). It is non-negative and vanishes when $\rho_{k+1}$ is itself a minimizer, as it is for the true $J^*$. The method therefore minimizes this gap over $J$, or equivalently maximizes the left-hand side minus the right-hand side, which pushes the candidate $J$ toward an energy that explains the observed step.
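
The inequality above can be checked numerically on a toy problem. The sketch below is an illustration under stated assumptions, not the paper's implementation: it assumes a 1D population, a pure potential energy $J(\rho) = \int V d\rho$ with a hypothetical quadratic potential $V(y) = \frac{a}{2}(y - m)^2$, and uses the closed-form JKO step for that potential. The names `potential`, `jko_map`, `w2_squared_1d` and `inverse_gap` are invented for this example. The gap (right-hand side minus left-hand side) should be near zero for the true parameters and clearly positive for a wrong guess.

```python
import numpy as np

def potential(y, a, m):
    """Quadratic potential V(y) = a/2 * (y - m)^2 (hypothetical toy choice)."""
    return 0.5 * a * (y - m) ** 2

def jko_map(x, a, m, tau):
    """Closed-form minimizer over y of V(y) + (x - y)^2 / (2 * tau)."""
    return (x + tau * a * m) / (1.0 + tau * a)

def w2_squared_1d(x, y):
    """Squared W2 distance between equal-size 1D samples (sorted matching)."""
    return np.mean((np.sort(x) - np.sort(y)) ** 2)

def inverse_gap(x_k, x_next, a, m, tau):
    """Right-hand side minus left-hand side of the inequality, on samples."""
    y = jko_map(x_k, a, m, tau)
    lhs = np.mean(potential(y, a, m)) + np.mean((x_k - y) ** 2) / (2 * tau)
    rhs = np.mean(potential(x_next, a, m)) + w2_squared_1d(x_next, x_k) / (2 * tau)
    return rhs - lhs

rng = np.random.default_rng(0)
tau, a_true, m_true = 0.1, 1.0, 2.0
x_k = rng.normal(size=5000)
# Ground-truth next snapshot, shuffled so that no pairing is kept.
x_next = rng.permutation(jko_map(x_k, a_true, m_true, tau))

gap_true = inverse_gap(x_k, x_next, a_true, m_true, tau)
gap_wrong = inverse_gap(x_k, x_next, a_true, 0.0, tau)
print(f"gap at true energy:  {gap_true:.2e}")
print(f"gap at wrong energy: {gap_wrong:.4f}")
```

In this toy setting the candidate with the wrong attraction point `m` is penalized by a positive gap, which is the signal the outer optimization over $J$ exploits.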

Loss Function Derivation

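The authors derive a min-max objective function (Equation 11). The term $\frac{1}{2\tau} d_{W_2}^2(\rho_{k+1}, \rho_k)$ does not depend on $J$ and is dropped, and the minimum over $\rho$ is rewritten as a minimum over transport maps $T^k$ with $\rho = T^k_\sharp \rho_k$: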
$\max_{J} \min_{T^k} \sum_{k=0}^{K-1} \left[ J(T^k_\sharp \rho_k) - J(\rho_{k+1}) + \frac{1}{2\tau} \int \|x - T^k(x)\|^2 d\rho_k(x) \right]$

Inner Minimization ( $T^k$ ): Learns the transport map that pushes $\rho_k$ to the "predicted" next state under the current energy $J$ . This approximates the JKO step.
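Outer Maximization ( $J$ ): Updates the energy functional to raise $J(T^k_\sharp \rho_k) - J(\rho_{k+1})$, that is, to assign lower energy to the observed $\rho_{k+1}$ than to the predicted state. This shrinks the inverse optimization gap, so that the observed $\rho_{k+1}$ becomes (near-)optimal for the learned $J$.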
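
The min-max objective above can be sketched on a toy problem with simultaneous gradient descent-ascent. This is a minimal illustration, not the authors' training code: it assumes a single time step, a hypothetical linear potential $V_\theta(y) = \theta y$, and a one-parameter transport map $T_\phi(x) = x + \phi$, so that both gradients can be written by hand. The learning rates and iteration count are arbitrary choices that happen to be stable for this problem.

```python
import numpy as np

rng = np.random.default_rng(0)
tau, theta_true = 0.1, 1.5

# Linear potential V_theta(y) = theta * y; its exact JKO step shifts every
# sample by -tau * theta. Shuffling removes the pairing between snapshots.
x_k = rng.normal(size=2000)
x_next = rng.permutation(x_k - tau * theta_true)

def loss(theta, phi):
    """Sample estimate of J(T#rho_k) - J(rho_{k+1}) + transport cost, T(x) = x + phi."""
    moved = x_k + phi
    return theta * moved.mean() - theta * x_next.mean() + phi ** 2 / (2 * tau)

theta, phi = 0.0, 0.0
lr_theta, lr_phi = 1.0, 0.05
for _ in range(500):
    grad_phi = theta + phi / tau                     # d loss / d phi
    grad_theta = (x_k + phi).mean() - x_next.mean()  # d loss / d theta
    # Descent on the map parameter, ascent on the energy parameter.
    phi, theta = phi - lr_phi * grad_phi, theta + lr_theta * grad_theta

print(f"learned theta = {theta:.4f}, learned shift = {phi:.4f}")
print(f"objective at the saddle point = {loss(theta, phi):.4f}")
```

Here the map player converges to the shift that a JKO step under the learned potential would produce, and the energy player converges to the slope that generated the data, without ever using paired samples.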

Practical Implementation

Parametrization:
- Energy Functional ( $J_\theta$ ): Modeled as a free energy functional comprising potential energy ( $V$ ), interaction energy ( $W$ ), and internal energy (scaled negative entropy $-\theta_3 H(\rho)$ ).
- Transport Maps ( $T^k_\phi$ ): Unlike previous JKO-based methods (e.g., JKOnet) that require Input-Convex Neural Networks (ICNNs) to ensure convexity, iJKOnet uses standard architectures (MLPs, ResNets). This is possible because the loss is written in terms of the map $T^k$ itself and imposes no convexity constraint on its parametrization.
Entropy Estimation: The internal energy term requires entropy estimation. The authors use the Kozachenko-Leonenko nearest-neighbor estimator for the base distributions and the change-of-variables formula for the transported distributions.
Training: An end-to-end adversarial training procedure (Gradient Descent-Ascent) is used. No pre-computation of optimal transport couplings is required, enabling scalability.
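
The entropy estimation step described above can be illustrated with a small NumPy sketch. It is an assumption-laden example rather than the paper's code: `kl_entropy` is a textbook k-nearest-neighbor form of the Kozachenko-Leonenko estimator (with `k=3` chosen arbitrarily), `digamma_int` is a helper written for this example, and the transport map is a hypothetical linear map $T(x) = Ax$, for which the change-of-variables formula reduces to $H(T_\sharp \rho) = H(\rho) + \log|\det A|$.

```python
import math
import numpy as np

def digamma_int(n):
    """Digamma function at a positive integer n: -gamma + sum_{j<n} 1/j."""
    return -np.euler_gamma + np.sum(1.0 / np.arange(1, n))

def kl_entropy(samples, k=3):
    """Kozachenko-Leonenko k-nearest-neighbor entropy estimate (in nats)."""
    n, d = samples.shape
    diffs = samples[:, None, :] - samples[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Column 0 of each sorted row is the zero distance to the point itself.
    r_k = np.sort(dists, axis=1)[:, k]
    # Log-volume of the d-dimensional Euclidean unit ball.
    log_unit_ball = (d / 2) * math.log(math.pi) - math.lgamma(d / 2 + 1)
    return digamma_int(n) - digamma_int(k) + log_unit_ball + d * np.mean(np.log(r_k))

rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 2))            # samples from a base distribution
A = np.array([[2.0, 0.0], [0.5, 1.0]])    # hypothetical linear map T(x) = A x

h_base = kl_entropy(x)
# Change of variables: H(T#rho) = H(rho) + E[log |det grad T(x)|].
h_pushed = h_base + math.log(abs(np.linalg.det(A)))
h_direct = kl_entropy(x @ A.T)            # re-estimate on transported samples

print(f"base entropy estimate:        {h_base:.3f}")
print(f"pushed via change of vars:    {h_pushed:.3f}")
print(f"pushed via direct estimation: {h_direct:.3f}")
```

For a learned nonlinear map the log-determinant term would be averaged over samples instead of being a constant; the pairwise distance matrix used here is quadratic in the sample count, so this sketch is only suitable for small batches.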

3. Key Contributions

Inverse Optimization Framework: Casts the problem of recovering energy functionals within the JKO framework as an inverse optimization task, leading to a novel min-max objective.
End-to-End Learning: Introduces a practical scheme that supports standard neural network architectures for transport maps, removing the restrictive and scalability-limiting requirement of ICNNs found in prior work (JKOnet).
Theoretical Guarantees: Provides a proof (Theorem 3.1) that, under suitable assumptions (strict convexity and smoothness of potentials), solving the proposed min-max problem recovers the gradient of the ground-truth potential energy, and hence the potential itself up to an additive constant.
Performance: Demonstrates superior performance over existing JKO-based methods (JKOnet, JKOnet*) and competitive results against non-JKO baselines on real-world data.

4. Experimental Results

The authors evaluated iJKOnet on synthetic datasets and real-world single-cell RNA sequencing (scRNA-seq) data.

Synthetic Data (Potential Energy):
- In paired setups (where trajectories are known), iJKOnet matches the performance of JKOnet*.
- In unpaired setups (the realistic scenario where only snapshots exist), iJKOnet significantly outperforms JKOnet*. The paper highlights that JKOnet* degrades rapidly in unpaired settings because it relies on precomputed OT maps which are unstable without trajectory data.
Synthetic Data (Interaction/Internal Energy):
- Learning interaction and internal energy components remains challenging for all methods due to the difficulty of estimating high-dimensional integrals and entropy. However, iJKOnet shows improved stability over JKOnet*.
Real-World Data (Single-Cell Genomics):
- Datasets: Embryoid Body (EB) dataset (5D) and Multiome dataset (50D/100D).
- Metrics: Earth Mover's Distance (EMD), Bures-Wasserstein Unexplained Variance Percentage ($BW_2^2$-UVP), and Maximum Mean Discrepancy (MMD).
- Findings:
  - In 5D and 100D leave-one-out experiments, iJKOnet (specifically the potential-only variant iJKOnetV) achieved state-of-the-art or comparable results to complex non-JKO methods like DMSB and MMSB.
  - Crucially, iJKOnet does not cache simulated trajectories, making it significantly faster and more memory-efficient than methods requiring iterative OT pre-computation.
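
Of the metrics listed above, the Bures-Wasserstein one is probably the least familiar, so a small sketch may help. It is an illustration only: `sqrtm_psd`, `bures_wasserstein_sq` and `bw_uvp` are names invented here, and the normalization by half the total variance of the target is an assumed convention that should be checked against the paper's exact definition. The core quantity is the squared $W_2$ distance between Gaussians fitted to the two sample sets.

```python
import numpy as np

def sqrtm_psd(mat):
    """Square root of a symmetric positive semi-definite matrix via eigh."""
    vals, vecs = np.linalg.eigh(mat)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T

def bures_wasserstein_sq(x, y):
    """Squared W2 distance between Gaussians fitted to two sample sets."""
    mu_x, mu_y = x.mean(axis=0), y.mean(axis=0)
    cov_x, cov_y = np.cov(x, rowvar=False), np.cov(y, rowvar=False)
    root_x = sqrtm_psd(cov_x)
    cross = sqrtm_psd(root_x @ cov_y @ root_x)
    return np.sum((mu_x - mu_y) ** 2) + np.trace(cov_x + cov_y - 2.0 * cross)

def bw_uvp(pred, target):
    """BW2^2-UVP in percent; the 0.5 * Var(target) normalizer is an assumption."""
    total_var = np.trace(np.cov(target, rowvar=False))
    return 100.0 * bures_wasserstein_sq(pred, target) / (0.5 * total_var)

rng = np.random.default_rng(0)
target = rng.normal(size=(500, 3))
shifted = target + np.array([1.0, 0.0, 0.0])   # same shape, mean moved by 1

bw_same = bures_wasserstein_sq(target, target)
bw_shift = bures_wasserstein_sq(shifted, target)
print(f"BW2^2 (identical samples): {bw_same:.2e}")
print(f"BW2^2 (unit mean shift):   {bw_shift:.4f}")
print(f"BW2^2-UVP (unit shift):    {bw_uvp(shifted, target):.1f}%")
```

Because only means and covariances enter, this metric is cheap in high dimensions but blind to differences beyond second moments, which is one reason it is reported alongside EMD and MMD.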

5. Significance and Limitations

Significance:

Scalability: By removing the ICNN constraint and the need for precomputed OT couplings, iJKOnet scales to higher dimensions and larger datasets more effectively than previous JKO-based approaches.
Generality: The framework is flexible enough to handle various energy structures (potential, interaction, internal) without architectural changes.
Bridging Theory and Practice: It successfully bridges inverse optimization theory with computational gradient flows, providing a theoretically grounded method for learning dynamics from sparse, cross-sectional data.

Limitations:

Entropy Estimation: Performance can degrade in very high dimensions due to the difficulty of accurately estimating entropy (internal energy).
Interaction Energy: Learning interaction kernels ( $W$ ) remains difficult, likely due to biases in batched integral estimation.
Theoretical Scope: Current theoretical guarantees are established primarily for potential energy; extending these to interaction and internal energies is future work.
Complex Dynamics: The method does not currently handle birth-death dynamics (population size changes) or time-dependent interaction energies.

In conclusion, iJKOnet represents a significant advancement in learning population dynamics, offering a robust, scalable, and theoretically sound alternative to existing methods by leveraging inverse optimization principles to bypass the computational bottlenecks of traditional JKO implementations.