Diffusion & Adversarial Schrödinger Bridges via Iterative Proportional Markovian Fitting

Imagine you are trying to teach a robot to turn a picture of a cat into a picture of a dog. But here's the catch: you don't have any photos where a specific cat is paired with its "dog twin." You only have a big pile of random cat photos and a big pile of random dog photos.

This is the problem of Unpaired Image Translation. The paper you're reading proposes a new, smarter way to solve this using a concept called the Schrödinger Bridge.

Here is the breakdown of their idea, using simple analogies:

1. The Goal: The Perfect Bridge

Think of the Schrödinger Bridge as the most efficient, straightest path to get from "Cat Land" to "Dog Land."

Optimality: If you start with a specific cat, you want the resulting dog to look like that cat's personality (e.g., if the cat is fluffy, the dog should be fluffy). You don't want a random dog.
Marginal Matching: By the end of the process, the entire pile of dogs you created must look exactly like the real pile of dog photos you have.

2. The Old Ways: Two Flawed Methods

Before this paper, researchers tried two main ways to build this bridge, but both had a fatal flaw: each one gradually loses a property it was supposed to keep.

Method A (IPF - The "Map Reader"): This method starts with a perfect map (the rules of physics) and tries to adjust the path to match the destination (the dog photos).
- The Problem: As it keeps adjusting the path to fit the destination, it slowly forgets the original map. It ends up with a pile of dogs, but they might not look like the cats they started from. It's like a GPS that gets you to the right city but takes you through a different neighborhood than you intended.
Method B (IMF - The "Path Walker"): This method starts with a pile of cats and tries to walk them toward the dogs while keeping the "cat-ness" intact.
- The Problem: As it walks, it slowly loses its balance. The final pile of dogs might look like the cats, but they don't look like real dogs anymore. It's like a dancer who keeps their rhythm but forgets the steps, ending up in a weird pose.

Both methods suffer from Error Accumulation. Every time they take a step, they get slightly more confused, and eventually, the whole process falls apart.

3. The New Solution: The "Alternating Dance" (IPMF)

The authors realized that the heuristic fix people were actually using in practice to stabilize Method B was secretly doing something brilliant: it was mixing in steps of Method A. They named this unified method IPMF (Iterative Proportional Markovian Fitting).

Think of IPMF as a dance between two partners:

Partner 1 (The Map Reader): "Okay, let's make sure our path leads to the right destination (the Dog pile)."
Partner 2 (The Path Walker): "Okay, let's make sure we are still holding hands with our original Cat."

Instead of letting one partner take over and forget the other, IPMF forces them to take turns.

Step 1: Fix the path to match the destination.
Step 2: Fix the connection to the start.
Step 3: Fix the path again.
Step 4: Fix the connection again.

The Magic: By constantly switching back and forth, they cancel out each other's mistakes. If Partner 1 gets a little lost, Partner 2 pulls them back. If Partner 2 gets a little confused, Partner 1 corrects them. This prevents the "forgetting" and "losing balance" that plagued the old methods.

4. The Secret Sauce: The Starting Point

The paper also discovered a superpower: You can choose where the dance begins.

In the past, you were forced to start the dance in a very specific, boring way. But IPMF allows you to start with a "head start."

The Analogy: Imagine you are trying to translate a cat to a dog.
- Old Way: You start with a random guess. "Maybe this cat turns into this dog?" (Bad guess).
- New Way (IPMF): You can use a pre-trained AI (like Stable Diffusion) to make a good guess first. "This fluffy cat probably turns into a Golden Retriever." You feed this good guess into the dance.

Because the dance starts with a better guess, the final result is sharper and more accurate.

If you care most about the output staying close to the input (high similarity), you start the dance one way.
If you care most about the output looking realistic (high generation quality), you start the dance another way.

Summary

The paper says: "Stop trying to solve this problem with just one method. Instead, mix the two best methods together, let them take turns correcting each other, and give them a good starting point."

The Result: A system that can turn cats into dogs (or translate any two unpaired datasets) without losing the identity of the original image or the quality of the final image. It's like building a bridge that is both strong (doesn't collapse) and direct (gets you exactly where you need to go).

1. Problem Statement

The paper addresses the Schrödinger Bridge (SB) problem: among all stochastic processes that transport a source distribution $p_0$ to a target distribution $p_1$ over the time interval $[0, 1]$, find the one closest, in the sense of Kullback-Leibler divergence, to a prior Wiener process. This is crucial for unpaired domain translation tasks (e.g., image style transfer, single-cell data analysis), where the goal is to map samples from one domain to another while preserving optimality (input-output similarity) and ensuring marginal matching (target distribution fidelity).
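
In symbols, one common way to write this problem is shown below. The notation ($T$ for a candidate process, $W^\epsilon$ for the Wiener prior with volatility $\epsilon$, and $\mathcal{F}(p_0, p_1)$ for the set of processes with the prescribed endpoint marginals) is an assumption of this sketch rather than something fixed by the summary above.

$$
T^* = \arg\min_{T \in \mathcal{F}(p_0, p_1)} \mathrm{KL}\left(T \,\|\, W^\epsilon\right),
\qquad
\mathcal{F}(p_0, p_1) = \{\, T : T_0 = p_0,\ T_1 = p_1 \,\}.
$$

Read this way, marginal matching is the constraint $T \in \mathcal{F}(p_0, p_1)$, and optimality is the requirement that $T$ stays as close as possible to the prior $W^\epsilon$.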

Existing methods face a trade-off:

Iterative Proportional Fitting (IPF): Starts with a prior process satisfying optimality and iteratively enforces marginal matching. It minimizes forward KL divergence but suffers from "prior forgetting" in practice, where approximation errors cause the process to lose its optimality properties.
Iterative Markovian Fitting (IMF): Starts with a process satisfying marginal matching and iteratively enforces optimality. It minimizes reverse KL divergence but can accumulate errors, leading to a loss of marginal matching.
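
As a minimal sketch of the IPF idea in a discrete, static setting (an illustration, not the paper's solver), the snippet below alternately rescales a prior kernel so that its column and row sums match the target and source marginals. The grid, `eps`, the two marginals, and the function name `ipf_discrete` are all assumptions made for this example. Because each step only multiplies whole rows or whole columns, the cross-ratios of the prior kernel are left untouched in exact arithmetic, which is the discrete counterpart of keeping the prior's "optimality" structure.

```python
import numpy as np

def ipf_discrete(prior, p0, p1, n_iters=200):
    """Discrete IPF: alternately rescale columns and rows of a positive
    kernel `prior` until its marginals match p1 (columns) and p0 (rows)."""
    q = prior / prior.sum()
    for _ in range(n_iters):
        q = q * (p1 / q.sum(axis=0))[None, :]  # enforce target marginal p1
        q = q * (p0 / q.sum(axis=1))[:, None]  # enforce source marginal p0
    return q

# Hypothetical toy problem: 5 grid points and a Gaussian (Wiener-like) kernel.
eps = 1.0
x = np.linspace(-1.0, 1.0, 5)
prior = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * eps))
p0 = np.array([0.1, 0.2, 0.4, 0.2, 0.1])  # source marginal
p1 = np.array([0.3, 0.1, 0.2, 0.1, 0.3])  # target marginal

q = ipf_discrete(prior, p0, p1)
print(q.sum(axis=1))  # row sums, to compare with p0
print(q.sum(axis=0))  # column sums, to compare with p1
```

With learned neural approximations instead of exact rescaling, each step introduces a small error in the kernel itself, which is one way to picture the "prior forgetting" described above.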

In practice, practitioners use a heuristic bidirectional modification of IMF (alternating between forward and backward diffusion training) to stabilize training. However, the theoretical underpinnings of this heuristic were unclear, and it lacked a unified framework connecting it to IPF.

2. Methodology: Iterative Proportional Markovian Fitting (IPMF)

The authors propose Iterative Proportional Markovian Fitting (IPMF), a unified framework that theoretically explains and formalizes the heuristic bidirectional IMF procedure.

Core Insight

The paper demonstrates that the heuristic bidirectional IMF is mathematically equivalent to an alternating sequence of IPF projections and IMF projections.

IMF Step: Applies a reciprocal projection followed by a Markovian projection, moving the process toward optimality while preserving its marginals.
IPF Step: Replaces one endpoint marginal with the true one ($p_0$ or $p_1$) while preserving the conditional structure, thereby enforcing marginal matching.
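
For a process $q$ observed on a time grid $0 = t_0 < t_1 < \dots < t_N < t_{N+1} = 1$, the projections can be written roughly as follows. This is a sketch in discrete time; the grid, the shorthand $x_{\mathrm{in}} = (x_{t_1}, \dots, x_{t_N})$ for the intermediate points, and the symbol $W^\epsilon$ for the Wiener prior are assumptions of this illustration.

$$
\begin{aligned}
\mathrm{proj}_R(q) &= q(x_0, x_1)\, W^\epsilon(x_{\mathrm{in}} \mid x_0, x_1), \\
\mathrm{proj}_M(q) &= q(x_0) \prod_{n=1}^{N+1} q(x_{t_n} \mid x_{t_{n-1}}), \\
\mathrm{proj}_1(q) &= p_1(x_1)\, q(x_0, x_{\mathrm{in}} \mid x_1), \\
\mathrm{proj}_0(q) &= p_0(x_0)\, q(x_{\mathrm{in}}, x_1 \mid x_0).
\end{aligned}
$$

The reciprocal projection keeps the endpoint coupling and fills in the interior with the Brownian bridge; the Markovian projection keeps neighboring-time pairs and chains them; the two IPF projections swap in the true marginal at one endpoint and keep everything conditional on that endpoint.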

The IPMF algorithm alternates between these steps in a specific 4-step cycle per iteration:

Step 1: Reciprocal Projection ($\mathrm{proj}_R$): Combines the current joint distribution with the Brownian Bridge (prior).
Step 2: Backward Markovian Projection ($\mathrm{proj}_M$) + IPF Projection ($\mathrm{proj}_1$): Fits the backward process and enforces the target marginal $p_1$.
Step 3: Reciprocal Projection ($\mathrm{proj}_R$): Re-applies the prior structure.
Step 4: Forward Markovian Projection ($\mathrm{proj}_M$) + IPF Projection ($\mathrm{proj}_0$): Fits the forward process and enforces the source marginal $p_0$.
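
The four-step cycle above can be traced by hand in a toy case. The sketch below assumes zero-mean one-dimensional Gaussian marginals, a single intermediate time $t = 1/2$, and a Wiener prior with variance $\epsilon$ at time 1, so a coupling is described by just three numbers: `a = Var(x0)`, `b = Var(x1)`, `c = Cov(x0, x1)`. All function names are hypothetical, and every projection is computed in closed form, so the forward and backward Markovian projections coincide here; the code is an illustration of the alternation, not the paper's neural implementation.

```python
import math

def imf_step(a, b, c, eps):
    """proj_R then proj_M for a zero-mean 1D Gaussian coupling with
    Var(x0)=a, Var(x1)=b, Cov(x0,x1)=c and one intermediate time t=1/2."""
    # Reciprocal projection: x_t = (x0 + x1)/2 + bridge noise of variance eps/4.
    var_t = (a + 2 * c + b + eps) / 4
    cov_0t = (a + c) / 2
    cov_t1 = (c + b) / 2
    # Markovian projection: keep the (x0, x_t) and (x_t, x1) pairs and chain them.
    return a, b, cov_0t * cov_t1 / var_t

def ipf_step_target(a, b, c, var1):
    """proj_1: keep q(x0 | x1), replace the x1-marginal by N(0, var1)."""
    slope, resid = c / b, a - c * c / b
    return slope ** 2 * var1 + resid, var1, slope * var1

def ipf_step_source(a, b, c, var0):
    """proj_0: keep q(x1 | x0), replace the x0-marginal by N(0, var0)."""
    slope, resid = c / a, b - c * c / a
    return var0, slope ** 2 * var0 + resid, slope * var0

def ipmf(a, b, c, var0, var1, eps, n_cycles=200):
    """Run the alternating cycle: IMF step, proj_1, IMF step, proj_0."""
    for _ in range(n_cycles):
        a, b, c = imf_step(a, b, c, eps)
        a, b, c = ipf_step_target(a, b, c, var1)
        a, b, c = imf_step(a, b, c, eps)
        a, b, c = ipf_step_source(a, b, c, var0)
    return a, b, c

def sb_covariance(var0, var1, eps):
    """Covariance of the static SB coupling under the assumptions above."""
    return (math.sqrt(eps ** 2 + 4 * var0 * var1) - eps) / 2

var0, var1, eps = 1.0, 4.0, 1.0
starts = {
    "IMF-like (independent)": (var0, var1, 0.0),
    "IPF-like (Wiener prior)": (var0, var0 + eps, var0),
}
results = {}
for name, start in starts.items():
    results[name] = ipmf(*start, var0, var1, eps)
    print(name, results[name])
print("SB covariance:", sb_covariance(var0, var1, eps))
```

In this toy setting the independent start already has the right marginals, so the IPF steps leave it unchanged and the cycle reduces to IMF; the Wiener start already has the optimality structure, so the IMF steps leave it unchanged and the cycle reduces to IPF. Both runs can be compared against `sb_covariance`.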

Key Theoretical Contributions

Unified Framework: IPMF subsumes both IPF and IMF as special cases depending on the initialization.
Convergence Guarantees:
- Gaussian Case: The authors prove exponential convergence of IPMF to the true SB solution for Gaussian marginals in both discrete and continuous time settings. They define an "optimality matrix" $A$ and show that IPMF iterations contract the distance between the current $A$ and the optimal $A^* = \epsilon^{-1}I$.
- General Case: Under the assumption that $p_0$ and $p_1$ have bounded supports, the authors prove weak convergence to the SB solution for both discrete and continuous time.
Error Correction: Unlike one-directional methods that accumulate errors, the bidirectional nature of IPMF allows it to correct marginal mismatches at every step, preventing divergence.
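
For intuition about the Gaussian case, here is a one-dimensional worked example. It assumes zero-mean marginals $p_0 = \mathcal{N}(0, a)$ and $p_1 = \mathcal{N}(0, b)$, a Wiener prior with variance $\epsilon$ at time 1, and it reads the "optimality matrix" as the coefficient that couples $x_0$ and $x_1$ in the log-density of the coupling; that reading is an assumption of this sketch. For a Gaussian coupling with covariance $c = \mathrm{Cov}(x_0, x_1)$:

$$
\log q(x_0, x_1) = \frac{c}{ab - c^2}\, x_0 x_1 + (\text{terms in } x_0 \text{ alone}) + (\text{terms in } x_1 \text{ alone}),
\qquad
A = \frac{c}{ab - c^2}.
$$

Setting $A = A^* = \epsilon^{-1}$ gives a quadratic in $c$:

$$
c^2 + \epsilon c - ab = 0
\quad\Longrightarrow\quad
c^* = \frac{\sqrt{\epsilon^2 + 4ab} - \epsilon}{2}.
$$

For example, $a = 1$, $b = 4$, $\epsilon = 1$ gives $c^* = (\sqrt{17} - 1)/2 \approx 1.56$. As $\epsilon \to 0$ the covariance tends to $\sqrt{ab}$ (a deterministic map), and as $\epsilon \to \infty$ it tends to $0$ (an independent coupling).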

3. Key Contributions

Theoretical Unification: The paper reveals that the practical bidirectional IMF is secretly performing IPF iterations. This leads to the definition of IPMF, providing a rigorous theoretical basis for the heuristic.
Convergence Proofs:
- Proven exponential convergence for Gaussian distributions.
- Proven weak convergence for distributions with bounded support.
- Conjecture of convergence under very general settings.
Flexible Initialization Strategy: A major practical contribution is the ability to trade off generation quality against input-output similarity by designing the starting coupling (initial process).
- IMF-like initialization: Starts from a coupling that already satisfies marginal matching (distribution fidelity).
- IPF-like initialization: Starts from a coupling that already satisfies optimality (similarity).
- Custom couplings (e.g., SDEdit): Can be used to inject prior knowledge or improve specific metrics.
Empirical Validation: Extensive experiments across Gaussian setups, 2D toy problems, SB benchmarks, and real-world image datasets (Colored MNIST, CelebA, AFHQ).
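
To make the idea of a starting coupling concrete, the sketch below draws paired samples $(x_0, x_1)$ from three simple starts on unpaired data: an independent pairing (IMF-like), the Wiener prior run from the source samples (IPF-like), and the identity pairing. The function names, the toy data, and the use of equally sized sample arrays are assumptions for illustration; an SDEdit-based start would need a pre-trained model and is not shown.

```python
import numpy as np

def independent_coupling(x0, x1, rng):
    """IMF-like start: pair each source sample with a randomly chosen
    target sample (assumes x0 and x1 hold the same number of samples)."""
    return x0, x1[rng.permutation(len(x1))]

def wiener_coupling(x0, eps, rng):
    """IPF-like start: x1 = x0 + sqrt(eps) * noise, the prior run from p0."""
    return x0, x0 + np.sqrt(eps) * rng.standard_normal(x0.shape)

def identity_coupling(x0):
    """Identity start: every sample is paired with a copy of itself."""
    return x0, x0.copy()

def mse(x0, x1):
    """Mean squared difference between paired samples (input-output similarity)."""
    return float(np.mean((x0 - x1) ** 2))

# Hypothetical unpaired toy data: two 2D Gaussian clouds.
rng = np.random.default_rng(0)
source = rng.standard_normal((1000, 2))
target = 2.0 + rng.standard_normal((1000, 2))
eps = 0.5

pairs = {
    "independent": independent_coupling(source, target, rng),
    "wiener": wiener_coupling(source, eps, rng),
    "identity": identity_coupling(source),
}
for name, (x0, x1) in pairs.items():
    print(name, "MSE:", mse(x0, x1), "mean of x1:", x1.mean(axis=0))
```

The independent pairing reproduces the target samples exactly but ignores which source sample they are paired with; the Wiener and identity pairings stay close to the source but their $x_1$ samples do not follow the target distribution. Any of these pairs could then be handed to an IPMF-style solver as its initial coupling.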

4. Experimental Results

The authors evaluated IPMF using Diffusion Schrödinger Bridge Matching (DSBM) and Adversarial Schrödinger Bridge Matching (ASBM) solvers with various starting processes.

Convergence: In Gaussian and 2D toy settings, IPMF converged to the same solution regardless of the starting coupling (IMF, IPF, Identity, or arbitrary couplings), validating the theoretical convergence claims.
SB Benchmark: On the Entropic Optimal Transport benchmark, IPMF variants (DSBM-IMF, DSBM-IPF, etc.) achieved competitive or superior performance compared to state-of-the-art baselines (SF2M-Sink), particularly in higher dimensions.
Image Translation (CelebA & MNIST):
- Trade-off Control: Experiments showed that different initializations led to different points on the Pareto frontier of FID (generation quality) vs. MSE (input-output similarity).
- SDEdit Initialization: Using Stable Diffusion or DDPM-based SDEdit as a starting coupling allowed the model to achieve high similarity (low MSE) while maintaining reasonable generation quality, outperforming standard IPF/IMF initializations in specific metrics.
- Robustness: The method successfully handled unpaired translation tasks, preserving semantic features (e.g., hair color, background) while changing the domain style.

5. Significance and Impact

Unification of SB Methods: IPMF provides a single, coherent framework that explains and unifies previously distinct approaches (IPF, IMF, DSBM, ASBM), clarifying their relationships and convergence properties.
Mitigating "Prior Forgetting" and Error Accumulation: By formally integrating IPF projections into the IMF loop, IPMF mitigates both the error accumulation that plagues one-directional distillation methods (like Rectified Flows) and the prior forgetting of standard IPF.
Practical Flexibility: The ability to tune the starting coupling offers a new mechanism for practitioners to tailor SB solvers to specific application needs (e.g., prioritizing strict image similarity for medical imaging vs. high-fidelity generation for art).
Foundation for Future Work: The theoretical guarantees for bounded support distributions and the convergence analysis for Gaussians lay the groundwork for applying SB methods to more complex, real-world data distributions and multi-marginal problems.

In summary, this paper transforms a practical heuristic (bidirectional IMF) into a theoretically grounded, convergent, and flexible algorithm (IPMF), significantly advancing the state of the art in Schrödinger Bridge-based generative modeling and domain translation.

