CREPE: Controlling Diffusion with Replica Exchange

Imagine you have a very talented artist (a Diffusion Model) who can paint beautiful pictures. You give them a prompt like "a yellow taxi," and they start with a canvas full of static noise and slowly refine it into a clear image.

Usually, you just tell them what you want, and they do their best. But sometimes, the artist gets a little confused, or you want to tweak the result after they've started painting without hiring a new artist or retraining them. This is called Inference-Time Control.

The paper introduces a new method called CREPE to help control this process. To understand why CREPE is special, let's look at how the old way worked and why it was clunky.

The Old Way: The "Group Hike" (SMC)

Imagine you want to find the best view of a mountain. The old method, called Sequential Monte Carlo (SMC), is like sending a huge group of 1,000 hikers up the mountain all at once.

How it works: They all start at the bottom (noise) and hike up together. Every few steps, the group leader looks around and says, "Okay, hikers 1 through 500, you're going the wrong way; go back and copy hikers 501 through 1000 who are on the right path."
The Problem: This is called "resampling." Eventually, almost everyone in the group ends up copying the same few hikers. The group loses its diversity. If you want 1,000 different views, you might end up with 1,000 copies of the exact same view. Also, if you realize halfway up that you wanted to see a specific flower patch, you can't just tell the group to change course; you have to send the whole group back down and start over.
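The loss of diversity from repeated resampling can be seen in a toy sketch. This is not the paper's SMC setup: the weights are arbitrary random scores, and the function name `count_surviving_ancestors` is made up for illustration. It only tracks how many of the original hikers still have "descendants" in the group after each round of copying.

```python
import random

def count_surviving_ancestors(num_particles=1000, num_steps=50, seed=0):
    """Track how many distinct starting hikers still have descendants."""
    rng = random.Random(seed)
    ancestors = list(range(num_particles))  # each particle remembers its origin
    history = []
    for _ in range(num_steps):
        # Toy weights: an arbitrary non-negative score per particle (an assumption).
        weights = [rng.expovariate(1.0) for _ in ancestors]
        # Multinomial resampling: draw with replacement, proportional to weight.
        ancestors = rng.choices(ancestors, weights=weights, k=num_particles)
        history.append(len(set(ancestors)))
    return history

history = count_surviving_ancestors()
print(history[0], history[-1])  # distinct ancestors after the first and last round
```

The count can only go down, because each round copies from whoever is left. That is the "everyone ends up copying the same few hikers" effect.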

The New Way: The "Parallel Tempering" (CREPE)

The authors propose CREPE (Controlling with REPlica Exchange). Think of this not as a group hike, but as a team of explorers on a ladder.

Imagine a ladder with 50 rungs.

The Setup: Instead of sending 1,000 people up one ladder, you send one person to stand on each of the 50 rungs.
- Person A is at the bottom (very noisy, very blurry).
- Person B is a bit higher (less noisy).
- ...
- Person Z is at the top (almost a clear image).
The Magic Move (Replica Exchange): Every few minutes, the people on adjacent rungs (say, Person A and Person B) have a conversation. They ask, "Hey, if I swapped places with you, would I have a better view?"
- If the swap makes sense (mathematically speaking), they swap places.
- The person who was at the bottom moves up, and the person who was higher moves down.
The Result:
1. Diversity: Because everyone is constantly swapping and moving up and down the ladder, you get a huge variety of different paths. You don't end up with 1,000 clones; you get 50 unique explorers who have all seen different parts of the mountain.
2. Flexibility: If you suddenly decide, "Actually, I want to see the flower patch," you can just whisper a new instruction to the people on the ladder. They can adjust their path right now without restarting the whole hike.
3. Efficiency: You don't need a massive army of 1,000 people. You just need one person per rung. Finished pictures come off the top of the ladder one after another (sequentially), while all the rungs are worked on at the same time (in parallel).
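The ladder idea can be sketched with classical parallel tempering on a toy problem. This is an assumption-laden illustration, not CREPE itself: the rungs here are inverse temperatures `betas` rather than diffusion time steps, the swap rule is the textbook one rather than the paper's accelerated version, and `log_target` and `parallel_tempering` are hypothetical names. The target has two separated modes, which a single cold chain would rarely cross.

```python
import math
import random

def log_target(x):
    """Unnormalised log density of a double well with modes near x = -2 and x = +2."""
    return -((x * x - 4.0) ** 2)

def parallel_tempering(betas, num_iters=5000, step=0.5, seed=0):
    rng = random.Random(seed)
    states = [2.0 for _ in betas]  # every replica starts in the right-hand well
    samples = []
    for it in range(num_iters):
        # Local exploration: one random-walk Metropolis step per rung.
        for k, beta in enumerate(betas):
            proposal = states[k] + rng.gauss(0.0, step)
            log_ratio = beta * (log_target(proposal) - log_target(states[k]))
            if math.log(1.0 - rng.random()) < log_ratio:
                states[k] = proposal
        # Communication: propose swaps between adjacent rungs, alternating pairs.
        for k in range(it % 2, len(betas) - 1, 2):
            log_alpha = (betas[k] - betas[k + 1]) * (
                log_target(states[k + 1]) - log_target(states[k])
            )
            if math.log(1.0 - rng.random()) < log_alpha:
                states[k], states[k + 1] = states[k + 1], states[k]
        samples.append(states[-1])  # the last rung (beta = 1) is the one we keep
    return samples

betas = [0.02, 0.06, 0.18, 0.5, 1.0]  # hot (flat) rung first, cold (target) rung last
samples = parallel_tempering(betas)
left_fraction = sum(s < 0 for s in samples) / len(samples)
print(f"fraction of kept samples in the left well: {left_fraction:.2f}")
```

The hot rungs wander freely between the two wells, and the swaps carry that mobility down to the cold rung, so the kept samples cover both modes even though every replica started in the same one.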

What Can CREPE Do?

The paper shows CREPE working on several cool tasks:

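Temperature Control (Tempering): "Temperature" here is a statistical dial, not the weather. Turn it down and the artist sticks to their most typical, confident pictures; turn it up and they become more adventurous. Mathematically, the model's distribution is raised to a power $\beta$. CREPE lets you sample at the new temperature without retraining and without getting stuck in one style.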
Reward Tilting: Imagine you tell the artist, "Make it a yellow taxi, but make it look really cool and shiny." CREPE guides the painting process to prioritize that "coolness" reward, ensuring the final image matches your specific desire.
Mixing Models (Composition): Imagine you have one artist who is great at drawing cars, and another who is great at drawing backgrounds. CREPE can stitch their work together to create a car in a background, even if they were never trained to work together.
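Fixing Bias: Standard guidance tricks such as Classifier-Free Guidance (CFG) are heuristics, and they can distort the result so that images look too similar or "stale." CREPE corrects that distortion, acting like a diversity coach so the final batch of images is varied and interesting.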

The Catch (The "Burn-in")

Just like a new employee needs time to get used to the job, CREPE needs a "burn-in" period. The first few images it generates might be a bit messy as the "explorers" on the ladder find their footing. But once they settle in, the quality and diversity are top-notch.

The Bottom Line

CREPE is a smarter, more flexible way to steer AI image generators. Instead of forcing a massive group to march in lockstep (which leads to boring, repetitive results), it uses a clever "ladder-swapping" technique to keep the generation process diverse, adaptable, and high-quality. It's like upgrading from a rigid marching band to a jazz band that can improvise and change the song on the fly.

1. Problem Statement

Diffusion models have revolutionized generative modeling, but controlling their outputs at inference time to satisfy specific constraints (e.g., reward maximization, posterior sampling, or model composition) without retraining remains a challenge.

Current Limitations:
- Heuristic Guidance: Methods like Classifier-Free Guidance (CFG) rely on heuristic approximations that often introduce bias and inaccuracies.
- Sequential Monte Carlo (SMC): While SMC-based debiasing methods exist, they suffer from three main issues:
  1. Memory Intensity: They require maintaining a large batch of particles in parallel throughout the denoising trajectory.
  2. Sample Diversity: They often suffer from mode collapse and poor diversity, especially when the batch size is small.
  3. Inflexibility: Once the sampling process is complete, SMC cannot refine existing samples. If constraints change or results are unsatisfactory, the entire process must be restarted.

2. Methodology: CREPE

The authors propose CREPE (Controlling with REPlica Exchange), a framework that adapts Replica Exchange (also known as Parallel Tempering, PT) for diffusion model inference-time control.

Core Concept

CREPE inverts the computational paradigm of SMC:

SMC: Sequential in diffusion time, parallel in samples. A batch of particles is propagated together through the denoising steps.
CREPE: Sequential in samples, parallel in diffusion time. One replica sits at each diffusion time step, all replicas are updated in parallel, and samples are collected one after another as the MCMC iterations proceed.

Key Components

The algorithm operates on an annealing path $(\pi_t)_{t \in [0,1]}$ interpolating between a target distribution $\pi_0$ (the desired constrained distribution) and a tractable reference distribution $\pi_1$ (e.g., Gaussian noise).

Annealing Path:
Each intermediate distribution $\pi_t$ is defined as a modification of the pretrained diffusion model's marginal $p_t$. Examples include:
- Tempering: $\pi_t(x) \propto p_t(x)^\beta$
- Reward Tilting: $\pi_t(x) \propto p_t(x) \exp(r_t(x))$
- Model Composition: $\pi_t(x) \propto \prod_j p_t^{(j)}(x)$
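The three paths above differ only in how they combine log densities, which a short sketch makes concrete. Everything here is a toy assumption: `log_p_t` is a 1D Gaussian stand-in for a pretrained model's marginal (real diffusion models expose scores, not densities), and the function names and the `reward` callable are hypothetical. All results are unnormalised log densities.

```python
import math

def log_p_t(x, t, mean=1.0, std=0.5):
    """Toy marginal of x_t = (1 - t) * x_0 + t * noise, with x_0 ~ N(mean, std^2)."""
    m = (1.0 - t) * mean
    var = ((1.0 - t) * std) ** 2 + t ** 2
    return -0.5 * (x - m) ** 2 / var - 0.5 * math.log(2.0 * math.pi * var)

def log_tempered(x, t, beta):
    """Tempering: log of p_t(x)^beta."""
    return beta * log_p_t(x, t)

def log_reward_tilted(x, t, reward):
    """Reward tilting: log of p_t(x) * exp(r_t(x))."""
    return log_p_t(x, t) + reward(x, t)

def log_composed(x, t, log_marginals):
    """Model composition: log of the product of several models' marginals."""
    return sum(log_p(x, t) for log_p in log_marginals)

# Example: tilt toward positive x with a hypothetical linear reward.
value = log_reward_tilted(0.3, 0.5, reward=lambda x, t: 2.0 * x)
print(value)
```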
Communication Step (Accelerated PT):
Instead of swapping samples directly between adjacent time steps (which is inefficient if distributions have low overlap), CREPE uses Accelerated Parallel Tempering (APT).
- It simulates a forward proposal path from time $t$ to $t'$ and a backward proposal path from $t'$ to $t$.
- It computes a Radon-Nikodym Estimator (RNE) to calculate the acceptance probability for swapping the states of two replicas at different time steps.
- The acceptance probability $\alpha$ ensures the Markov chain converges to the correct joint distribution without requiring explicit knowledge of the target density, relying instead on the pretrained diffusion model's score functions.
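For orientation, the swap rule of classical parallel tempering is shown here. It is not CREPE's accelerated rule, which replaces the density ratio with an estimate built from the forward and backward proposal paths. For two replicas holding state $x$ at time $t$ and state $x'$ at time $t'$, the classical acceptance probability is:

$$\alpha = \min\left(1, \frac{\pi_t(x')\,\pi_{t'}(x)}{\pi_t(x)\,\pi_{t'}(x')}\right)$$

The ratio is large only when each state is plausible under the other replica's distribution, which is why direct swaps are rejected so often when $\pi_t$ and $\pi_{t'}$ overlap little.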
Local Exploration:
After communication steps, replicas undergo local updates (e.g., Unadjusted Langevin Algorithm for Gaussian diffusion or Metropolis-Hastings for discrete diffusion) to refine samples within their specific time-step distribution.
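A minimal sketch of one such local update, the Unadjusted Langevin Algorithm, is given here on a toy 1D Gaussian. The names `ula_step` and `gaussian_score` and the step size are assumptions for illustration; in CREPE the score would come from the pretrained diffusion model at the replica's time step.

```python
import math
import random

def ula_step(x, score, step_size, rng):
    """One Unadjusted Langevin step: drift along the score plus Gaussian noise."""
    noise = rng.gauss(0.0, 1.0)
    return x + step_size * score(x) + math.sqrt(2.0 * step_size) * noise

def gaussian_score(x, mean=1.0, std=0.5):
    """Score (gradient of the log density) of a 1D Gaussian, used as a toy target."""
    return -(x - mean) / (std ** 2)

rng = random.Random(0)
x = -3.0  # deliberately poor starting point
samples = []
for i in range(20000):
    x = ula_step(x, gaussian_score, step_size=0.01, rng=rng)
    if i >= 1000:  # discard the first steps while the chain finds the target
        samples.append(x)

sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / len(samples)
print(sample_mean, sample_var)
```

"Unadjusted" means there is no accept/reject correction, so the chain carries a small bias that shrinks with the step size.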
Online Refinement:
Because CREPE generates samples sequentially via MCMC, it supports online refinement. New constraints can be introduced mid-sampling, and the existing chain adapts to the new target without restarting.
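The idea of adapting to a new constraint without restarting can be sketched with a single Metropolis chain whose target changes halfway through. This is a simplification under stated assumptions: CREPE applies the change across its whole ladder of replicas, while this toy uses one chain, and `constrained_log_density` with its pull toward $x = 3$ is a hypothetical constraint.

```python
import math
import random

def metropolis_step(x, log_density, step, rng):
    """One random-walk Metropolis step for an unnormalised log density."""
    proposal = x + rng.gauss(0.0, step)
    if math.log(1.0 - rng.random()) < log_density(proposal) - log_density(x):
        return proposal
    return x

def base_log_density(x):
    return -0.5 * x * x  # standard normal

def constrained_log_density(x):
    # Hypothetical constraint added mid-run: a Gaussian pull toward x = 3.
    return base_log_density(x) - 0.5 * ((x - 3.0) / 0.5) ** 2

rng = random.Random(0)
x, trace = 0.0, []
for it in range(4000):
    # The target switches at iteration 2000; the chain state is simply kept.
    target = base_log_density if it < 2000 else constrained_log_density
    x = metropolis_step(x, target, step=0.5, rng=rng)
    trace.append(x)

before = sum(trace[1000:2000]) / 1000
after = sum(trace[3000:4000]) / 1000
print(before, after)
```

After the switch the chain needs a short stretch of iterations to settle on the new target, a small-scale version of the burn-in the paper describes.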

3. Key Contributions

Novel Framework: The first formulation of inference-time control for diffusion models using Parallel Tempering, demonstrating that PT can be applied directly to pretrained models without explicit target densities.
Theoretical Derivation: Derivation of swap rates (acceptance probabilities) for various control tasks (tempering, reward-tilting, CFG debiasing, model composition) for both Gaussian and discrete (CTMC) diffusion models.
Superior Diversity & Efficiency: CREPE maintains high sample diversity naturally and supports online refinement, addressing the mode collapse and rigidity of SMC.
Broad Applicability: Validated across continuous (images, molecules) and discrete (text) modalities.

4. Experimental Results

The authors evaluated CREPE against SMC-based baselines (specifically Feynman-Kac Correctors, FKC, and RNE-based methods) across several domains:

Molecular Sampling (Boltzmann Distributions):
- Tested on Alanine Dipeptide, Tetrapeptide, and Hexapeptide.
- Result: CREPE achieved lower bias and significantly better sample diversity (measured by TICA projections and MMD) compared to SMC. It successfully avoided mode collapse even with smaller computational budgets.
Image Generation (CFG Debiasing):
- Applied to debias Classifier-Free Guidance on ImageNet-64 and ImageNet-512.
- Result: While SMC (FKC) performed slightly better with very small sample counts (due to CREPE's burn-in period), CREPE outperformed SMC as the number of samples increased, particularly in FID scores and visual diversity. SMC batches tended to produce visually similar images, whereas CREPE maintained variety.
Reward-Tilting (ImageNet):
- Combined CFG debiasing with prompt-based reward tilting (e.g., "a yellow cab with dark background").
- Result: CREPE successfully aligned generated images with complex prompts after a burn-in period, producing diverse and high-quality samples.
Model Composition (Maze Navigation):
- Stitched short trajectories to form long-horizon paths in a maze.
- Result: CREPE achieved success rates comparable to or better than training a conditional model from scratch, with the added benefit of online refinement. When an intermediate point constraint was added mid-sampling, the trajectories adapted within 1,000 iterations.
Discrete Diffusion (Text & MNIST):
- Applied to sentiment-controlled text generation and MNIST debiasing.
- Result: CREPE significantly reduced perplexity (up to 5x improvement) while maintaining high sentiment accuracy, demonstrating effective debiasing of CFG distortions in discrete spaces.

5. Significance and Future Work

Paradigm Shift: CREPE offers a computationally dual perspective to the dominant SMC approach, showing empirically that sequential generation with parallel time-step exploration is a viable and often superior alternative for diffusion control.
Flexibility: The ability to perform online refinement is a major practical advantage, allowing dynamic constraint adjustment without retraining or regenerating data from scratch.
Limitations:
- Burn-in Period: CREPE requires an initial burn-in phase where samples may not yet match the target distribution, which can be computationally costly for large systems.
- Approximation Errors: The method relies on the assumption of a perfect diffusion model and specific discretization choices; errors can accumulate over iterations.
Future Directions: The authors suggest adapting advanced schedule-tuning and path-selection techniques from classical parallel tempering to further optimize diffusion control.

In summary, CREPE provides a robust, flexible, and diversity-preserving framework for steering diffusion models at inference time, overcoming key limitations of existing Sequential Monte Carlo methods.
