Diffusion Blend: Inference-Time Multi-Preference Alignment for Diffusion Models

This paper introduces Diffusion Blend, a novel inference-time framework that enables diffusion models to dynamically align with any user-specified linear combination of multiple reward functions and regularization constraints. By blending the backward diffusion processes of fine-tuned models at generation time, it eliminates the need for repeated fine-tuning while outperforming existing baselines.

Min Cheng, Fatemeh Doudi, Dileep Kalathil, Mohammad Ghavamzadeh, Panganamala R. Kumar

Published 2026-03-13

Imagine you have a master chef (the Diffusion Model) who is incredibly talented at cooking any dish you ask for, from a simple sandwich to a complex soufflé. This chef was trained on millions of recipes and knows how to cook everything generally well.

However, sometimes you want something specific:

  • "Make this sandwich look artistic and beautiful."
  • "Make sure this sandwich looks exactly like the photo I sent you."
  • "Make it healthy (low calories) but still tasty."

The problem is that the chef usually has to pick one style. If you train the chef to be a "Beauty Expert," they might forget how to follow your photo instructions. If you train them to be a "Photo-Follower," the food might look ugly.

Traditionally, if you wanted a different mix (e.g., 50% beauty, 50% accuracy), you'd have to send the chef back to culinary school for weeks to learn that specific combination. If you wanted a new mix tomorrow, you'd have to send them back to school again. This is slow, expensive, and impractical.

The Solution: "Diffusion Blend"

This paper introduces a clever new trick called Diffusion Blend. Instead of sending the chef back to school, they create a "Mixing Station" at the moment you order the food (inference time).

Here is how it works, using a few analogies:

1. The "Specialist Chefs" (The Training Phase)

Before you ever order, the researchers train a few "Specialist Chefs" once and for all:

  • Chef A is an expert at making food look Beautiful (Aesthetics).
  • Chef B is an expert at making food look Exactly Like the Photo (Text-Image Alignment).
  • Chef C is an expert at making food Healthy (Human Preference).

These chefs are "fine-tuned" models. They are ready to go.

2. The "Magic Mixer" (The Inference Phase)

Now, you walk up to the counter and say: "I want a sandwich that is 70% Beautiful and 30% Photo-Accurate."

In the old days, the chef would have to stop, re-learn, and start over.
With Diffusion Blend, the system instantly takes the "thought process" (the mathematical recipe) of Chef A and Chef B and blends them together in real-time.

  • It doesn't just average their final pictures.
  • It blends their step-by-step cooking instructions as the food is being created.
  • It's like having two chefs whispering instructions simultaneously, with the system listening to each in exactly the ratio you asked for (70% Chef A, 30% Chef B).

The Result: You get a sandwich that is perfectly balanced between beauty and accuracy, created instantly, without the chef ever needing to go back to school.
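In diffusion-model terms, the "whispered instructions" are the noise predictions each fine-tuned model makes at every denoising step, and the blend is a weighted average of those predictions. Here is a minimal sketch of that idea. The `predict_noise` interface, the model objects, and the simplified update rule are all illustrative assumptions, not the paper's actual API:

```python
import numpy as np

def blended_denoise_step(x_t, t, models, weights):
    """One backward-diffusion step using a convex blend of the
    per-model noise predictions (the 'whispered instructions')."""
    # Each fine-tuned specialist proposes its own noise estimate
    # for the current noisy image x_t at timestep t.
    eps = [m.predict_noise(x_t, t) for m in models]
    # Blend the predictions in the user-specified ratio, e.g. 70% / 30%.
    eps_blend = sum(w * e for w, e in zip(weights, eps))
    # Plug the blended estimate into a (deliberately simplified)
    # denoising update; alpha_t is a placeholder schedule value.
    alpha_t = 0.99
    return (x_t - (1 - alpha_t) * eps_blend) / np.sqrt(alpha_t)
```

The key design point is that the blending happens inside every step of the backward process, not on the finished images, so the specialists steer the same trajectory together rather than producing separate results that get averaged afterwards.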

The Three "Magic Tools"

The paper proposes three specific ways to use this mixer:

  1. DB-MPA (The Multi-Flavor Mixer):
    This is the main tool. It lets you mix any number of specialists. Want 40% Beauty, 40% Accuracy, and 20% Health? Just dial it in. The system blends the "whispers" of all three chefs instantly.

  2. DB-KLA (The "Strictness" Dial):
    Sometimes, you want the food to be creative, but you don't want the chef to get too wild and forget the original recipe entirely. This tool lets you control how much the chef can "drift" from their original training.

    • Low Dial: The chef stays very close to their original, safe style.
    • High Dial: The chef is allowed to be very creative and bold.

    You can turn this knob up or down instantly without retraining.

  3. DB-MPA-LS (The "Lightweight" Mixer):
    Blending three chefs at once can be computationally heavy (like trying to listen to three people talking at the same time). This version is a smart shortcut. Instead of listening to all chefs at every single step, it randomly picks one chef to listen to at each step, based on your percentages.

    • Analogy: Instead of a choir singing together, it's like a conductor pointing to different singers one by one so fast that your ear hears a perfect blend.
    • Benefit: It runs just as fast as the original chef, but still gives you the blended result.
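The "conductor pointing at one singer per step" idea can be sketched as a simple sampling routine: instead of calling every model at every step, pick one model index per denoising step with probability proportional to the user's weights. Function and parameter names here are illustrative assumptions, not the paper's implementation:

```python
import random

def lightweight_blend_schedule(weights, num_steps, seed=0):
    """Pick one specialist model index per denoising step, with
    probability proportional to the user-specified weights."""
    rng = random.Random(seed)
    indices = list(range(len(weights)))
    # Over many steps the mix of chosen specialists matches the
    # requested ratios, while each step only pays the cost of a
    # single model call -- same speed as running one chef alone.
    return [rng.choices(indices, weights=weights)[0]
            for _ in range(num_steps)]
```

For example, `lightweight_blend_schedule([0.4, 0.4, 0.2], 50)` yields 50 indices drawn from {0, 1, 2}, so across a full denoising run the third specialist is consulted roughly 20% of the time.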

Why is this a Big Deal?

  • No More Waiting: You don't need to wait days for a new model to be trained for your specific taste.
  • Infinite Customization: You can tweak your preferences on the fly. "Make it a bit more blue," "Make it less realistic," "Make it more artistic."
  • Solves Conflicts: It handles situations where goals fight each other (e.g., "Make it look like a photo" vs. "Make it look like a painting") much better than previous methods, finding the perfect middle ground.

In summary: Diffusion Blend turns the rigid, "one-size-fits-all" AI image generator into a flexible, user-controlled tool. It allows you to be the director, mixing and matching different "expert" styles instantly to get exactly the image you imagine, right now.
