Path Planning for Masked Diffusion Model Sampling

Imagine you are trying to write a story, but you start with a page that is completely covered in black ink. Your goal is to slowly erase the black ink and reveal the words underneath, one by one, until you have a perfect story.

This is how Masked Diffusion Language Models (MDMs) work. They are a type of AI that generates text (or proteins, or code) by starting with a sequence in which every position is hidden ("masked") and gradually revealing it.

However, the old way of doing this had a major flaw. It was like a painter who, once they painted a stroke of blue, decided, "Okay, that's blue forever. I can never change it." If they painted a blue sky where a green forest should be, they were stuck with the mistake. They had to keep painting around it, because they couldn't go back and fix the original error. This led to messy, confusing results.

Enter "Path Planning" (P2).

The authors of this paper realized that to make a masterpiece, you need the freedom to change your mind. They introduced a new strategy called Path Planning, which acts like a smart editor or a project manager for the AI.

Here is how P2 works, using a simple analogy:

The Analogy: The "Edit-While-You-Write" Editor

Imagine you are writing a novel, but you are using a magical pen that only writes on a page covered in black ink.

The Old Way (No Planning):
You pick a random spot on the page, erase a little bit of black ink, and guess what word goes there. Once you write "The," you lock it in. You move to the next random spot, write "cat," and lock it in. If you realize later that "The cat" doesn't make sense because the sentence should have been "The dog," you can't go back. You are stuck with "The cat" and have to force the rest of the sentence to make sense around a mistake.

The New Way (Path Planning / P2):
You still start with the black ink. But now, you have a Smart Editor (the "Planner") sitting next to you.
- Step 1: The Guess. The AI makes a guess at what the whole sentence should look like.
- Step 2: The Plan. The Smart Editor looks at the current messy page and the AI's guess. It says: "Hey, that 'cat' looks wrong. Let's cover it back up with ink and try again. Also, that 'The' looks perfect, let's keep it."
- Step 3: The Fix. The AI covers "cat" back up (even though it was already written!) and tries to write "dog" instead.

The Magic of P2 is that it allows the AI to "remask" (cover back up) words it has already decided on and then "unmask" (re-write) them with a better guess. It treats the generation process not as a straight line, but as a path that can be adjusted, refined, and corrected along the way.

Why is this a big deal?

The paper shows that this simple idea of "letting the AI change its mind" leads to measurable improvements in three very different areas:

🧬 Biology (Proteins & RNA): Think of proteins as complex 3D origami. If you fold the paper wrong, the shape is useless. P2 helps the AI "unfold" and "refold" the protein sequence until structure-prediction tools judge that it is likely to fold well. The result? AI-designed proteins that score higher on foldability, a prerequisite for being stable and useful in medicine.
📝 Storytelling & Math: When writing a story or solving a math problem, context is key. If you get the first step of a math problem wrong, the whole answer is wrong. P2 allows the AI to look back, realize, "Wait, that first step was wrong," and fix it before finishing the problem. This makes the AI better at reasoning, to the point where a small AI using P2 edges out a much larger conventional one on a math benchmark.
💻 Coding: Writing code is like building a house. If you build the foundation wrong, the house falls. P2 lets the AI check its foundation and fix it before building the roof. The paper shows that the same AI writes noticeably better code with P2 than without it.

The "Planner" Options

The paper suggests three ways to build this "Smart Editor":

Self-Planning: The AI acts as its own editor, using its own confidence to decide what to change.
BERT-Planning: Using a pre-trained, smaller AI (like a seasoned editor) to guide the main AI.
Trained-Planning: Training a specific "editor" AI to learn exactly how to fix mistakes.

The Bottom Line

Before this paper, masked diffusion models for text and biology were like painters who couldn't use an eraser. Once they made a mark, it was permanent.

Path Planning (P2) gives the AI an eraser and a map. It allows the model to explore different paths, correct its own mistakes, and find a better solution. The result is AI that generates higher-quality, more accurate content across science, math, and language.

In short: It's not just about generating the answer; it's about having the wisdom to know when you got it wrong and the ability to fix it.

1. Problem Statement

Masked Diffusion Language Models (MDMs) offer a compelling alternative to autoregressive (AR) models for discrete data generation, particularly in domains lacking natural causal ordering (e.g., biological sequences, code infilling). However, current MDMs suffer from significant limitations during inference:

Fixed Unmasking Order: Standard MDMs typically use a simplified masked inference process where tokens are unmasked uniformly at random or via a fixed schedule. Once a token is unmasked, it remains fixed for the remainder of the generation process.
Error Propagation: Because the unmasking order is often suboptimal and unmasked tokens cannot be revisited, early mistakes in the denoising process propagate, leading to suboptimal generative quality.
Lack of Refinement: Unlike continuous diffusion models, standard discrete MDMs lack a mechanism to iteratively refine or correct "unmasked" tokens that were predicted incorrectly.
Theoretical Gap: While the Evidence Lower Bound (ELBO) for MDMs exists, it assumes a specific unmasking strategy. There is no unified framework to optimize the order of unmasking or to allow for the resampling of previously generated tokens to maximize the likelihood.
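
The commit-once behaviour described in these limitations can be sketched in a few lines. This is only an illustration under stated assumptions: `toy_denoiser` is a hypothetical stand-in for a trained denoiser, and the function names and the `<mask>` symbol are invented here rather than taken from the paper.

```python
import random

MASK = "<mask>"

def toy_denoiser(x, rng):
    """Hypothetical stand-in for a trained denoiser: proposes a token for every position."""
    vocab = ["the", "cat", "dog", "sat"]
    return [rng.choice(vocab) for _ in x]

def commit_once_sample(length, steps, seed=0):
    """Unmask positions in a uniformly random order; committed tokens are never revisited."""
    rng = random.Random(seed)
    x = [MASK] * length
    order = list(range(length))
    rng.shuffle(order)                    # random unmasking order, fixed up front
    per_step = -(-length // steps)        # ceiling division: positions revealed per step
    history = []
    for s in range(steps):
        z = toy_denoiser(x, rng)          # predicted clean sequence for this step
        for i in order[s * per_step:(s + 1) * per_step]:
            x[i] = z[i]                   # commit: this position is frozen from now on
        history.append(list(x))
    return x, history
```

Because no branch ever writes `MASK` back into `x`, a poor early guess stays in the sequence until the end, which is the error-propagation problem that remasking is meant to address.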

2. Methodology: Path Planning (P2)

The authors propose Path Planning (P2), a novel inference sampling strategy that decomposes each generation step into two sub-stages: Planning and Denoising. P2 introduces a "planner" component that dynamically selects which tokens to update (unmask) and which existing tokens to resample (remask).

Core Mechanism

Expanded ELBO: The authors derive a new, expanded ELBO for MDMs. This formulation includes two additional terms involving a Planner ($G_\phi$). The planner's role is to select the optimal set of tokens to unmask (for masked positions) and the optimal set of tokens to resample (for unmasked positions) at each step.
The Planner ($G_\phi$): The planner is a function $G_\phi: V^L \times V^L \to [0, 1]^L$ that takes the current partially noised sequence ($x_t$) and the denoiser's predicted clean sequence ($z$) as input. It outputs probabilities for:
- Masked Planner ($G_M$): Probability that a masked token should be unmasked.
- Unmasked Planner ($G_U$): Probability that an unmasked token should be kept (or conversely, remasked for refinement).
Remasking Capability: Crucially, P2 allows for remasking. If the planner decides an unmasked token is likely incorrect, it can be re-masked and resampled in a subsequent step, effectively allowing the model to "backtrack" and correct errors.
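
One way to read the two planner roles is as a single per-position score that switches on whether the position is currently masked. The piecewise form below is a restatement of the description above, not a formula quoted from the paper; the position superscript $i$ and the symbol $\text{[MASK]}$ are notation assumed here.

```latex
G_\phi^{i}(x_t, z) =
\begin{cases}
G_M^{i}(x_t, z), & \text{if } x_t^{i} = \text{[MASK]} \quad \text{(probability of unmasking position } i\text{)} \\
G_U^{i}(x_t, z), & \text{if } x_t^{i} \neq \text{[MASK]} \quad \text{(probability of keeping position } i\text{)}
\end{cases}
\qquad i = 1, \dots, L
```

Under this reading, a low $G_U^{i}$ marks an already-written token as a candidate for remasking and resampling.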

Three Instantiations of P2

The paper proposes three practical ways to implement the planner:

Self-Planning: The denoiser ($D_\theta$) acts as its own planner. It uses its own predicted probabilities for unmasked tokens as confidence scores to decide whether to keep or resample them. This recovers existing methods like MaskGIT and Greedy Ancestral as special cases but adds stochasticity control.
BERT-Planning: A pre-trained BERT model (or similar architecture) is used as the planner. Since BERT is trained to predict masked tokens and assess the "naturalness" of unmasked sequences, it serves as an effective, lightweight, zero-shot planner.
Trained-Planning: A lightweight planner network is trained specifically to predict the optimal unmasking/remasking trajectory. It is trained using the planner-specific terms of the expanded ELBO, supervised to match the ground-truth decoding path.
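
To make the Self-Planning idea concrete, the sketch below scores each position with the denoiser's own probability for the token in question. It is a simplified illustration: the function name, the `<mask>` symbol, and the choice to represent the denoiser output as one `dict` of token probabilities per position are all assumptions made here, not the paper's implementation.

```python
from typing import Dict, List, Sequence

MASK = "<mask>"

def self_planning_scores(
    x_t: Sequence[str],
    z: Sequence[str],
    probs: Sequence[Dict[str, float]],
) -> List[float]:
    """Score every position with the denoiser's own confidence.

    For a masked position, the score is the probability of the proposed token z[i].
    For an unmasked position, it is the probability the denoiser assigns to the
    token already written there, so a low score flags a candidate for remasking.
    """
    scores = []
    for tok, guess, dist in zip(x_t, z, probs):
        candidate = guess if tok == MASK else tok
        scores.append(dist.get(candidate, 0.0))
    return scores

# Toy example: position 0 is masked, position 1 already holds "cat".
example = self_planning_scores(
    [MASK, "cat"],
    ["the", "dog"],
    [{"the": 0.9, "a": 0.1}, {"cat": 0.2, "dog": 0.8}],
)
# example is [0.9, 0.2]: confident about unmasking "the", doubtful about keeping "cat".
```

A BERT planner or a trained planner would fill the same role by returning scores of the same shape from a separate model.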

Algorithm

At each step $t$:

Denoise: The denoiser predicts a clean sequence $z$ from the current state $x_t$.
Plan: The planner $G_\phi$ evaluates $z$ and $x_t$ to determine a set of positions to update.
- Masked positions selected by $G_M$ are unmasked to the denoiser's prediction in $z$.
- Unmasked positions are remasked with probability determined by $G_U$, then resampled from the denoiser.
Update: The sequence is updated based on these decisions.
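
The three stages above can be sketched as a single function. The version below is a simplification, not the authors' algorithm: it assumes a deterministic top-k rule (keep the `num_keep` highest-scoring positions and mask the rest) in place of the probabilistic decision described in the text, and `toy_denoise`, `toy_planner`, and `p2_step` are hypothetical names used only for illustration.

```python
from typing import Callable, List, Sequence

MASK = "<mask>"

def p2_step(
    x_t: Sequence[str],
    denoise: Callable[[Sequence[str]], List[str]],
    planner: Callable[[Sequence[str], Sequence[str]], List[float]],
    num_keep: int,
) -> List[str]:
    """One denoise -> plan -> update step with an assumed top-k selection rule."""
    z = denoise(x_t)                                   # Denoise: predicted clean sequence
    scores = planner(x_t, z)                           # Plan: one score per position
    ranked = sorted(range(len(x_t)), key=lambda i: scores[i], reverse=True)
    keep = set(ranked[:num_keep])
    x_next = []
    for i, tok in enumerate(x_t):                      # Update
        if i not in keep:
            x_next.append(MASK)                        # remask (or leave masked)
        elif tok == MASK:
            x_next.append(z[i])                        # unmask to the denoiser's prediction
        else:
            x_next.append(tok)                         # keep the existing token
    return x_next

def toy_denoise(x_t):
    """Hypothetical denoiser that always predicts the same clean sentence."""
    return ["the", "dog", "sat"]

def toy_planner(x_t, z):
    """Hypothetical planner: trusts tokens that agree with z, is neutral on masked ones."""
    return [0.5 if tok == MASK else (1.0 if tok == z[i] else 0.0)
            for i, tok in enumerate(x_t)]

step1 = p2_step(["the", "cat", MASK], toy_denoise, toy_planner, num_keep=2)
# step1 is ["the", "<mask>", "sat"]: the doubtful "cat" has been remasked.
step2 = p2_step(step1, toy_denoise, toy_planner, num_keep=3)
# step2 is ["the", "dog", "sat"]: the remasked position is resampled.
```

Raising `num_keep` from step to step plays the role of the unmasking schedule; a stochastic variant would sample the keep/remask decision from the planner's probabilities instead of ranking them.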

3. Key Contributions

Theoretical Expansion: The paper proves that P2 establishes a new, expanded ELBO on the log marginal likelihood. This theoretical framework justifies the use of non-uniform planners and remasking strategies to improve generative quality, especially when the denoiser is imperfect.
Generalization: P2 is shown to recover existing MDM sampling strategies (e.g., Ancestral, Greedy, RDM, DFM, Top-K Marginal) as special cases within a unified framework.
Efficient Planning: The authors demonstrate that a dedicated, large-scale planner is not necessary. Lightweight models (e.g., 8M parameter BERT) or even the denoiser itself can serve as effective planners, making the approach scalable.
Error Correction: By enabling the resampling of unmasked tokens, P2 mitigates the error propagation inherent in standard MDMs.

4. Experimental Results

The authors evaluated P2 across three distinct domains: Protein Sequence Generation, Natural Language Generation, and RNA Sequence Generation.

Protein Sequence Generation

Setup: Used a 150M parameter MDM (based on DPLM architecture).
Results: P2 significantly improved foldability (the percentage of sequences with high structural quality).
- Foldability: Increased from 48.14% (DPLM baseline) to 58.86% (DPLM + P2-Train).
- Metrics: Improved pLDDT (80.23 $\to$ 83.45) and pTM, while maintaining high sequence diversity.
- Comparison: Outperformed larger autoregressive models (e.g., 2.7B ProGen2) and other diffusion baselines (EvoDiff, ESM3) despite having fewer parameters.

Natural Language Generation

Setup: Evaluated on MDM (1.1B) and DiffuLLaMA (7B) across TriviaQA, LAMBADA, GSM8K (math), ROCStories (story), and HumanEval (code).
Results:
- Math Reasoning (GSM8K): P2 lifted MDM performance from 58.5% to 60.9%, surpassing the 7B LLaMA2 baseline (58.6%).
- Code Generation (HumanEval): DiffuLLaMA + P2 achieved 17.6% pass@1, significantly outperforming ancestral sampling (13.2%) and LLaMA2 (1.7%).
- Story Generation: ROUGE scores improved by over 5 absolute points compared to baselines.

RNA Sequence Generation

Setup: 150M MDM trained on RNACentral.
Results: P2 with BERT-Planning improved structural plausibility.
- pLDDT: Increased from 68.12 to 73.28.
- MFE (Minimum Free Energy): Lowered (improved) from -48.46 to -51.91.
- GC Content: Increased to 65.47%, closer to natural RNA sequences.
- Diversity: Maintained high entropy, avoiding mode collapse.
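
The before/after figures reported above are easier to compare as relative gains. The short calculation below uses only numbers quoted in this summary; the helper name `relative_gain` is introduced here for illustration, and MFE is left out because a percentage change on a negative baseline is easy to misread.

```python
def relative_gain(before: float, after: float) -> float:
    """Relative improvement of `after` over `before`, in percent."""
    return (after - before) / abs(before) * 100.0

# (baseline, with P2) pairs as quoted in the results above.
reported = {
    "protein foldability (%)": (48.14, 58.86),
    "protein pLDDT": (80.23, 83.45),
    "GSM8K accuracy (%)": (58.5, 60.9),
    "HumanEval pass@1 (%)": (13.2, 17.6),
    "RNA pLDDT": (68.12, 73.28),
}

gains = {name: round(relative_gain(b, a), 1) for name, (b, a) in reported.items()}
# Roughly 22.3, 4.0, 4.1, 33.3 and 7.6 percent, in the order listed.
```

Read this way, the largest relative gains are on HumanEval pass@1 and protein foldability, while the math-reasoning and pLDDT gains are in the single digits.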

Efficiency and Scaling

Inference Time: P2 offers a tunable trade-off between quality and speed. Increasing sampling steps improves foldability.
Overhead: Using an 8M BERT planner adds only ~24% overhead compared to the base denoiser, while providing substantial quality gains.

5. Significance

Bridging the Gap: P2 narrows the performance gap between discrete diffusion models and autoregressive models. A 1.1B MDM with P2 surpasses the 7B LLaMA2 baseline on math reasoning (GSM8K), and the 7B DiffuLLaMA with P2 improves HumanEval pass@1 over both ancestral sampling and LLaMA2.
Biological Discovery: The ability to generate high-quality, foldable protein and RNA sequences with high structural plausibility suggests P2 could accelerate de novo protein and drug design.
Inference Strategy as a Lever: The paper highlights that for discrete diffusion models, the inference strategy is as critical as the training objective. By optimizing the "path" of generation, one can unlock the full potential of the model without retraining the core denoiser.
Unified Framework: P2 provides a principled, theoretically grounded framework that subsumes previous heuristic methods, offering a clear path for future research in discrete generative modeling.

In summary, Path Planning (P2) transforms MDMs from commit-once generators into dynamic, self-correcting systems, with reported gains across language, code, and biological sequence generation.