Free Lunch for Pass@k? Low Cost Diverse Sampling for Diffusion Language Models

This paper proposes a training-free, low-cost intervention for Diffusion Language Models that sequentially repels intermediate samples in a batch to enhance generative diversity and improve Pass@k performance on reasoning tasks without significant computational overhead.

Sean Lamont, Christian Walder, Paul Montague, Amir Dezfouli, Michael Norrish

Published 2026-03-06

Imagine you are trying to solve a very tricky puzzle, like writing a piece of code that works or solving a complex math problem. You ask an AI for help.

If you ask the AI once, it gives you one answer. But what if that answer is wrong? In the world of AI, we often ask it to try many times at once (say, 16 times) to see if any of those attempts hit the right solution. The metric that scores this is called Pass@k (Pass at k attempts): the chance that at least one of k attempts is correct.
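To make the metric concrete, here is a minimal sketch of one common way to estimate Pass@k: the unbiased estimator 1 - C(n-c, k) / C(n, k), computed from n generated samples of which c are correct. The summary above does not say which estimator the paper uses, so treat this as an illustration of the metric rather than the authors' evaluation code. The function name `pass_at_k` is chosen here for the example.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate Pass@k from n samples, c of which are correct.

    Illustrative helper (not taken from the paper): returns the
    probability that a random subset of k samples contains at
    least one correct sample.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) counts the size-k subsets with no correct sample.
    # math.comb returns 0 when k > n - c, so the estimate is then 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


print(pass_at_k(n=16, c=0, k=16))  # 0.0    (no attempt was right)
print(pass_at_k(n=16, c=1, k=1))   # 0.0625 (one in sixteen, single try)
print(pass_at_k(n=16, c=1, k=16))  # 1.0    (one hit is enough at k=16)
```

The last two lines show why Pass@k rewards variety: a single correct sample among 16 barely moves Pass@1 but is enough to fully solve the problem at k = 16.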

The problem is that current AI models can behave like a stubborn student who, when asked to try 16 times, writes the same wrong answer 16 times in slightly different handwriting. The attempts cluster around one line of reasoning instead of covering several. This is called mode collapse.

This paper introduces a clever, free trick called ODD (Orthogonal Diverse Diffusion) to fix this. Here is how it works, explained with simple analogies:

1. The Problem: The "Echo Chamber"

Imagine you are in a room with 16 people (the AI samples) trying to find the exit.

  • Old Way (Standard Sampling): Everyone is listening to the same radio station. They all hear the same wrong direction. Even though there are 16 people, they all run into the same wall. You have 16 people, but only 1 unique idea. It's a waste of energy.

2. The Solution: The "Repulsion Field"

The authors propose a new rule for the game. As the AI generates these 16 attempts, it doesn't treat them as separate, isolated events. Instead, it treats them as a team that needs to spread out.

  • The Analogy: Imagine the first person starts walking down a hallway. The second person is told, "Don't walk where the first person is walking; go to the left." The third person is told, "Don't walk where the first or second person is; find a new path."
  • The Magic: The AI uses a mathematical "repulsion force." As it generates the 2nd, 3rd, and 4th answers, it actively pushes them away from the features of the previous answers. It forces them to explore different corners of the solution space.
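The "push away from previous answers" idea can be sketched as a sequential orthogonal projection. This is a toy illustration only: it assumes each sample is summarized by a feature vector and that repulsion means subtracting the part of that vector already covered by earlier samples. The function name `repel_sequentially`, the `strength` parameter, and the choice of features are hypothetical; the summary does not describe the paper's actual update rule at this level of detail.

```python
from math import sqrt


def dot(u, v):
    """Plain dot product of two equal-length lists of floats."""
    return sum(a * b for a, b in zip(u, v))


def repel_sequentially(features, strength=1.0):
    """Push each vector away from the directions of earlier vectors.

    Hypothetical sketch, not the paper's algorithm. `features` holds
    one list of floats per sample in the batch. `strength=1.0` removes
    the whole overlap with earlier samples; smaller values push gently.
    """
    basis = []      # orthonormal directions covered by earlier samples
    repelled = []
    for vec in features:
        # Component of vec that lies in the span of the earlier samples.
        projection = [0.0] * len(vec)
        for b in basis:
            coeff = dot(vec, b)
            projection = [p + coeff * y for p, y in zip(projection, b)]
        # Move the sample away from what has already been explored.
        repelled.append([x - strength * p for x, p in zip(vec, projection)])
        # Record whatever is genuinely new about this sample.
        residual = [x - p for x, p in zip(vec, projection)]
        norm = sqrt(dot(residual, residual))
        if norm > 1e-12:
            basis.append([x / norm for x in residual])
    return repelled


batch = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [1.0, 1.0, 1.0]]
print(repel_sequentially(batch))
# [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
```

The first sample is left untouched, and each later one keeps only the part that the earlier ones did not already cover. In this toy input the three overlapping vectors end up pointing in three mutually orthogonal directions.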

3. How It Works (The "Free Lunch")

The coolest part of this paper is that it requires no retraining.

  • The Metaphor: Imagine you are baking a cake (the AI model). Usually, to make the cake taste different, you have to change the recipe or bake a whole new batch from scratch (retraining).
  • ODD's Trick: Instead of changing the recipe, the baker just rearranges the ingredients while the cake is being mixed. They take the batter that looks like the first cake and gently push it in a different direction before it sets.
  • The Result: You get 16 distinct cakes (solutions) from the same batter, with almost no extra time or cost.

4. Why It Matters

  • For Math & Code: In these fields, the "right" answer is often rare. If the AI keeps guessing the same wrong thing, it will never find the right one. By pushing the AI to try 16 different approaches, the chances of hitting the "golden ticket" (the correct solution) go up substantially.
  • The Trade-off: Sometimes, forcing the AI to be different might make one individual answer slightly worse (because it's trying a risky path). But the group of 16 answers becomes much more likely to contain at least one perfect solution.
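A little arithmetic shows why this trade-off can pay off. The sketch below compares two idealized extremes: a fully collapsed batch, where 16 copies of one answer are worth a single attempt, and a fully diverse batch, where the 16 attempts succeed or fail independently. The success rates 0.10 and 0.08 are made-up numbers for illustration, not results from the paper, and real samples fall somewhere between these two extremes.

```python
def prob_at_least_one(p: float, k: int) -> float:
    """Chance that at least one of k independent attempts succeeds,
    when each attempt succeeds with probability p."""
    return 1.0 - (1.0 - p) ** k


# Collapsed batch: 16 near-identical answers behave like one attempt.
collapsed = prob_at_least_one(0.10, 1)

# Diverse batch: each answer is a bit weaker (assumed 0.08 instead of
# 0.10), but the 16 attempts are treated as independent.
diverse = prob_at_least_one(0.08, 16)

print(f"collapsed: {collapsed:.3f}, diverse: {diverse:.3f}")
# collapsed: 0.100, diverse: 0.737
```

Under these assumptions, giving up a little per-answer quality raises the chance of at least one correct answer from about 10% to about 74%.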

Summary

Think of ODD as a "Diversity Coach" for AI.

  • Before: The AI was a choir where everyone sang the same note, even if it was off-key.
  • After: The Coach whispers to each singer, "You, try a high note. You, try a low note. You, try a different rhythm."
  • Outcome: Even if the choir is singing a difficult song, the chance that someone hits the perfect note is much higher, and they do it without needing to hire new singers or buy new instruments.

The paper reports that this works on math problems (GSM8K) and coding challenges (HumanEval), suggesting that with this simple, low-cost tweak, AI can solve problems it previously couldn't, simply by being more creative and less repetitive.