Exploring Diffusion Models' Corruption Stage in Few-Shot Fine-tuning and Mitigating with Bayesian Neural Networks

This paper identifies a "corruption stage" in few-shot fine-tuned diffusion models, caused by a narrowed learned distribution, and proposes a Bayesian Neural Network approach with variational inference that broadens this distribution, mitigating corruption and improving image fidelity, quality, and diversity at no additional inference cost.

Xiaoyu Wu, Jiaru Zhang, Yang Hua, Bohan Lyu, Hao Wang, Tao Song, Haibing Guan

Published Tue, 10 Ma

Here is an explanation of the paper using simple language and creative analogies.

The Big Picture: Teaching an Artist with a Single Photo

Imagine you have a world-famous painter (the Diffusion Model) who has spent years studying millions of paintings. They can paint anything: cats, cars, sunsets, you name it.

Now, you want this painter to learn to paint your specific pet cat, but you only have one photo of it to show them. This is called "Few-Shot Fine-Tuning."

The goal is to teach the painter just enough to recognize your cat without making them forget how to paint anything else.

The Problem: The "Confused Phase" (The Corruption Stage)

The researchers discovered something weird happens when you try to teach the painter with so few examples. The learning process doesn't improve in a straight line; it passes through three distinct phases:

  1. Phase 1 (The "Aha!" Moment): At first, the painter gets better. They start to look a bit like your cat. Great!
  2. Phase 2 (The "Corruption Stage"): Suddenly, things go wrong. The painter gets too focused on that single photo. Instead of learning the essence of the cat, they start memorizing the pixels.
    • The Analogy: Imagine the painter is so obsessed with the one photo that they start painting static noise (like TV snow) or weird, glitchy patterns over your cat. The image looks messy and broken. The researchers call this the "Corruption Stage."
  3. Phase 3 (The "Robot" Phase): If you keep training, the painter stops making noise, but now they can only paint that exact one photo. If you ask for "your cat sleeping," they can't do it. They can only copy the photo. They have lost their creativity and become a photocopier.

Why does this happen?
The researchers realized the painter's "brain" (the learned distribution) became too narrow. They were trying to fit a whole universe of possibilities into a tiny box (just one photo). Because the box was so small, the painter panicked and started hallucinating noise before finally giving up and just copying the photo.

The Solution: The "Imagination Booster" (Bayesian Neural Networks)

To fix this, the authors introduced a technique called Bayesian Neural Networks (BNNs).

The Analogy:
Instead of teaching the painter to memorize the photo perfectly, BNNs teach the painter to embrace uncertainty.

  • Without BNNs: The painter thinks, "I must paint exactly this pixel at this spot." This leads to the narrow, glitchy corruption.
  • With BNNs: The painter thinks, "I'm not 100% sure exactly where this pixel goes, but I'm pretty sure it's somewhere in this area."

By treating the painting rules as probabilities (guesses with a range of possibilities) rather than fixed facts, the painter is forced to keep their "brain" wide open. They can't just memorize the one photo because they are constantly allowed to be slightly "wrong" or "random."
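The "guesses with a range of possibilities" idea can be sketched in a few lines of code. This is a hedged, minimal illustration of the general Bayesian-weight technique, not the paper's implementation; the name `BayesianWeight` and all numbers are invented here. Each weight stores a mean and a spread instead of a single fixed number, and every training step draws a fresh sample (the reparameterization trick), which is what keeps the "brain" from collapsing onto one exact answer.

```python
import math
import random

class BayesianWeight:
    """One network weight treated as a distribution, not a point value."""

    def __init__(self, mu=0.0, rho=-3.0):
        self.mu = mu    # mean: the weight's "best guess"
        self.rho = rho  # sigma = softplus(rho), so the spread stays positive

    @property
    def sigma(self):
        # softplus keeps the standard deviation strictly positive
        return math.log1p(math.exp(self.rho))

    def sample(self):
        # training: w = mu + sigma * eps -- a fresh, slightly "wrong" guess
        # each step, which prevents memorizing one exact pixel value
        eps = random.gauss(0.0, 1.0)
        return self.mu + self.sigma * eps

    def mean(self):
        # inference: just the single best guess, no randomness
        return self.mu
```

In a full variational-inference setup, training would also penalize the learned distribution for drifting too far from a prior (a KL term), which is the formal version of forcing the painter to keep their options open.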

The Result:

  • The "Corruption Stage" (the weird noise) disappears because the painter isn't panicking about being too precise.
  • The painter learns the concept of your cat, not just the pixels.
  • You can ask for "your cat in space" or "your cat as a superhero," and they can actually do it, because they learned the idea of the cat, not just the picture.

Why This Matters

  1. No Extra Cost: The best part is that this "Imagination Booster" only operates during the learning phase. When you actually ask the painter to create an image later, they work just as fast as before. It's like training wheels that come off once you can ride on your own.
  2. Works Everywhere: They tested this on different types of AI models and different tasks (painting objects vs. painting people), and it worked every time.
  3. Better Quality: The images are clearer, look more like the real subject, and follow your text instructions better.
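The "no extra cost" point can be made concrete with a tiny sketch. This is my own hedged illustration (the function names are invented, and this is one standard way BNNs avoid inference overhead, assumed rather than quoted from the paper): the weight distributions are only sampled while fine-tuning; at generation time the model can simply use each weight's mean, so the deployed network does exactly the same arithmetic as an ordinary one.

```python
import math
import random

def softplus(rho):
    # maps any real-valued parameter to a positive standard deviation
    return math.log1p(math.exp(rho))

def weight_during_training(mu, rho):
    # fine-tuning: sample a fresh value around the mean each step,
    # which keeps the learned distribution from collapsing to one point
    return mu + softplus(rho) * random.gauss(0.0, 1.0)

def weight_during_inference(mu, rho):
    # generation: use the mean directly -- no sampling, so the model
    # runs exactly as fast as a standard (non-Bayesian) network
    return mu
```

During training every forward pass sees a slightly different network; at inference the randomness is switched off entirely, which is why image generation speed is unchanged.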

Summary in One Sentence

The paper found that teaching AI with very few photos makes it glitch out and get stuck, but by teaching the AI to be a little bit "uncertain" and flexible (using Bayesian methods), we stop the glitches and get much better, more creative results.