Imagine you are trying to guide a very talented but slightly confused artist (the Diffusion Model) to paint a picture of a specific subject, say, a "Golden Retriever."
Normally, the artist starts with a canvas full of static noise and slowly adds details, turning chaos into a clear image. This is how modern AI image generators work.
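That "chaos into a clear image" loop can be sketched in miniature. The toy below (my illustration, not the paper's actual sampler) denoises a single 1-D value by repeatedly nudging it toward high-probability regions using the analytic score of a Gaussian; a real diffusion model replaces this analytic score with a learned neural network and works on whole images.

```python
import numpy as np

def score(x, mu=0.0, sigma=1.0):
    # Score of a Gaussian N(mu, sigma^2): the gradient of its log-density.
    # A real diffusion model learns this function with a neural network.
    return -(x - mu) / sigma**2

rng = np.random.default_rng(0)
x = rng.normal(0.0, 10.0)       # start from heavy "static noise"
for _ in range(500):
    x = x + 0.05 * score(x)     # small step toward the realistic region
print(float(x))                 # ends very near 0, the density peak
```

Each step follows the score "GPS" a little way uphill, which is exactly the chaos-to-image process the artist metaphor describes.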
The Problem: The "Pushy" Director
Now, imagine you want to trick a security guard (the Classifier) into thinking your picture of a Golden Retriever is actually a "Cat." This is called an Adversarial Attack.
To do this, you act as a director, shouting instructions to the artist at every step of the painting process.
- The Old Way (AdvDiff): You scream, "Make it look more like a cat! Push it harder!" You push the artist's brush in the direction that makes the security guard say "Cat."
- The Result: At first, it works! The picture starts to look like a cat. But because you are pushing so hard and so blindly, you accidentally push the artist off the canvas. The painting becomes a distorted, unrecognizable mess of colors and shapes. It might fool the guard, but it's no longer a valid picture. It's garbage.
The paper calls this "Catastrophic Collapse." The more you try to force the attack, the worse the image quality gets.
The Solution: The "Tangent" Guide (DPAC)
The authors of this paper realized the problem isn't what you are asking for, but how you are asking for it.
They discovered that when you push the artist, you are usually pushing in two directions at once:
- The "Normal" Push (The Bad One): Pushing the artist off the canvas, into the void of nonsense. This ruins the image.
- The "Tangential" Push (The Good One): Pushing the artist along the edge of the canvas. This changes the image to look like a cat, but keeps it firmly on the canvas as a valid, high-quality picture.
DPAC (Distribution-Preserving Adversarial Control) is a new rule for the director. Instead of just screaming "Push harder!", the director uses a special filter:
"Only push the artist along the edge of the canvas. If you try to push them off the edge, stop immediately."
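That "filter" is essentially a one-line vector projection. Here is a minimal numpy sketch (my notation, not the paper's code), under the simplifying assumption that the normalized score vector stands in for the "off-the-canvas" normal direction:

```python
import numpy as np

def tangential_part(attack_grad, score):
    """Keep only the part of attack_grad perpendicular to the score direction.

    Assumption (for illustration): the unit score vector n approximates the
    "off-the-manifold" normal direction, so subtracting the component of the
    attack gradient along n leaves only the image-preserving, tangential push.
    """
    n = score / np.linalg.norm(score)
    return attack_grad - np.dot(attack_grad, n) * n

g = np.array([3.0, 1.0])        # the "make it look like a cat" push
s = np.array([1.0, 0.0])        # direction pointing back onto the canvas
g_tan = tangential_part(g, s)
print(g_tan)                    # -> [0. 1.]
```

Note that the surviving push is never longer than the original one, which is where the efficiency gain discussed later comes from: none of the budget is spent shoving the artist off the canvas.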
How It Works (The Metaphor)
Now swap the "canvas" metaphor for a mountain range where all the beautiful, realistic images live.
- The Score Function: This is like a GPS that always points toward the nearest peak (the most realistic image).
- The Attack Gradient: This is the force trying to move the image toward the "Cat" target.
- The Old Method: It grabs the mountain climber and yanks them in the direction of the target, even if that means dragging them off a cliff.
- The DPAC Method: It looks at the direction of the target, sees the cliff, and says, "No, we can't go that way." Instead, it finds the path that runs parallel to the mountain ridge. It slides the climber along the ridge until they reach the "Cat" zone, but they never fall off the mountain.
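The ridge-walking metaphor can be checked numerically on a toy "mountain range." In the sketch below (an illustrative toy of my own, not the paper's experiment), the realistic images live on the unit circle, the score pulls points radially back toward that circle, and a constant attack force pulls toward the "Cat" zone. Yanking directly, as in the old method, leaves the climber settled off the ridge; projecting out the radial component slides them along it.

```python
import numpy as np

def score(x):
    # Pulls x radially back toward the unit circle (the "ridge").
    r = np.linalg.norm(x)
    return (1.0 - r) * x / r

def step(x, g, project):
    n = x / np.linalg.norm(x)           # normal (off-the-ridge) direction
    if project:
        g = g - np.dot(g, n) * n        # DPAC-style: tangential part only
    return x + 0.5 * score(x) + 0.1 * g

g = np.array([1.0, 1.0]) / np.sqrt(2)   # constant pull toward the "Cat" zone
naive = np.array([1.0, 0.0])            # both climbers start on the ridge
dpac = naive.copy()
for _ in range(200):
    naive = step(naive, g, project=False)
    dpac = step(dpac, g, project=True)

print(abs(np.linalg.norm(naive) - 1))   # old way: settles well off the ridge
print(abs(np.linalg.norm(dpac) - 1))    # DPAC-style: still on the ridge
```

Both climbers end up in the "Cat" direction, but only the projected one is still standing on the mountain when they get there.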
Why This Matters
The paper proves mathematically that by removing the "off-the-canvas" push, you get two amazing things:
- Better Quality: The images stay sharp and realistic (a low FID score; lower FID means closer to real images).
- More Efficiency: You don't need to shout as loud. Because you aren't wasting energy pushing the artist off the cliff, you can achieve the same attack success with much less effort.
The Results
In their experiments, they tried to trick an AI classifier on 100 different types of images.
- The Old Way: When they tried to be very aggressive, the images turned into colorful static noise (FID score jumped from ~40 to ~70).
- The DPAC Way: Even when they were very aggressive, the images stayed clear and beautiful (FID stayed around ~34-45). They also used 66% less energy to get the same result.
In a Nutshell
DPAC is like a smart navigation system for AI art attacks. It realizes that to change an image's identity without destroying its beauty, you have to steer it along the path of reality, not off it. It turns a destructive, messy process into a precise, surgical one.