General Proximal Flow Networks

This paper introduces General Proximal Flow Networks (GPFNs), a unified framework for iterative generative modeling that generalizes Bayesian Flow Networks by replacing fixed KL-divergence updates with arbitrary distance functions, thereby enabling improved generation quality through adaptation to the underlying data geometry.

Alexander Strunk, Roland Assam

Published 2026-03-03

Imagine you are trying to teach a robot to draw a perfect picture of a cat.

In the old way of doing this (using Diffusion Models), you might start with a canvas full of static noise and slowly "dial down" the noise, step-by-step, until a cat appears. It's like peeling an onion layer by layer.

In a newer, smarter way called Bayesian Flow Networks (BFNs), the robot doesn't just peel layers; it maintains a "belief" about what the cat looks like. At every step, the robot asks: "Based on what I think the cat looks like right now, and a little bit of new information, how should I update my belief?" It's like a detective updating their theory of the crime scene as new clues arrive.

However, the original "detective" (the BFN) had a very rigid rulebook for how to update that belief. It used a specific mathematical rule called KL Divergence. Think of this rulebook as a straight-line ruler. It's great for simple, flat things, but if you are trying to draw a complex, curved shape (like a real cat or a human face), a straight ruler isn't the best tool. It forces the robot to take awkward, inefficient steps to get from "noise" to "cat."

Enter: General Proximal Flow Networks (GPFNs)

This paper introduces GPFNs, which is like giving the detective a Swiss Army Knife instead of just a ruler.

Here is the simple breakdown of what they did:

1. The "Distance" Problem

In math, when you want to move from "Point A" (noise) to "Point B" (the cat), you need a way to measure the distance between them.

  • The Old Way (BFN): Used the KL Divergence. Imagine this as measuring distance by walking only North, South, East, or West (like a taxi in a grid city). It works, but it's inefficient if the cat is actually located diagonally across the park. In technical terms, KL compares the probability assigned to each outcome one at a time and ignores how far apart those outcomes are.
  • The New Way (GPFN): Allows you to choose any way to measure distance. The authors chose the Wasserstein Distance. Imagine this as a helicopter or a straight path through the park. It measures the actual "effort" needed to move probability mass (here, the pixels) from noise to the cat, respecting the natural curves and shapes of the image.
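
To make the difference concrete, here is a minimal sketch (not code from the paper) that compares the two measures on toy one-dimensional histograms. The helper names `kl_divergence`, `wasserstein_1d`, and `bump` are made up for this illustration, and the tiny `eps` smoothing is an assumption needed to keep KL finite when the histograms do not overlap.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for histograms, with eps smoothing (an assumption) to avoid log(0)."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

def wasserstein_1d(p, q):
    """Wasserstein-1 distance between histograms on unit-spaced bins.

    In one dimension it equals the summed absolute difference of the two CDFs.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(np.abs(np.cumsum(p) - np.cumsum(q))))

def bump(center, n_bins=20):
    """Hypothetical helper: all probability mass in a single bin."""
    h = np.zeros(n_bins)
    h[center] = 1.0
    return h

source = bump(2)
near = bump(3)    # shifted by 1 bin
far = bump(12)    # shifted by 10 bins

print("KL  near/far:", kl_divergence(source, near), kl_divergence(source, far))
print("W1  near/far:", wasserstein_1d(source, near), wasserstein_1d(source, far))
```

In this toy setup KL reports the same (large) value for the near and the far histogram, because it only sees that the bins do not overlap. The Wasserstein distance grows with the size of the shift, which is the "how far do I have to move things" notion the helicopter analogy is pointing at.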

2. The "Proximal" Step

The word "Proximal" just means "close by." The core idea is: "Don't jump too far away from where you are right now, but move closer to the target."

  • BFN says: "Move closer to the target, but you must follow the strict, straight-line rules of the KL ruler."
  • GPFN says: "Move closer to the target, but use the Wasserstein helicopter to take the most natural, efficient path."
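
For readers who want to see what a "proximal step" looks like in practice, here is a minimal sketch under stated assumptions. It uses a standard textbook fact rather than anything taken from the paper's code: an ordinary Bayesian update on a discrete belief is exactly a proximal step whose "stay close" term is the KL divergence. The names `kl_proximal_step`, `proximal_objective`, and the step size `tau` are illustrative, and the prior and likelihood values are made-up toy numbers. Swapping the KL term for a Wasserstein term, as GPFNs are described as doing, is not shown here.

```python
import numpy as np

def kl(q, p):
    """KL(q || p) for strictly positive discrete distributions."""
    return float(np.sum(q * np.log(q / p)))

def proximal_objective(q, prior, neg_log_lik, tau=1.0):
    """'Fit the new clue' term plus a KL penalty for moving away from the prior."""
    return float(np.dot(q, neg_log_lik)) + kl(q, prior) / tau

def kl_proximal_step(prior, neg_log_lik, tau=1.0):
    """Closed-form minimizer of proximal_objective over the probability simplex."""
    unnormalized = prior * np.exp(-tau * neg_log_lik)
    return unnormalized / unnormalized.sum()

# Toy belief over three hypotheses and the likelihood of one new clue.
prior = np.array([0.5, 0.3, 0.2])
likelihood = np.array([0.1, 0.6, 0.3])
neg_log_lik = -np.log(likelihood)

# Bayes' rule and the KL-proximal step (tau = 1) give the same belief.
posterior_bayes = prior * likelihood / np.sum(prior * likelihood)
posterior_prox = kl_proximal_step(prior, neg_log_lik, tau=1.0)

# Sanity check: no randomly sampled belief scores lower than the proximal solution.
rng = np.random.default_rng(0)
candidates = rng.dirichlet(2.0 * np.ones(3), size=1000)
best_random = min(proximal_objective(q, prior, neg_log_lik) for q in candidates)

print("Bayes posterior:   ", posterior_bayes)
print("KL-proximal step:  ", posterior_prox)
print("objective at step: ", proximal_objective(posterior_prox, prior, neg_log_lik))
print("best random guess: ", best_random)
```

The point of the sketch is the shape of the update: one term pulls the belief toward the new evidence, the other keeps it close to the current belief, and the choice of "closeness" measure determines what the resulting step looks like.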

3. The Result: Faster and Better

Because GPFNs take the "helicopter" (Wasserstein) instead of walking the grid (KL), they can get from noise to a clean image in far fewer steps.

  • The Experiment: The researchers tested this on drawing digits (0–9).
    • The old method (BFN) needed 100 steps to draw a decent digit, and even then the result sometimes looked blurry or distorted.
    • The new method (GPFN) could draw a crisp digit in just 5 to 20 steps.
    • It was like comparing a snail crawling across a maze to a bird flying straight to the finish line.

Why Does This Matter?

Think of it like navigation:

  • BFN is like using an old paper map that only shows roads. To get to a destination, you have to follow every twist and turn of the road, even if a shortcut exists through a field.
  • GPFN is like using a GPS with a drone view. It sees the terrain and calculates the most direct, natural path to the destination.

The Big Takeaway:
This paper argues that by changing the "rules of the road" (the math used to update beliefs), we can make AI generators faster and higher in quality, and it backs this up with the digit experiment. The authors also suggest that this "helicopter view" (Wasserstein distance) is well suited to images, videos, and any data where shape and space matter.

In short: GPFNs let AI take the direct, efficient route instead of the long, winding road.
