Locality-aware Parallel Decoding for Efficient Autoregressive Image Generation

This paper introduces Locality-aware Parallel Decoding (LPD), a novel framework combining flexible parallelized autoregressive modeling and locality-aware generation ordering to significantly accelerate autoregressive image generation by reducing inference steps and latency while maintaining high-quality results.

Zhuoyang Zhang, Luke J. Huang, Chengyue Wu, Shang Yang, Kelly Peng, Yao Lu, Song Han

Published 2026-03-12

Imagine you are trying to paint a massive, detailed mural on a wall.

The Old Way: The Slow, One-Brush Artist

Traditional autoregressive image generators (the kind this paper speeds up) work like a very strict, slow artist. They have a rule: "You must paint exactly one small square at a time, row by row, from top-left to bottom-right."

  • The Problem: A 256x256 image is cut into 256 small squares (patches), so the artist has to make 256 separate trips to the wall, one per patch, to finish it.
  • The Bottleneck: Every time the artist paints a new square, they have to run back to the paint shelf (the computer's memory) to fetch what they need. Even if the artist is super fast at painting, they spend most of their time running back and forth to the shelf. This is called being "memory-bound." It's like a race car stuck in traffic; the engine is great, but the road is too narrow.

The New Way: The "Locality-Aware" Team (LPD)

The researchers in this paper (from MIT, NVIDIA, and First Intelligence) invented a new method called Locality-Aware Parallel Decoding (LPD). Instead of one slow artist, they turned the process into a smart construction crew.

Here is how they did it, using two main tricks:

Trick 1: The "Magic Clipboard" (Flexible Parallelized Modeling)

In the old method, the artist couldn't paint the next square until the previous one was dry.
In the new method, the AI uses "Position Query Tokens." Think of these as magic sticky notes.

  • Instead of waiting for the wall to be painted in order, the AI places sticky notes on the wall where it wants to paint next.
  • It can say, "Okay, Crew! Paint the top-left corner, the bottom-right corner, and the middle spot all at the same time!"
  • The Secret Sauce: Usually, if you paint two spots at once, they might not "see" each other, leading to a messy result (like a blue sky next to a green tree that doesn't match). The researchers created a special "visibility rule" (an attention mechanism) that ensures all the painters in the crew can see each other's work instantly, so the colors blend perfectly even though they are working simultaneously.
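
To make the "visibility rule" a little more concrete, here is a minimal sketch of what such an attention mask could look like. It is an illustration under assumptions, not the paper's implementation: the function name build_mask, the grouping of already-painted tokens by step, and the exact rule for which painted tokens see each other are all hypothetical choices made for this example. The two ideas it does take from the description above are that the sticky notes (position queries) of one step can all see each other, and that they can see everything already painted.

```python
def build_mask(context_group_sizes, num_queries):
    """Return mask[i][j] == True if token i may look at token j.

    context_group_sizes: how many tokens were painted in each earlier step
                         (hypothetical input format for this sketch).
    num_queries:         how many sticky notes are placed in the current step.
    """
    # Label every token with the step it belongs to.
    step_of = []
    for step, size in enumerate(context_group_sizes):
        step_of += [step] * size
    num_context = len(step_of)
    step_of += [len(context_group_sizes)] * num_queries

    n = len(step_of)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if j < num_context:
                # Assumption: painted tokens are visible to tokens from the
                # same step or any later step (causal across steps).
                mask[i][j] = step_of[j] <= step_of[i]
            else:
                # Sticky notes are visible only to the other sticky notes
                # of the current step, never to already-painted tokens.
                mask[i][j] = i >= num_context
    return mask


# One token painted in step 0, two in step 1, three sticky notes now.
mask = build_mask([1, 2], 3)
for row in mask:
    print("".join("1" if visible else "0" for visible in row))
```

Each printed row is one token, and each 1 marks a token it is allowed to look at. The last three rows belong to the sticky notes.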

Trick 2: The "Smart Neighborhood" Rule (Locality-Aware Ordering)

Just because you can paint many spots at once doesn't mean you should pick random spots.

  • The Mistake: If you tell the crew to paint spots that are far from anything already on the wall, they have no idea what color to use because there is no nearby context to look at. And if two painters work on touching squares at the same moment, their guesses can clash.
  • The Solution: The researchers noticed that in images, things are usually local. A leaf is close to a branch; a cloud is close to the sky.
  • The Strategy: The AI uses a "Smart Neighborhood" schedule.
    1. Stay Close: It picks new spots to paint that are right next to what has already been painted. This gives the painters plenty of context (e.g., "I see a tree branch here, so I'll paint a leaf next to it").
    2. Spread Out: It makes sure the spots being painted at the exact same time are far apart from each other. This prevents the painters from stepping on each other's toes or getting confused.
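
The two rules above can be sketched as a small greedy picker. This is a simplified illustration, not the schedule from the paper: the function name pick_group, the min_gap parameter, and the use of straight-line distance on the patch grid are assumptions made for this example.

```python
import math


def pick_group(grid, generated, k, min_gap=2.0):
    """Greedily choose up to k grid positions to paint in the same step.

    grid:      side length of the square patch grid (e.g. 16 for 256 patches).
    generated: set of (row, col) positions that are already painted.
    min_gap:   hypothetical minimum distance between spots painted together.
    """
    remaining = [(r, c) for r in range(grid) for c in range(grid)
                 if (r, c) not in generated]
    chosen = []
    while remaining and len(chosen) < k:
        # Spread Out: drop spots too close to ones already picked this step.
        spaced = [p for p in remaining
                  if all(math.dist(p, q) >= min_gap for q in chosen)]
        if not spaced:
            break  # nothing left that respects the gap; paint a smaller group
        # Stay Close: take the spot nearest to anything already painted.
        best = min(
            spaced,
            key=lambda p: min((math.dist(p, g) for g in generated), default=0.0),
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen


# A 4x4 wall with only the top-left square painted so far.
print(pick_group(4, {(0, 0)}, 3))
```

The gap check runs first so that painters working at the same time never end up on touching squares. Among the spots that pass, the one with the most nearby context wins.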

The Result: From a Marathon to a Sprint

By combining these two tricks, the results are incredible:

  • Old Way: To paint a 256x256 image, the AI took 256 steps (one for every patch).
  • New Way (LPD): The AI can do the same image in just 20 steps.
  • Speed: Latency is 3.4 times lower than previous "fast" (parallelized) autoregressive methods.
  • Quality: Despite being so much faster, the images look just as good (or better) than the slow ones. The "Smart Neighborhood" rule ensures the image doesn't look glitchy.
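
For a feel of what "256 steps down to 20" means per step, here is the back-of-the-envelope arithmetic as a tiny script. The helper name average_tokens_per_step is made up for this example, and the only inputs are the two step counts quoted above.

```python
def average_tokens_per_step(num_tokens, num_steps):
    """Average number of patches painted in each step."""
    return num_tokens / num_steps


tokens = 256     # patches in a 256x256 image, one per step in the old way
lpd_steps = 20   # steps reported for LPD

print(average_tokens_per_step(tokens, tokens))     # old way: 1.0 patch per step
print(average_tokens_per_step(tokens, lpd_steps))  # LPD: 12.8 patches per step
```

So on average the crew paints about 13 patches per step instead of one. Note that this 12.8x cut in step count is a separate number from the 3.4x speed figure, which compares LPD against other fast methods rather than against the one-patch-at-a-time artist.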

Why This Matters

Think of it like upgrading from a single-lane dirt road to a multi-lane highway with smart traffic lights.

  • Before, the AI was stuck in a single lane, waiting for the car in front to move.
  • Now, the AI has many lanes open at once, and the "traffic lights" (the locality rules) tell the cars exactly where to go so they don't crash.

This means we can generate high-quality images, edit photos, or even create videos in a fraction of the time it used to take, making AI tools much more practical for everyday use.