ProGIC: Progressive and Lightweight Generative Image Compression with Residual Vector Quantization

The paper proposes ProGIC, a progressive and lightweight generative image compression codec based on residual vector quantization and a compact backbone, which achieves significant bitrate savings, faster encoding/decoding speeds, and flexible progressive transmission compared to existing methods.

Hao Cao, Chengbin Liang, Wenqi Guo, Zhijin Qin, Jungong Han

Published 2026-03-04

Imagine you are trying to send a high-definition photo of a sunset to a friend, but you are stuck in a place with terrible internet—like a satellite phone in the middle of a forest or a remote mountain. You need to send the image, but the connection is so slow that sending the whole file would take forever.

The Problem with Current Methods:
Most modern image compression tools are like heavy, expensive delivery trucks. They are great at packing a lot of stuff efficiently, but they are too big and slow to drive on narrow, bumpy roads (low-bandwidth networks).

  • Traditional methods (like JPEG) try to shrink the file by throwing away details, resulting in blurry, blocky images.
  • New "Generative" methods (AI that "imagines" the missing details) produce beautiful, sharp images, but they require massive, power-hungry models to run. It's like taking a Ferrari down a dirt path: impressive machinery, but far too complex and ill-suited for the job.

The Solution: ProGIC
The authors of this paper propose ProGIC (Progressive Generative Image Compression). Think of ProGIC not as a delivery truck, but as a smart, modular LEGO set that can be built piece by piece.

Here is how it works, using three simple analogies:

1. The "Sketch-to-Painting" Analogy (Progressive Decoding)

Imagine an artist drawing a portrait.

  • Old Way: You have to wait until the artist finishes the entire painting before you can see anything. If the internet cuts out halfway through, you get nothing.
  • ProGIC Way: The artist starts with a rough sketch (the base layer). You can see the face immediately! Then, they add shading (the second layer). Now you can see the lighting. Finally, they add fine details like eyelashes and skin texture (the final layers).
  • Why it matters: With ProGIC, as soon as the first few bytes of data arrive, your phone shows a usable, low-quality preview. As more data trickles in, the image gets sharper and clearer. You don't have to wait for the whole file to see what you're looking at.

2. The "Residual Vector Quantization" (RVQ) Analogy

How does the AI know what to draw at each step without sending the whole picture? It uses a technique called Residual Vector Quantization (RVQ).

  • Think of it like a dictionary of shapes.
  • Step 1: The AI looks at the image and says, "Okay, the general shape is a circle." It sends the code for "Circle."
  • Step 2: The AI looks at what's missing (the "residual"). It says, "The circle is a bit off-center and has a bump." It sends the code for "Bump."
  • Step 3: It looks at the tiny details. "There's a speck of dust." It sends the code for "Dust."
  • Instead of sending the whole image, it sends a sequence of small codes that add up to the final picture. This allows the receiver to stop at any point and still have a recognizable image.
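The circle-bump-dust steps above can be sketched in a few lines. This is a deliberately simplified, one-dimensional toy (real codecs like ProGIC quantize learned feature vectors with learned codebooks; the coarse-to-fine number grids here are purely illustrative):

```python
import numpy as np

# Toy coarse-to-fine "dictionaries": each stage's codebook covers a finer scale.
# (Hypothetical 1-D codebooks for illustration; real RVQ uses learned vector codebooks.)
codebooks = [
    np.arange(0.0, 8.0, 1.0),      # stage 1: whole units   ("the circle")
    np.arange(-0.5, 0.6, 0.1),     # stage 2: tenths        ("the bump")
    np.arange(-0.05, 0.06, 0.01),  # stage 3: hundredths    ("the dust")
]

def rvq_encode(x, codebooks):
    """At each stage, send the nearest codeword to whatever is still missing."""
    codes, residual = [], x
    for cb in codebooks:
        idx = int(np.argmin(np.abs(cb - residual)))  # nearest dictionary entry
        codes.append(idx)
        residual -= cb[idx]                          # the leftover "residual"
    return codes

def rvq_decode(codes, codebooks):
    """Sum the received codewords; stopping early still yields a rough version."""
    return sum(cb[i] for i, cb in zip(codes, codebooks))

x = 3.67
codes = rvq_encode(x, codebooks)

# The receiver can stop after any stage and still have an answer:
for k in range(1, 4):
    print(k, round(rvq_decode(codes[:k], codebooks), 2))  # 4.0, then 3.7, then 3.67
```

Each extra code refines the previous guess rather than replacing it, which is exactly why a partially received bitstream still decodes to a usable image.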

3. The "Lightweight Backpack" Analogy (Efficiency)

Most AI image tools are like heavy hiking backpacks filled with bricks (massive computer models). They need powerful computers (GPUs) to carry them.

  • ProGIC's Innovation: The authors built a lightweight backpack using "depthwise-separable convolutions." Imagine replacing those heavy bricks with feather-light foam.
  • The Result: This backpack is so light that an ordinary hiker (a standard mobile phone or a laptop CPU) can carry it without breaking a sweat. It runs 10 times faster than the heavy competitors, making it possible to compress and decompress images instantly on your phone, even without a powerful graphics card.
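A quick back-of-the-envelope calculation shows why depthwise-separable convolutions are so light. A standard convolution mixes spatial positions and channels in one big kernel; the separable version splits that into a cheap per-channel spatial filter plus a 1x1 channel mixer. (The layer sizes below are illustrative, not taken from the paper's actual architecture.)

```python
# Parameter count: standard conv vs. depthwise-separable conv.
# Illustrative sizes: 128 input channels, 128 output channels, 3x3 kernels.
c_in, c_out, k = 128, 128, 3

standard = c_in * c_out * k * k   # one full 3x3xC_in kernel per output channel
depthwise = c_in * k * k          # one small 3x3 spatial filter per input channel
pointwise = c_in * c_out          # 1x1 conv that mixes channels
separable = depthwise + pointwise

print(standard)                          # 147456 weights
print(separable)                         # 17536 weights
print(round(standard / separable, 1))    # ~8.4x fewer parameters
```

The same trick cuts multiply-accumulate operations by a similar factor, which is what makes CPU-only decoding plausible.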

The Real-World Impact

The paper demonstrates this in a satellite communication scenario (like a forest fire response team):

  • Scenario: A ranger sees a fire and needs to send a photo to headquarters. The satellite link is slow and sends data in tiny chunks every 60 seconds.
  • Without ProGIC: The ranger waits 5 minutes for the full image to download. By then, the fire might have spread.
  • With ProGIC:
    • Second 0-60: A blurry, low-res image appears. "I see smoke!"
    • Second 60-120: The image gets clearer. "I see the fire is near the river."
    • Second 120+: The image is sharp. "I see the exact location of the flames."
    • Result: The team can react immediately, even while the image is still "loading."

Summary

ProGIC is a new way to send images that is:

  1. Progressive: You see a rough draft immediately, and it gets better as data arrives (no more "loading..." spinners).
  2. Lightweight: It runs fast on regular phones and laptops, not just supercomputers.
  3. High Quality: It uses AI to "hallucinate" (guess) missing details intelligently, so the image looks great even when the file size is tiny.

It's the difference between waiting for a slow, heavy truck to deliver a package, versus receiving a live video feed that gets clearer the longer you watch.