LGQ: Learning Discretization Geometry for Scalable and Stable Image Tokenization

This paper introduces Learnable Geometric Quantization (LGQ), a scalable and stable discrete image tokenizer that learns its discretization geometry end-to-end through differentiable soft assignments and specialized regularizers, achieving superior reconstruction fidelity and balanced codebook utilization compared to existing methods such as FSQ and SimVQ.

Idil Bilge Altun, Mert Onur Cakiroglu, Elham Buxton, Mehmet Dalkilic, Hasan Kurban

Published 2026-02-23

Imagine you are trying to teach a computer to draw pictures, but instead of giving it a blank canvas and a full box of every color in the universe, you have to give it a limited set of "stamps" or "tokens." The computer has to figure out which stamps to use to recreate the image.

This is the challenge of Image Tokenization. The paper introduces a new method called LGQ (Learnable Geometric Quantization) that solves a major headache in this process: how to use these stamps efficiently without the computer getting confused or lazy.

Here is the breakdown using simple analogies:

The Problem: The "Lazy Librarian" vs. The "Rigid Filing Cabinet"

To understand why LGQ is special, we need to look at the two previous ways computers tried to do this:

  1. The Old Way (Vector Quantization / VQ):

    • The Analogy: Imagine a librarian with a massive shelf of 16,000 unique books (the codebook). When a student asks for a book, the librarian looks at the request and picks the single closest book on the shelf.
    • The Flaw: Over time, the librarian gets lazy. They keep picking the same 50 popular books because they are easy to find. The other 15,950 books gather dust and are never used. This is called "Collapse." The system stops learning new things because it's stuck using the same few tools.
  2. The Rigid Way (FSQ / Scalar Quantization):

    • The Analogy: To fix the laziness, someone built a giant filing cabinet with fixed drawers. Every time a request comes in, the librarian must put a file in a specific drawer, no matter what.
    • The Flaw: This ensures every drawer gets used (no laziness), but the drawers are fixed in a rigid grid. If the "files" (the image data) are shaped like a circle, but the drawers are square, the librarian wastes a lot of space trying to fit round files into square boxes. It's efficient in usage but inefficient in shape.
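The two failure modes above can be sketched in a few lines of plain Python. This is an illustrative toy, not either method's actual implementation; the codebook, the example vectors, and the level count are made-up assumptions:

```python
import math

def vq_assign(z, codebook):
    """Classic VQ: pick the index of the single nearest code vector.
    Nothing updates the codes a query was merely *close* to, which is
    how the 'lazy librarian' collapse starts."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(range(len(codebook)), key=lambda i: dist(z, codebook[i]))

def fsq_assign(z, levels=5):
    """FSQ-style scalar quantization: round each dimension to one of
    `levels` fixed, evenly spaced values in [-1, 1]. Every drawer gets
    used, but the grid itself never moves to fit the data."""
    step = 2 / (levels - 1)
    return tuple(round((x + 1) / step) * step - 1 for x in z)

codebook = [(0.0, 0.0), (1.0, 1.0), (-1.0, 0.5)]
print(vq_assign((0.9, 0.8), codebook))  # nearest code wins: index 1
print(fsq_assign((0.31, -0.77)))        # each coordinate snapped to the fixed grid
```

Note the asymmetry: VQ's codebook is learnable but only one entry at a time gets feedback, while FSQ's grid touches everything but is frozen by construction. LGQ's soft assignments (next section) are an attempt to get the best of both.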

The Solution: LGQ (The "Smart, Adaptable Map")

LGQ is like a librarian who doesn't just pick one book or force a file into a fixed drawer. Instead, they learn to draw a custom map of the library as they go.

Here is how it works:

  • Soft Assignments (The "Warm" Selection):
    Instead of immediately grabbing the one closest book, the librarian initially says, "This request is 60% like Book A, 30% like Book B, and 10% like Book C."

    • Why this helps: This allows the computer to update all those books (A, B, and C) at the same time. It prevents the "lazy librarian" problem because every book gets a little bit of attention during training.
  • Learning the Geometry (The "Shape-Shifting"):
    As the librarian practices, they realize, "Hey, these requests actually look like a circle, not a square!" So, they slowly move the books around on the shelf to match the shape of the requests. They are learning the geometry of the data.

    • The Result: The library layout adapts perfectly to the books people actually want, rather than forcing them into a pre-made grid.
  • The "Straight-Through" Trick:
    During the learning phase, the librarian is flexible (soft). But when it's time to actually send the final order (inference), they snap to a decision and pick the one best book. The magic is that the computer learned how to make that decision by practicing with the flexible, soft method first.

  • The "Popularity" Check (Regularizers):
    The system has two rules to keep things fair:

    1. Be Confident: Don't be too indecisive (don't say 1% for every book). Pick a clear winner.
    2. Be Balanced: Don't let just 50 books get all the work. Spread usage across the whole shelf, while still letting books that genuinely aren't needed fade out.
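Putting the pieces together, here is a minimal pure-Python sketch of the ideas above: soft assignment, the snap-to-one-winner step, and the two fairness rules. The temperature, the squared-distance logits, and the exact loss formulas are illustrative assumptions, not the paper's actual equations:

```python
import math

def soft_assign(z, codebook, temperature=1.0):
    """The 'warm' selection: weight every code by similarity, via a
    softmax over negative squared distances. All codes get a share of
    the gradient, so none of them gather dust."""
    d2 = [sum((x - c) ** 2 for x, c in zip(z, code)) for code in codebook]
    logits = [-d / temperature for d in d2]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def confidence_loss(probs):
    """'Be Confident': the entropy of one assignment. High when the
    model is indecisive, zero when it picks a clear winner."""
    return -sum(p * math.log(p + 1e-12) for p in probs)

def balance_loss(batch_probs):
    """'Be Balanced': negative entropy of the *average* usage across a
    batch, shifted so a perfectly even shelf scores 0 and a shelf where
    a few codes hog all the work scores higher."""
    k = len(batch_probs[0])
    avg = [sum(col) / len(batch_probs) for col in zip(*batch_probs)]
    return sum(p * math.log(p + 1e-12) for p in avg) + math.log(k)

codebook = [(0.0, 0.0), (1.0, 1.0), (-1.0, 0.5)]
probs = soft_assign((0.9, 0.8), codebook)
hard_pick = probs.index(max(probs))  # the straight-through snap at inference
```

During training the soft `probs` carry gradients back to every code (letting the geometry shift toward the data), while `hard_pick` is what gets emitted as the final token, which is the straight-through trick in miniature.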

Why Does This Matter?

The paper shows that LGQ is a "Goldilocks" solution:

  • It's not lazy (it uses the codebook efficiently).
  • It's not rigid (it adapts to the shape of the data).
  • The Result: It creates better pictures (lower error rates) while using fewer active stamps than the other methods.

The Big Takeaway:
Previous methods either wasted space (by using too many stamps that didn't fit well) or got stuck using too few stamps (collapsing). LGQ learns the perfect "shape" of the stamp collection for the specific data it's working on. It's like having a set of Lego bricks that can magically reshape themselves to fit the building you are trying to construct, rather than forcing a square peg into a round hole.

In short: LGQ teaches the computer to organize its own vocabulary in the most efficient way possible, leading to sharper images and smarter AI.
