Synchronizing Probabilities in Model-Driven Lossless Compression

Here is an explanation of the paper "Synchronizing Probabilities in Model-Driven Lossless Compression" using simple language and creative analogies.

The Big Picture: The "Perfect Match" Problem

Imagine you and a friend are playing a game of telephone, but instead of whispering words, you are trying to compress a massive book into a tiny digital file and then send it to your friend to reconstruct perfectly.

To do this efficiently, you both use a super-smart AI (like a large language model) to guess the next word in the book.

You (the Encoder): Look at the sentence so far, ask the AI, "What's the next word likely to be?" The AI says, "There's a 90% chance it's 'cat' and a 10% chance it's 'dog'." You use this guess to shrink the file size.
Your Friend (the Decoder): Receives the tiny file. They also have the exact same AI. They look at the sentence so far and ask, "What's the next word?"

The Catch: For this to work, your AI's guess must be exactly the same as your friend's AI's guess. If you think there is a 90% chance of "cat" and your friend thinks there is a 90.000001% chance, the math breaks. The file gets corrupted, and the rest of the book turns into gibberish.

The Villain: "Digital Static" (Non-Determinism)

In the real world, computers aren't perfect. Even if you and your friend have the exact same AI model, the hardware might be slightly different (one uses an Apple M2 chip, the other an M4). Or maybe the math was done in a slightly different order.

This causes Non-Determinism. It's like two people measuring the same table with identical rulers, but one ruler is slightly warm and has expanded a tiny bit. The measurements are almost the same, but not exactly the same.

In standard compression, this tiny difference is fatal. It's like a game of telephone where one person whispers "cat" and the other hears "bat." The whole message collapses.
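The "different order" effect is easy to reproduce on a single machine. The snippet below is only an illustration in plain Python floating-point arithmetic, not something taken from the paper: adding the same three numbers in two different orders gives two results that differ in the last digit.

```python
# Floating-point addition is not associative: grouping changes the rounding.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

print(left_to_right == right_to_left)  # prints False
print(left_to_right, right_to_left)    # the two sums differ in the last digit
```

A neural network performs billions of such additions, and different chips or libraries may group them differently, which is one way identical models can end up with slightly different probabilities.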

The Hero: PMATIC (The "Safe Zone" Strategy)

The authors introduce a new method called PMATIC (Probability Matching Interval Coding). Instead of trying to force the two computers to agree on a perfectly precise number (like 0.9000001), PMATIC says: "Let's just agree on a 'Safe Zone'."

Here is how it works, using a Target Practice analogy:

The Target: Imagine the probability scale (0% to 100%) is a long target board.
The Bins: Instead of aiming for a single pixel, we divide the board into large "bins" or zones.
- Zone A: 0% to 25%
- Zone B: 25% to 50%
- Zone C: 50% to 75%
- Zone D: 75% to 100%
The Strategy:
- You calculate the probability. Let's say you get 49.9%. You are in Zone B.
- Your Friend calculates the probability. Because of "digital static," they get 50.1%. They are in Zone C.
- The Problem: You are in different zones! The file breaks.
- The PMATIC Fix: You send a tiny "helper note" (a helper bit).
  - If you are deep inside a zone (like 40%), the note just says, "I'm safely inside my zone." Your friend, even if they are at 41%, sees that they are in Zone B too. You both agree to use the center of Zone B for the math.
  - If you are right on the edge (like 49.9% vs 50.1%), you send a special signal: "Hey, we are near the border! Let's both pretend we are at the exact border line (50%)."

Why This is Genius

It Tolerates Mistakes: Because you are agreeing on a "Safe Zone" or a "Border Line" rather than a microscopic decimal, tiny hardware differences don't matter. You and your friend can be on different computers, and you will still agree on the math.
It's Cheap: Most of the time, the probability lands well inside a zone, away from the borders, so the "helper note" almost always says "all clear." A note that predictable compresses down to a tiny fraction of a bit, so the extra data stays small.
It Still Compresses Well: Even with these "safe zones," the file size is still much smaller than standard tools like ZIP or GZIP.

The Results: The Race

The authors tested this with super-smart AI models (like Llama 3.1) on text data (Wikipedia, books, etc.).

The Old Way (Standard Arithmetic Coding): If you run the encoder on a Mac and the decoder on a different Mac, the file corrupts. It fails completely.
The New Way (PMATIC): The file decodes perfectly, even with different hardware.
The Score: PMATIC compressed the text much better than standard tools (like gzip or zstd), even after adding the tiny "safety cost" to handle the hardware differences.

Summary Analogy

Imagine you are trying to meet a friend in a giant city (the data).

Standard Compression: You tell your friend, "Meet me at the exact corner of 5th and Main, at 12:00:00.0000001 PM." If your watch is off by a microsecond, you miss each other, and the plan fails.
PMATIC: You tell your friend, "Meet me in the Coffee Shop on 5th and Main, between 12:00 and 12:05." Even if your watches are slightly off, or your GPS is slightly inaccurate, you will both end up in the same Coffee Shop. You might have to send a tiny text saying, "I'm at the counter," but the plan succeeds, and you still get the job done efficiently.

In short: PMATIC is a clever trick that lets us use powerful, slightly "wobbly" AI models to compress data perfectly, without needing super-expensive, perfectly synchronized computers.

Here is a detailed technical summary of the paper "Synchronizing Probabilities in Model-Driven Lossless Compression" by Aviv Adler and Jennifer Tang, published at ICLR 2026.

1. Problem Statement

The paper addresses a critical bottleneck in model-driven lossless compression, particularly when utilizing Large Language Models (LLMs) or deep neural networks.

Context: Modern compression algorithms often use predictive models (e.g., Transformers) to estimate the probability of the next symbol (token) given the context. These probabilities are then fed into an arithmetic coder to generate the compressed bitstream.
The Challenge: For lossless compression to work, the encoder and decoder must have exactly matching probability predictions. However, modern machine learning inference is often non-deterministic. Due to hardware differences (e.g., different GPU architectures), differences in compute libraries (e.g., CUDA/cuDNN kernels), floating-point rounding errors, or software versioning, the same model running on different machines can produce slightly different logits (pre-softmax values) for the same input.
Consequence: Even minute differences in predicted probabilities can cause the arithmetic decoder to select the wrong token. This error cascades, corrupting the context for all subsequent tokens and leading to total decoding failure.
Goal: Develop a compression algorithm that is robust to bounded prediction mismatch, allowing the encoder and decoder to agree on a shared probability distribution despite small numerical deviations, without incurring prohibitive compression overhead.
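The failure described under "Consequence" can be reproduced with a single decoding step. The sketch below is a simplified, hypothetical binary arithmetic decoder, not the coder used in the paper: `decode_bit`, `code_value`, and the chosen probabilities are all illustrative assumptions. The interval is split according to the predicted probability of a 1, and the decoded bit depends on which side of the split the code value falls.

```python
def decode_bit(code_value: float, low: float, high: float, p_one: float) -> int:
    """Decode one bit from the interval [low, high).

    The interval is split at low + (1 - p_one) * (high - low):
    code values below the split decode to 0, the rest decode to 1.
    """
    split = low + (1.0 - p_one) * (high - low)
    return 0 if code_value < split else 1


# Hypothetical code value that happens to land very close to the split point.
code_value = 0.1000000005

p_encoder = 0.9          # probability of a 1 as computed by the encoder
p_decoder = 0.899999999  # same model on other hardware, off by about 1e-9

bit_encoder_view = decode_bit(code_value, 0.0, 1.0, p_encoder)
bit_decoder_view = decode_bit(code_value, 0.0, 1.0, p_decoder)
print(bit_encoder_view, bit_decoder_view)  # the two views disagree
```

Once a single bit is decoded differently, the decoder's context no longer matches the encoder's, so every later prediction is computed from the wrong history.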

2. Methodology: Probability Matching Interval Coding (PMATIC)

The authors propose PMATIC, a model-agnostic algorithm that acts as a drop-in replacement for standard arithmetic coding. It tolerates bounded mismatch by quantizing probabilities and using "helper bits" to synchronize the encoder and decoder.

Core Mechanism

Token Decomposition: Instead of encoding tokens directly, PMATIC converts each token into a binary string ("longform") based on a fixed dictionary. It then encodes these bits sequentially.
Probability Quantization (Binning):
- The probability interval $[0, 1]$ is divided into equal-width bins of radius $r$ (width $2r$).
- The algorithm assumes the difference between the encoder's probability ($p$) and the decoder's probability ($q$) is bounded by a conditional total variation distance $\delta$ (i.e., $|p - q| \leq \delta$).
- The bin radius is chosen such that $r > 2\delta$.
Helper Bits: Before encoding the actual token bit, PMATIC encodes a helper bit to inform the decoder which quantization rule to apply:
- Case 1 (Safe Zone): If the encoder's probability falls deep inside a bin (specifically, in the $\delta$-interior), the encoder sends a helper bit 0. Both encoder and decoder agree to use the center of that bin as the probability for arithmetic coding. Since the mismatch is small, the decoder's probability is guaranteed to be in the same bin.
- Case 2 (Boundary Zone): If the encoder's probability is near a bin boundary (within $\delta$), the encoder sends a helper bit 1. Both parties agree to use the boundary point itself as the probability. This ensures agreement even if the decoder's probability has drifted slightly into the adjacent bin.
Arithmetic Coding: The actual token bit is encoded using the agreed-upon quantized probability (either the bin center or the boundary). The helper bits themselves are also compressed using arithmetic coding with a fixed, low-entropy probability ($\delta/r$).
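The two helper-bit cases can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `encoder_quantize` and `decoder_quantize` are hypothetical, bins are assumed to be $[2kr, 2(k+1)r)$ with centers $(2k+1)r$, and the outermost boundaries at 0 and 1 (where an agreed probability of exactly 0 or 1 would need special handling before arithmetic coding) are not treated.

```python
import math


def encoder_quantize(p: float, r: float, delta: float) -> tuple[int, float]:
    """Return (helper_bit, agreed_probability) for the encoder's probability p."""
    width = 2.0 * r
    nearest_boundary = round(p / width) * width
    if abs(p - nearest_boundary) <= delta:
        # Case 2: p is within delta of a boundary, so snap to the boundary.
        return 1, nearest_boundary
    # Case 1: p is in the delta-interior of its bin, so use the bin center.
    return 0, (2 * math.floor(p / width) + 1) * r


def decoder_quantize(q: float, helper_bit: int, r: float) -> float:
    """Recover the agreed probability from the decoder's own q and the helper bit."""
    width = 2.0 * r
    if helper_bit == 1:
        # q is within 2 * delta < r of the encoder's boundary, so rounding finds it.
        return round(q / width) * width
    # q lies in the same bin as p, so both sides compute the same center.
    return (2 * math.floor(q / width) + 1) * r


r, delta = 0.125, 0.01  # four bins of width 0.25; r > 2 * delta holds
for p, q in [(0.40, 0.41), (0.499, 0.501)]:
    helper, p_hat = encoder_quantize(p, r, delta)
    print(p, q, helper, p_hat, decoder_quantize(q, helper, r))
```

In this example the pair (0.40, 0.41) agrees on the bin center 0.375 with helper bit 0, and the pair (0.499, 0.501) agrees on the boundary 0.5 with helper bit 1. Note that the decoder never needs $\delta$ itself, only the helper bit and the bin radius.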

Theoretical Guarantees

Correctness: The paper proves that if the logit difference between encoder and decoder is bounded by $\epsilon$, and the algorithm uses $\delta = \epsilon/2$, PMATIC guarantees perfect decoding (Theorem 1).
Cost Analysis: The theoretical overhead consists of two parts:
1. Helper Bit Entropy: The cost of transmitting the helper bits, which is low because the probability of hitting a boundary ($\approx \delta/r$) is small.
2. Quantization Loss: The information loss from using a quantized probability instead of the exact prediction.
- The authors derive an optimal bin radius $r \approx \sqrt{\delta \log(1/\delta)}$ that balances these two costs, resulting in a total overhead of $O(\sqrt{\delta} \log(1/\delta))$.

3. Key Contributions

Formalization of Mismatch: The paper formally defines the problem of prediction mismatch in model-driven compression and introduces the concept of Conditional Total Variation Distance (CTV) to bound the error.
PMATIC Algorithm: Introduces a novel, model-agnostic coding scheme that guarantees correct decoding under bounded non-determinism without requiring changes to the underlying neural network model.
Theoretical Bounds: Provides rigorous proofs for correctness and derives theoretical bounds on the compression efficiency loss incurred by robustness.
Empirical Validation: Demonstrates that PMATIC works effectively in practice with state-of-the-art LLMs (Llama 3.1, Mistral, Qwen) on diverse text datasets.

4. Experimental Results

The authors tested PMATIC on text datasets (Enwik8, Wikipedia, Shakespeare, Austen, Voltaire, and Chinese literature) using quantized LLMs (4-bit and 3-bit).

Compression Performance:
- PMATIC significantly outperforms traditional compressors (gzip, bzip2, zstd, CMIX).
- Even with robustness settings ($\delta$) high enough to handle real-world hardware discrepancies, PMATIC maintains compression ratios far superior to standard tools.
- Overhead: The "robustness overhead" (the difference between PMATIC and non-robust arithmetic coding) is relatively small. For example, with $\delta=0.01$, the compression ratio degrades slightly but remains competitive.
Robustness to Real Non-Determinism:
- Synthetic Noise: PMATIC successfully decoded all files when synthetic noise was added to logits within the theoretical bounds.
- Real Hardware: In a test encoding on an Apple M2 Pro and decoding on an Apple M4 Max, standard arithmetic coding failed completely. PMATIC with $\delta=0.01$ successfully decoded all files, confirming it can handle real-world GPU non-determinism.
Helper Bit Efficiency: Experiments showed that the actual frequency of helper bits being set to 1 (indicating a boundary hit) was much lower than the theoretical worst-case assumption (uniform distribution). This suggests PMATIC could be further optimized by better estimating helper bit probabilities.

5. Significance and Future Work

Enabling Distributed LLM Compression: PMATIC solves the "reproducibility" problem in model-driven compression, making it feasible to compress data on one machine and decompress it on another with different hardware, a prerequisite for practical deployment of LLM-based compression tools.
Model Agnostic: The approach does not require retraining models or modifying their architecture; it only changes the encoding/decoding interface.
Future Directions:
- Extending the approach to other domains like images and video.
- Adapting PMATIC for stochastic bounds rather than strict worst-case bounds, as non-determinism often follows a distribution rather than a hard limit.
- Improving helper bit estimation to reduce overhead further.
- Investigating the fundamental information-theoretic limits of compression under mismatch.

In summary, this paper bridges the gap between the theoretical promise of deep learning for compression and the practical realities of non-deterministic hardware, providing a robust, efficient, and theoretically grounded solution.