From Fewer Samples to Fewer Bits: Reframing Dataset Distillation as Joint Optimization of Precision and Compactness

This paper introduces QuADD, a unified framework that jointly optimizes dataset compactness and precision through differentiable quantization, demonstrating that balancing sample count and bit allocation significantly enhances information efficiency in dataset distillation.

My H. Dinh, Aditya Sant, Akshay Malhotra, Keya Patani, Shahab Hamidi-Rad

Published 2026-03-04

The Big Problem: The "Too Much Data" Traffic Jam

Imagine you are a teacher trying to teach a student (an AI model) how to recognize animals. You have a massive library of 50,000 photos of cats, dogs, and birds.

The Old Way (Dataset Distillation):
Previously, researchers tried to solve the problem of "too much data" by picking a tiny, perfect handful of photos (say, 10 photos per animal) that represented the whole library. They called this Dataset Distillation.

  • Analogy: It's like trying to summarize a 1,000-page novel by picking just 10 sentences. If you pick the right sentences, the student learns the story perfectly. If you pick the wrong ones, they get confused.

The Flaw:
The old method only cared about how many photos you kept. It assumed every photo was a high-definition, 32-bit masterpiece. But in the real world (like on a smartphone or a sensor in a forest), sending high-definition photos takes a lot of bandwidth and storage. It's like trying to send a 4K movie over a dial-up internet connection.

The New Idea: "From Fewer Samples to Fewer Bits"

The authors of QuADD say: "Stop worrying just about the number of photos. Let's worry about the total size of the data."

They propose a new way to think about efficiency: The Bit Budget.
Imagine you have a strict limit on how much "digital space" you can use to send your lesson.

  • Old Strategy: Send 10 high-definition photos (Huge size).
  • New Strategy: Send 50 low-resolution, sketch-like photos (Same total size, but more variety).

The paper argues that more variety at lower quality is often better than less variety at high quality.
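This trade-off is simple arithmetic: total storage is samples times values per sample times bits per value. A quick sketch (the image size and bit-widths below are illustrative, not figures from the paper):

```python
# Bit-budget arithmetic: total storage = samples x values-per-sample x bits-per-value.
# The 32x32 RGB image size and the bit-widths here are illustrative.

def total_bits(num_samples, values_per_sample, bits_per_value):
    """Storage cost of a distilled dataset, in bits."""
    return num_samples * values_per_sample * bits_per_value

PIXELS = 32 * 32 * 3  # one CIFAR-10-sized RGB image

old = total_bits(10, PIXELS, 32)  # 10 full-precision images
new = total_bits(80, PIXELS, 4)   # 80 heavily quantized images

print(old, new)  # identical budgets (983040 bits each), but 8x the variety
```

Dropping from 32-bit to 4-bit values buys eight times as many samples for the same bit budget, which is exactly the "more variety at lower quality" trade the paper advocates.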

How It Works: The "Smart Sketch" Factory

To make this work, they built a system called QuADD (Quantization-aware Dataset Distillation). Here is how it works, step-by-step:

1. The "Smart Sketch" (Differentiable Quantization)

Usually, if you take a high-quality photo and shrink it to a sketch (lower precision), you lose details, and the AI gets confused.

  • The Innovation: QuADD doesn't just shrink the photo at the end. It teaches the AI to draw the sketch while it's learning.
  • Analogy: Imagine a chef training a student who will only ever own a dull knife. Instead of teaching every recipe with perfect, expensive tools and then handing over the dull knife at the end (which ruins the meal), the chef teaches with the dull knife from the very first lesson. The student learns exactly which ingredients and techniques work best with that specific tool.
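A standard trick for training through a rounding step is the straight-through estimator (STE). Whether QuADD uses exactly this estimator is not stated in this summary, so treat the sketch below as one common way to make quantization differentiable; this NumPy version shows only the forward pass:

```python
import numpy as np

def quantize_ste(x, num_bits=4):
    """Uniform quantization to 2**num_bits levels in [0, 1].

    Forward pass only. In an autodiff framework, the straight-through
    estimator writes the result as
        q = x + stop_gradient(q - x)
    so the gradient of q w.r.t. x is 1, even though round() itself has
    zero gradient almost everywhere. (Illustrative sketch, not QuADD's
    exact quantizer.)
    """
    levels = 2 ** num_bits - 1
    x = np.clip(x, 0.0, 1.0)           # keep values in the quantizer's range
    return np.round(x * levels) / levels

x = np.array([0.03, 0.47, 0.92])
print(quantize_ste(x, num_bits=2))     # snaps to the 4 levels {0, 1/3, 2/3, 1}
```

The key point matches the chef analogy: because the rounding is applied during training (with the STE letting gradients pass through), the distilled data is optimized for how it will actually be stored, not quantized as an afterthought.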

2. The "Adaptive Palette" (Non-Uniform Quantization)

The system uses a clever trick called Adaptive Non-Uniform Quantization.

  • Analogy: Think of a painter's palette.
    • Uniform (Old Way): The painter uses the same-sized blob of paint for everything. A tiny speck of dust gets as much paint as a giant mountain, which wastes paint on the dust and leaves the mountain looking muddy.
    • Adaptive (QuADD Way): The painter looks at the picture. They use tiny, precise dots for the detailed parts (like a cat's whiskers) and big, broad strokes for the simple parts (like the sky).
    • Result: QuADD learns to put the "digital bits" exactly where the information is most important, saving space elsewhere.
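One classic way to place quantization levels where the data is dense is Lloyd's algorithm (1-D k-means). QuADD's actual learned scheme may differ; this sketch is just an illustration of the "adaptive palette" idea, with made-up data:

```python
import numpy as np

def lloyd_levels(values, num_levels=4, iters=20):
    """Place quantization levels with 1-D k-means (Lloyd's algorithm).

    Levels drift toward where the data is dense, instead of being
    evenly spaced. Illustrative only -- not QuADD's actual scheme.
    """
    # Start from quantiles so every level begins near some data.
    levels = np.quantile(values, np.linspace(0.0, 1.0, num_levels))
    for _ in range(iters):
        # Assign each value to its nearest level...
        idx = np.abs(values[:, None] - levels[None, :]).argmin(axis=1)
        # ...then move each level to the mean of its assigned values.
        for k in range(num_levels):
            if np.any(idx == k):
                levels[k] = values[idx == k].mean()
    return np.sort(levels)

rng = np.random.default_rng(0)
# 95% of the values cluster tightly near 0.1; a few outliers sit near 0.9.
data = np.concatenate([rng.normal(0.1, 0.02, 950), rng.normal(0.9, 0.02, 50)])
levels = lloyd_levels(data, num_levels=4)
print(levels)  # three of the four levels land inside the dense cluster
```

Like the painter's adaptive palette, the levels cluster where the detail is: most of them end up near 0.1 (the "whiskers"), with a single level covering the sparse region near 0.9 (the "sky").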

3. The "Sweet Spot" Discovery

The researchers tested this by playing a game: "How many photos vs. how much detail?"

  • They found a Sweet Spot: It is often better to have many low-quality samples than a few high-quality ones.
  • Why? Because AI learns better from seeing many different examples (variety) than from seeing the same perfect example a few times. Even if the examples are "grainy," the sheer number of them helps the AI understand the concept better.
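Under a fixed bit budget, the "photos vs. detail" game is a one-line sweep: halving the bits per value doubles the number of samples you can afford. (Sizes are illustrative; the paper locates the sweet spot empirically, not from this arithmetic.)

```python
PIXELS = 32 * 32 * 3        # one CIFAR-10-sized RGB image
BUDGET = 10 * PIXELS * 32   # budget equal to 10 full-precision images

# Each halving of precision doubles how many samples fit in the budget.
for bits in (32, 16, 8, 4, 2):
    samples = BUDGET // (PIXELS * bits)
    print(f"{bits:2d}-bit values -> {samples:3d} samples under the same budget")
```

The paper's finding is that, along this curve, accuracy often peaks somewhere in the many-samples/low-bits region rather than at the 10-sample, full-precision end.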

The Results: Saving Space Without Losing Smarts

They tested this on two very different things:

  1. Images: Recognizing cats and dogs (CIFAR-10).
  2. Wireless Signals: Helping cell towers find the best signal beam (3GPP data).

The Outcome:

  • Massive Savings: They compressed the data by 10x to 180x (depending on the task).
  • No Loss in Smarts: Despite the data being "grainy" and tiny, the AI models trained on this data performed almost exactly as well as models trained on the massive, high-definition original data.

The Takeaway

This paper changes the goal of AI data compression.

  • Before: "Let's find the fewest number of perfect photos."
  • Now: "Let's find the most efficient way to send information, even if it means sending more 'rough drafts' instead of 'masterpieces'."

It's like realizing that to teach someone a language, you don't need a library of perfect dictionaries; you just need a pocket-sized phrasebook with enough words to get the job done. QuADD gives us that pocket-sized phrasebook for AI.