Dataset Distillation via Committee Voting

This paper proposes Committee Voting for Dataset Distillation (CV-DD), an approach that leverages the collective knowledge and soft labels of multiple models to generate higher-quality, more robust distilled datasets. Across various benchmarks and transfer tasks, these datasets outperform those produced by existing single- and multi-model methods.

Jiacheng Cui, Zhaoyi Li, Xiaochen Ma, Xinyue Bi, Yaxin Luo, Zhiqiang Shen

Published 2026-02-17

Imagine you have a massive library containing millions of books (the original dataset). You want to teach a student (an AI model) everything important from this library, but you don't have the time, money, or space to let them read every single book.

Dataset Distillation is the art of creating a tiny, "super-summary" book that contains all the essential knowledge of the library, allowing the student to learn just as well but in a fraction of the time.

However, there's a catch: if you ask just one librarian to write this summary, they might miss important details, focus too much on their favorite topics, or get confused by the sheer volume of information. Their summary might be biased or incomplete.

This is where the paper "Dataset Distillation via Committee Voting" (CV-DD) comes in.

The Core Idea: The "Committee" Approach

Instead of relying on a single librarian, the authors propose hiring a Committee of Experts.

Think of it like a panel of judges on a talent show. If you have only one judge, their personal taste might skew the results. But if you have five judges with different backgrounds (one loves rock, one loves jazz, one is a technical expert, etc.), and you combine their opinions, you get a much fairer, more accurate, and more robust decision.

In this paper:

  1. The Experts: They use several different AI models (like ResNet, MobileNet, DenseNet) as the "committee members." Each model "sees" the data slightly differently.
  2. The Voting: Instead of letting one model dictate the summary, the committee votes on what the "perfect" summary image should look like.
  3. The Smart Weighting: Not all votes are equal. If one expert has a history of being very accurate (high "prior performance"), their vote counts more. If an expert is struggling, their vote counts less. This ensures the best ideas drive the creation of the summary.
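The weighted voting described above can be sketched in a few lines. This is a toy illustration, not the paper's actual implementation: the models, logits, and accuracy values below are all hypothetical, and the weighting scheme (normalizing prior accuracies into vote weights) is one simple reading of "smart weighting."

```python
import numpy as np

def softmax(logits):
    """Turn raw model scores into a probability distribution."""
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def committee_soft_label(logits_per_model, prior_accuracies):
    """Blend each committee member's prediction, weighting members
    by their prior performance (here, a validation accuracy)."""
    weights = np.asarray(prior_accuracies, dtype=float)
    weights = weights / weights.sum()            # normalize the votes
    probs = np.stack([softmax(l) for l in logits_per_model])
    return np.tensordot(weights, probs, axes=1)  # weighted average

# Three hypothetical committee members scoring one image over 3 classes
logits = [np.array([2.0, 0.5, 0.1]),   # e.g. a ResNet's view
          np.array([1.5, 1.0, 0.2]),   # e.g. a MobileNet's view
          np.array([0.3, 2.2, 0.4])]   # e.g. a DenseNet's view
accs = [0.76, 0.71, 0.68]              # assumed prior accuracies

label = committee_soft_label(logits, accs)
print(label)  # a single soft label: a probability vector summing to 1
```

Because the more accurate members lean toward the first class, the blended soft label does too, even though one member disagrees.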

The Secret Sauce: Two New Tricks

The authors didn't just bring in a committee; they also fixed two major problems that usually happen when trying to summarize data.

1. The "Ghost" Problem (Batch-Specific Soft Labeling)

Imagine you are trying to teach a student using a summary book. The teacher (the AI) gives the student a "soft label"—a hint like, "This picture is 80% likely a cat, 20% a dog."

Usually, the teacher looks at the real library to give these hints. But the summary book (synthetic data) looks slightly different from the real library. It's like the teacher is wearing glasses that make the summary book look blurry compared to the real thing. This causes the hints to be wrong.

The Fix: The authors invented a trick called Batch-Specific Soft Labeling. Instead of looking at the real library through their glasses, the teacher looks directly at the summary book page they are currently teaching from. They adjust their glasses to match the specific page they are holding. This makes the hints much more accurate, helping the student learn better.
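The "adjust the glasses to the page" idea can be made concrete with a toy teacher. This is a minimal sketch, not the paper's network: it assumes the mismatch lives in normalization statistics, and compares labeling a synthetic batch with statistics memorized from real data versus statistics recomputed on that very batch. All names and numbers here are made up for illustration.

```python
import numpy as np

def soft_labels(features, W, b, mean, std):
    """A toy 'teacher': normalize features, then a linear head + softmax."""
    z = (features - mean) / (std + 1e-5)
    logits = z @ W + b
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
W, b = rng.normal(size=(4, 3)), np.zeros(3)

# Statistics memorized from the *real* library (the teacher's "glasses")
real_mean, real_std = np.zeros(4), np.ones(4)

# A synthetic batch whose statistics have drifted from the real ones
synthetic = rng.normal(loc=0.8, scale=1.6, size=(8, 4))

# Global labeling: reuse real-data statistics (mismatched glasses)
global_labels = soft_labels(synthetic, W, b, real_mean, real_std)

# Batch-specific labeling: recompute statistics on this exact batch
batch_labels = soft_labels(synthetic, W, b,
                           synthetic.mean(axis=0), synthetic.std(axis=0))
```

The two labelings disagree precisely because the synthetic batch's statistics differ from the real data's; batch-specific labeling removes that mismatch.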

2. The "Smooth" Learning (Smoothed Learning Rate)

When the committee is writing the summary, they are constantly tweaking the images. If they make big, jerky changes, the summary becomes messy. If they move too slowly, it takes forever.

The Fix: They use a "Smoothed Learning Rate." Think of this like a car approaching a stop sign. Instead of slamming the brakes or coasting too slowly, the car gently slows down in a perfect curve. This helps the committee settle on the perfect summary without overshooting or getting stuck in a bad spot.
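One common way to get this gentle "braking curve" is a cosine-shaped schedule, sketched below. The paper's exact schedule may differ; this is just an illustration of what a smoothed learning rate looks like compared to abrupt step changes.

```python
import math

def smoothed_lr(step, total_steps, lr_max, lr_min=0.0):
    """Cosine-style schedule: the rate decays along a gentle curve
    instead of in sharp steps -- like easing a car to a stop."""
    progress = step / max(1, total_steps)
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))

# Starts at lr_max, glides smoothly down to lr_min, never jumps
schedule = [smoothed_lr(s, total_steps=100, lr_max=0.1) for s in range(101)]
```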

Why Does This Matter?

  • Less Bias: By listening to a diverse group of models, the summary doesn't lean too heavily on one specific way of seeing the world.
  • Better Generalization: The resulting "summary book" works great even if you use a different student (a different AI model) to read it later.
  • Efficiency: It saves massive amounts of computing power and time. You can train powerful AI models on a tiny dataset that was distilled using this method, rather than needing the whole massive library.

The Bottom Line

In plain terms, the paper's advice is: don't ask one person to summarize a million books. Ask a diverse team of experts, let them vote with weight given to whoever is best at what, and make sure they adjust their teaching style to match the specific page they are working on.

The result is a tiny, high-quality dataset that trains AI models faster and more cheaply than using the full dataset, with better accuracy than previous distillation methods.
