Dataset Color Quantization: A Training-Oriented Framework for Dataset-Level Compression

This paper proposes Dataset Color Quantization (DCQ), a training-oriented framework that compresses large-scale image datasets by reducing color-space redundancy. By preserving semantically important colors and structural details, DCQ maintains, and can even improve, model training performance.

Chenyue Yu, Lingao Xiao, Jinhong Deng, Ivor W. Tsang, Yang He

Published 2026-03-03

Imagine you are trying to teach a robot to recognize cats, dogs, and cars. To do this, you need to show it millions of photos. But here's the problem: those photos are huge. They take up massive amounts of space on your hard drive, and sending them to a small device (like a drone or a smartwatch) is slow and expensive.

Usually, when people try to shrink these photo collections, they take a "scissors" approach: they just throw away 90% of the photos, hoping the remaining 10% are the "best" ones. But this paper says, "Wait a minute! You're throwing away whole books just because the library is too big."

The authors propose a new method called Dataset Color Quantization (DCQ). Instead of throwing away photos, they shrink the colors inside the photos.

Here is how it works, broken down with simple analogies:

1. The Problem: The "Full Color" Overload

Think of a digital photo like a painting made of millions of tiny tiles. In a standard photo, each tile can be one of 16 million colors (256 shades each of red, green, and blue, so roughly 16.7 million combinations). That's like having a library with 16 million different paint cans.

  • The Issue: Most of those colors are redundant. The sky isn't 16 million shades of blue; it's mostly just a few. The grass isn't 16 million shades of green.
  • The Old Way: Previous methods tried to reduce the library to just 4 paint cans (4 colors) by picking the most popular colors for each painting individually.
    • The Flaw: If you do this for every photo separately, the "blue" in Photo A might be slightly different from the "blue" in Photo B. When the robot tries to learn, it gets confused. "Is this blue a sky? Or is it water? Why is the blue different?" It creates a messy, inconsistent learning environment.
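The per-image clash is easy to see in a toy sketch. Below, two synthetic "sky" photos each get their own independently fitted k-means palette (plain NumPy, not the paper's code; the data and the tiny k-means are illustrative assumptions), and their "blues" come out different:

```python
import numpy as np

def kmeans_palette(pixels, k=4, iters=10, seed=0):
    """Toy k-means in RGB space: returns k palette colors and pixel labels."""
    rng = np.random.default_rng(seed)
    centers = pixels[rng.choice(len(pixels), k, replace=False)].astype(float)
    for _ in range(iters):
        # assign each pixel to its nearest palette color, then re-center
        d = ((pixels[:, None, :] - centers[None]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = pixels[labels == j].mean(0)
    return centers, labels

# Two "sky" photos whose blues differ slightly (synthetic pixels).
rng = np.random.default_rng(1)
photo_a = rng.normal([30, 60, 200], 5, (500, 3))
photo_b = rng.normal([35, 70, 215], 5, (500, 3))

pal_a, _ = kmeans_palette(photo_a, k=2)
pal_b, _ = kmeans_palette(photo_b, k=2)
# Independent per-image palettes: the "sky blue" entries do not match,
# so the same concept maps to different codes across the dataset.
```

Each photo alone looks fine, but a model trained on the pair sees two different "blues" for the same sky.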

2. The Solution: The "Shared Palette" Strategy

The authors' method, DCQ, is like organizing a massive art class where everyone shares a limited set of paint cans, but they share them smartly.

Step A: Grouping by "Vibe" (Chromaticity-Aware Clustering)

Instead of treating every photo as a unique island, DCQ groups photos that look similar.

  • The Analogy: Imagine sorting a pile of photos into buckets based on their "mood." You put all the "sunny beach" photos in Bucket A, all the "foggy forest" photos in Bucket B, and all the "sunset city" photos in Bucket C.
  • The Magic: Now, instead of giving every single photo its own unique set of 4 colors, you give Bucket A one shared set of 4 colors, Bucket B another set, and so on. This ensures that the "blue" in one beach photo is exactly the same "blue" in another beach photo. The robot learns much faster because the rules are consistent.
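The group-then-share idea can be sketched as below. This is a simplified stand-in for the paper's chromaticity-aware clustering: the per-image "vibe" descriptor here is just the mean RGB color, the grouping is a tiny two-group k-means, and all data are synthetic assumptions:

```python
import numpy as np

def fit_shared_palettes(images, k=4, iters=10, seed=0):
    """Group images by overall color 'vibe', then fit ONE shared k-color
    palette per group from the pooled pixels. Toy two-group sketch, not
    the paper's algorithm."""
    rng = np.random.default_rng(seed)
    descriptors = np.stack([im.mean(0) for im in images])  # mean RGB per image
    # two group centers, seeded with the darkest and brightest images
    order = descriptors.sum(1).argsort()
    centers = descriptors[[order[0], order[-1]]].astype(float)
    for _ in range(iters):
        groups = ((descriptors[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        for g in range(2):
            if (groups == g).any():
                centers[g] = descriptors[groups == g].mean(0)
    palettes = {}
    for g in range(2):
        # one shared palette per group: k-means over the group's pooled pixels
        pooled = np.concatenate([im for im, gi in zip(images, groups) if gi == g])
        pal = pooled[rng.choice(len(pooled), k, replace=False)].astype(float)
        for _ in range(iters):
            lab = ((pooled[:, None] - pal[None]) ** 2).sum(-1).argmin(1)
            for j in range(k):
                if (lab == j).any():
                    pal[j] = pooled[lab == j].mean(0)
        palettes[g] = pal
    return groups, palettes

# Two "beach" and two "forest" photos (synthetic pixels).
rng = np.random.default_rng(1)
beach = [rng.normal([210, 180, 130], 10, (400, 3)) for _ in range(2)]
forest = [rng.normal([50, 110, 60], 10, (400, 3)) for _ in range(2)]
groups, palettes = fit_shared_palettes(beach + forest)
# Both beach photos land in one group and share one palette; both forest
# photos land in the other, so "beach blue" is identical across beach photos.
```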

Step B: The "Spotlight" (Attention-Guided Allocation)

Not all parts of a photo are equally important.

  • The Analogy: Imagine you are looking at a photo of a dog. The dog's face is crucial; the blurry background grass is not.
  • The Magic: DCQ uses a "spotlight" (an AI attention map) to see where the robot is looking. It says, "Okay, we have 4 colors to use. Let's spend 3 of them on the dog's face, because that's what matters, and use the 4th color for the background."
  • The Result: The important parts of the image stay sharp and clear, while the boring parts get simplified. This is like a cartoonist who draws the character's face in high detail but uses simple scribbles for the background.
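One simple way to realize this allocation is attention-weighted k-means, where salient pixels count more when palette colors are seeded and updated. This is a hedged sketch of the idea with synthetic data and a made-up attention map, not the paper's implementation:

```python
import numpy as np

def attention_weighted_palette(pixels, attention, k=4, iters=10, seed=0):
    """Weighted k-means in color space: high-attention pixels count more
    both when palette colors are seeded and when they are re-centered,
    so more of the k colors land on the salient region."""
    rng = np.random.default_rng(seed)
    p = attention / attention.sum()
    pal = pixels[rng.choice(len(pixels), k, replace=False, p=p)].astype(float)
    for _ in range(iters):
        lab = ((pixels[:, None] - pal[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            m = lab == j
            if m.any():
                # attention-weighted mean: salient pixels pull harder
                pal[j] = np.average(pixels[m], axis=0, weights=attention[m])
    return pal, lab

# A "dog photo": 300 varied foreground pixels with high attention,
# 300 uniform grassy background pixels with low attention (synthetic).
rng = np.random.default_rng(3)
fg = rng.normal([180, 120, 90], 40, (300, 3))
bg = rng.normal([40, 120, 40], 8, (300, 3))
pixels = np.concatenate([fg, bg])
attention = np.concatenate([np.full(300, 10.0), np.full(300, 1.0)])
pal, lab = attention_weighted_palette(pixels, attention)
```

Because foreground pixels carry ten times the weight here, the palette spends most of its colors describing the dog rather than the grass.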

Step C: Keeping the Edges Sharp (Texture Preservation)

When you reduce colors, things often look blocky or pixelated, like a low-resolution video game.

  • The Analogy: If you try to draw a circle with only 4 colors, it might look like a jagged square.
  • The Magic: The authors added a special "polishing" step. They check the edges of the objects (like the outline of a car or a cat's ear) and tweak the colors to make sure the lines stay smooth. It's like using a fine-tipped pen to trace over a rough sketch, ensuring the robot doesn't lose the shape of the object.
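The paper's polishing step is its own algorithm; a classical trick with the same flavor is Floyd-Steinberg error diffusion, which spreads each pixel's rounding error to its neighbors so edges and textures stay legible at tiny palette sizes. The sketch below uses it as an illustrative stand-in:

```python
import numpy as np

def quantize_with_dither(img, palette):
    """Snap each pixel to its nearest palette color while diffusing the
    rounding error to unvisited neighbors (Floyd-Steinberg). Shown as a
    stand-in for the paper's texture-preservation step."""
    h, w, _ = img.shape
    work = img.astype(float).copy()
    idx = np.zeros((h, w), dtype=int)
    for y in range(h):
        for x in range(w):
            old = work[y, x]
            j = int(((palette - old) ** 2).sum(1).argmin())
            idx[y, x] = j
            err = old - palette[j]
            # push the rounding error onto pixels not yet processed
            if x + 1 < w:
                work[y, x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    work[y + 1, x - 1] += err * 3 / 16
                work[y + 1, x] += err * 5 / 16
                if x + 1 < w:
                    work[y + 1, x + 1] += err * 1 / 16
    return idx  # with a 4-color palette this is 2 bits per pixel

# A smooth gray ramp quantized to a 4-gray palette.
ramp = np.linspace(0, 255, 64).reshape(8, 8)
img = np.stack([ramp] * 3, axis=-1)
palette = np.array([[0, 0, 0], [85, 85, 85],
                    [170, 170, 170], [255, 255, 255]], float)
idx = quantize_with_dither(img, palette)
```

Without the diffusion, the ramp collapses into four flat bands with hard jumps; with it, the bands interleave and the gradient still reads as a gradient.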

Why is this a big deal?

  1. Massive Space Savings: By reducing a photo from 16 million colors to just 4 or 8 colors, you can shrink the file size by 90% or more without deleting a single photo.
  2. Better Learning: Surprisingly, the robot actually learns better with these simplified photos than with the original messy ones. Because the colors are consistent and the important parts are highlighted, the robot focuses on what matters.
  3. Works on Tiny Devices: This means you can train powerful AI models on small devices like drones or medical sensors that don't have huge hard drives or fast internet connections.
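The space-savings claim is easy to sanity-check with back-of-envelope arithmetic (uncompressed pixels, no container overhead; the 224x224 resolution is just an illustrative assumption):

```python
# Back-of-envelope storage math for a 4-color palettized image.
W, H = 224, 224                # a typical training-image resolution
raw_bits = W * H * 24          # 8 bits per R, G, B channel
palette_bits = 4 * 24          # a 4-color palette stored once per image
index_bits = W * H * 2         # 2 bits pick one of 4 palette colors per pixel
quant_bits = palette_bits + index_bits
saving = 1 - quant_bits / raw_bits
print(f"{saving:.1%}")         # → 91.7%
```

So the "90% or more" figure falls out of the bit-widths alone, before any entropy coding is applied on top.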

The Bottom Line

Think of DCQ not as throwing away the library, but as rewriting the books in a simpler language. You keep all the stories (the data), but you remove the unnecessary adjectives (the redundant colors) and make sure the main characters (the important objects) are described clearly. The result is a library that takes up less space but is actually easier to read and understand.