KV Cache Transform Coding for Compact Storage in LLM Inference
KVTC is a lightweight, model-agnostic transform coder that compresses the key-value (KV) caches of large language models by 20× or more. It combines PCA-based decorrelation, adaptive quantization, and entropy coding, enabling memory-efficient serving with reusable caches while preserving accuracy on reasoning and long-context tasks.
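The transform-coding pipeline named in the abstract (PCA decorrelation, then scalar quantization, then entropy coding) can be sketched on a synthetic cache slice. This is a minimal illustration of the generic technique, not KVTC's actual implementation; all names, shapes, and the quantization step are assumptions.

```python
import numpy as np

def pca_transform(X):
    # Center the rows (tokens x channels) and decorrelate via PCA.
    mean = X.mean(axis=0)
    Xc = X - mean
    # SVD yields orthonormal principal axes in the rows of Vt;
    # projecting onto them concentrates energy in few coefficients.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt.T, mean, Vt

def quantize(Z, step):
    # Uniform scalar quantization: round each coefficient to a
    # multiple of `step` (a stand-in for adaptive quantization).
    return np.round(Z / step).astype(np.int32)

def dequantize(Q, step):
    return Q.astype(np.float64) * step

def entropy_bits(Q):
    # Empirical entropy in bits/symbol: a lower bound on the rate
    # an entropy coder (e.g. arithmetic coding) would achieve.
    _, counts = np.unique(Q, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(0)
# Synthetic "KV cache" slice: 256 tokens x 64 channels with strong
# cross-channel correlation (low-rank structure plus small noise).
base = rng.standard_normal((256, 8))
X = base @ rng.standard_normal((8, 64)) + 0.01 * rng.standard_normal((256, 64))

step = 0.5
Z, mean, Vt = pca_transform(X)
Q = quantize(Z, step)
# Decode: dequantize, rotate back, and re-add the mean.
X_hat = dequantize(Q, step) @ Vt + mean

max_err = np.max(np.abs(X - X_hat))   # bounded by the quantization step
rate = entropy_bits(Q)                # bits/coefficient, vs. 16 for fp16
```

Because the data is nearly low-rank, most PCA coefficients quantize to zero, so the empirical entropy per coefficient falls far below the 16 bits of an fp16 cache entry while the reconstruction error stays bounded by the quantization step.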