Revisiting Multimodal KV Cache Compression: A Frequency-Domain-Guided Outlier-KV-Aware Approach
This paper introduces FlashCache, a frequency-domain-guided KV cache compression framework for multimodal large language models. FlashCache identifies and preserves critical "Outlier KVs", and combines low-pass filtering with dynamic budget allocation to achieve significant inference speedups and memory reduction without compromising model performance.
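
The summary does not spell out the algorithm, so the following is only a minimal sketch of one way to read the core idea, not FlashCache's actual procedure. It assumes that an "Outlier KV" can be approximated as a token whose cached vector deviates strongly from a low-pass-filtered version of the cache along the token axis. The function names (`low_pass`, `outlier_scores`, `select_outliers`), the cutoff parameter `keep`, and the fixed `budget` are all hypothetical; the dynamic budget allocation mentioned above is not modeled here.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform of a real sequence (O(n^2), fine for a demo)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(X):
    """Inverse DFT, returning the real part of each sample."""
    n = len(X)
    return [
        sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def low_pass(x, keep):
    """Zero every frequency bin above `keep` (assumed cutoff), then transform back."""
    n = len(x)
    X = dft(x)
    for k in range(n):
        # Bins k and n - k represent the same frequency magnitude.
        if min(k, n - k) > keep:
            X[k] = 0
    return idft(X)


def outlier_scores(kv, keep):
    """Score each token by its L2 distance from the low-passed cache.

    `kv` is a list of token vectors (seq_len x dim). Each feature channel is
    filtered independently along the token axis.
    """
    seq_len, dim = len(kv), len(kv[0])
    smooth = [low_pass([kv[t][d] for t in range(seq_len)], keep) for d in range(dim)]
    return [
        sum((kv[t][d] - smooth[d][t]) ** 2 for d in range(dim)) ** 0.5
        for t in range(seq_len)
    ]


def select_outliers(kv, keep, budget):
    """Return the indices of the `budget` highest-scoring tokens, in token order."""
    scores = outlier_scores(kv, keep)
    ranked = sorted(range(len(kv)), key=lambda t: scores[t], reverse=True)
    return sorted(ranked[:budget])


if __name__ == "__main__":
    # Toy cache: 16 identical tokens with one injected spike at position 5.
    kv = [[1.0, 1.0] for _ in range(16)]
    kv[5] = [9.0, -7.0]
    print(select_outliers(kv, keep=2, budget=1))
```

In this toy setting the injected spike receives the largest deviation score, so it is the token retained under a budget of one. A practical implementation would use a fast FFT routine rather than the quadratic DFT shown here, and would operate on real per-head key and value tensors.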