VLMQ: Token Saliency-Driven Post-Training Quantization for Vision-Language Models
This paper introduces VLMQ, a post-training quantization framework tailored to vision-language models. It leverages a gradient-driven token importance factor to counteract the over-representation of visual tokens and the gap between modalities, achieving state-of-the-art performance across model sizes and low-bit settings.
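To make the idea concrete, below is a minimal sketch of how a gradient-driven per-token importance factor could reweight the calibration statistics of a GPTQ-style post-training quantizer, so that the many visual tokens do not dominate the few text tokens. The function names, the specific saliency definition (activation times gradient magnitude), and the weighted-Hessian formulation are illustrative assumptions, not the paper's exact method.

```python
import torch


def token_saliency_weights(activations: torch.Tensor, grads: torch.Tensor) -> torch.Tensor:
    """Hypothetical per-token importance from gradient information.

    activations, grads: [num_tokens, hidden]. The paper's actual saliency
    definition may differ; this uses a first-order |x * dL/dx| score.
    """
    saliency = (activations * grads).abs().sum(dim=-1)  # [num_tokens]
    return saliency / saliency.sum()                    # normalize to sum to 1


def weighted_hessian(activations: torch.Tensor, weights: torch.Tensor) -> torch.Tensor:
    """GPTQ-style second-moment statistic H = X^T diag(w) X.

    Reweighting per token lets salient (often textual) tokens count more
    during calibration instead of being drowned out by visual tokens.
    """
    return activations.t() @ (weights.unsqueeze(1) * activations)  # [hidden, hidden]


if __name__ == "__main__":
    torch.manual_seed(0)
    X = torch.randn(512, 64)   # calibration activations (tokens x hidden), dummy data
    G = torch.randn(512, 64)   # gradients w.r.t. those activations, dummy data
    w = token_saliency_weights(X, G)
    H = weighted_hessian(X, w)
    print(H.shape)             # torch.Size([64, 64])
```

The weighted Hessian would then replace the uniform-token statistic in the per-layer quantization step; the rest of the pipeline (error compensation, bit allocation) is unchanged in this sketch.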