HiDrop: Hierarchical Vision Token Reduction in MLLMs via Late Injection, Concave Pyramid Pruning, and Early Exit
HiDrop is a framework that accelerates Multimodal Large Language Models (MLLMs) by aligning vision token pruning with the hierarchical functions of the model's layers. It combines three mechanisms, Late Injection, Concave Pyramid Pruning, and Early Exit, to reduce visual tokens by 90% and speed up training by 1.72x while maintaining the original performance.
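
To make the three mechanisms concrete, the following is a minimal sketch of how a per-layer visual token budget could be laid out. It is an illustration under stated assumptions, not HiDrop's actual schedule: the function names, the layer indices, and the power-law decay are all hypothetical, and "concave" is assumed here to mean that the token count drops steeply at first and stays below a linear ramp. The numbers it produces are not the paper's results.

```python
def visual_token_schedule(num_layers, full_tokens, inject_layer, exit_layer, power=2.0):
    """Return a hypothetical count of visual tokens present at each layer.

    Assumptions (not taken from the source):
      - Late Injection: no visual tokens before `inject_layer`.
      - Concave Pyramid Pruning: between injection and exit, the count
        follows full_tokens * (1 - t) ** power, which falls below a
        linear ramp when power > 1.
      - Early Exit: no visual tokens from `exit_layer` onward.
    """
    if not 0 <= inject_layer < exit_layer <= num_layers:
        raise ValueError("need 0 <= inject_layer < exit_layer <= num_layers")

    span = exit_layer - inject_layer
    schedule = []
    for layer in range(num_layers):
        if layer < inject_layer or layer >= exit_layer:
            # Visual tokens are absent outside the [inject, exit) window.
            schedule.append(0)
        else:
            t = (layer - inject_layer) / span  # progress through the window, in [0, 1)
            schedule.append(int(full_tokens * (1.0 - t) ** power))
    return schedule


def average_reduction(schedule, full_tokens):
    """Fraction of visual-token slots removed relative to keeping all tokens in every layer."""
    kept = sum(schedule) / (len(schedule) * full_tokens)
    return 1.0 - kept


if __name__ == "__main__":
    # Toy configuration chosen only for readability.
    sched = visual_token_schedule(num_layers=8, full_tokens=100, inject_layer=2, exit_layer=6)
    print(sched)
    print(average_reduction(sched, full_tokens=100))
```

In this toy setting the early and late layers carry no visual tokens at all, so most of the saving comes from the injection and exit points rather than from the pruning curve itself. How HiDrop actually chooses these points and the pruning ratios is not specified in the summary above.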