EvoPrune: Early-Stage Visual Token Pruning for Efficient MLLMs
EvoPrune is an early-stage visual token pruning method that performs layer-wise pruning guided by token similarity, diversity, and attention importance during visual encoding, achieving a 2× inference speedup with minimal performance degradation on high-resolution images and videos.
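
To make the idea concrete, here is a minimal sketch of how pruning guided by importance and similarity could be applied layer by layer. It is an illustration under stated assumptions, not the EvoPrune algorithm itself: the summary above does not give the scoring rule, so the greedy score used here (importance minus a redundancy penalty) is hypothetical, as are the names prune_tokens, layerwise_prune, redundancy_weight, and keep_per_layer. The importance values stand in for attention-derived scores.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def prune_tokens(tokens, importance, keep, redundancy_weight=1.0):
    """Greedily pick `keep` token indices, trading importance against redundancy.

    Hypothetical scoring rule (an assumption, not taken from the source):
        score(i) = importance[i] - redundancy_weight * max_j cosine(tokens[i], tokens[j])
    where j ranges over the tokens already kept.
    """
    if len(tokens) != len(importance):
        raise ValueError("tokens and importance must have the same length")
    keep = min(keep, len(tokens))
    kept = []
    remaining = set(range(len(tokens)))
    while len(kept) < keep:
        best, best_score = None, -math.inf
        for i in sorted(remaining):
            # Redundancy is the similarity to the closest token already kept.
            redundancy = max((cosine(tokens[i], tokens[j]) for j in kept), default=0.0)
            score = importance[i] - redundancy_weight * redundancy
            if score > best_score:
                best, best_score = i, score
        kept.append(best)
        remaining.remove(best)
    return sorted(kept)


def layerwise_prune(tokens, importance, keep_per_layer):
    """Apply prune_tokens once per encoder layer, shrinking the token set each time.

    Returns indices into the original token list. A real encoder would also
    update the token features between layers; that step is omitted here.
    """
    alive = list(range(len(tokens)))
    for keep in keep_per_layer:
        local = prune_tokens([tokens[i] for i in alive],
                             [importance[i] for i in alive], keep)
        alive = [alive[i] for i in local]
    return alive


if __name__ == "__main__":
    # Toy data: tokens 0 and 1 are near-duplicates; token 2 is orthogonal to token 0.
    tokens = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0], [0.5, 0.5]]
    importance = [0.9, 0.85, 0.3, 0.2]
    print(layerwise_prune(tokens, importance, keep_per_layer=[3, 2]))
```

On this toy input, the sketch keeps the most important token first and then prefers the dissimilar token 2 over the near-duplicate token 1, even though token 1 has the higher importance. Setting redundancy_weight to 0 reduces the rule to plain top-k selection by importance.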