VLM-Pruner: Buffering for Spatial Sparsity in an Efficient VLM Centrifugal Token Pruning Paradigm
VLM-Pruner is a training-free token pruning algorithm for efficient Vision-Language Model (VLM) inference. It introduces a centrifugal selection paradigm together with a Buffering for Spatial Sparsity criterion that balances redundancy reduction against spatial coverage, and it selectively fuses information from discarded tokens back into the retained ones to preserve performance.
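The summary above can be illustrated with a minimal sketch. The function below is a hypothetical illustration, not the paper's actual method: it greedily keeps high-score visual tokens while a spatial buffer suppresses nearby candidates (encouraging coverage across the image grid), then fuses each pruned token into its most similar kept token via a score-weighted merge. All names (`prune_tokens`, `buffer_radius`) and the specific scoring/fusion rules are assumptions for demonstration.

```python
import numpy as np

def prune_tokens(feats, scores, coords, keep, buffer_radius=1.0):
    """Sketch of score-based token pruning with a spatial buffer.

    feats:  (N, D) token features
    scores: (N,)   importance scores (e.g. attention to the [CLS]/text tokens)
    coords: (N, 2) grid positions of the tokens
    keep:   number of tokens to retain
    """
    n = feats.shape[0]
    available = np.ones(n, dtype=bool)
    kept = []
    while len(kept) < keep and available.any():
        # pick the highest-scoring still-available token
        idx = int(np.argmax(np.where(available, scores, -np.inf)))
        kept.append(idx)
        # spatial buffer: tokens too close to a kept token cannot be selected,
        # which spreads the kept set across the grid
        dist = np.linalg.norm(coords - coords[idx], axis=1)
        available &= dist > buffer_radius
        available[idx] = False
    kept = np.array(kept)
    pruned = np.setdiff1d(np.arange(n), kept)
    out = feats[kept].copy()
    if pruned.size:
        # fuse each pruned token into its most similar kept token
        fn = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-8)
        nearest = (fn[pruned] @ fn[kept].T).argmax(axis=1)
        for k in range(len(kept)):
            group = pruned[nearest == k]
            if group.size:
                w = scores[group]
                merged = (feats[group] * w[:, None]).sum(0) + feats[kept[k]] * scores[kept[k]]
                out[k] = merged / (w.sum() + scores[kept[k]])
    return out, kept
```

Because the buffer forbids selecting any token within `buffer_radius` of an already-kept one, the retained tokens are guaranteed to be pairwise farther apart than the radius, which is one simple way to trade redundancy reduction for spatial coverage.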