FLoC: Facility Location-Based Efficient Visual Token Compression for Long Video Understanding
FLoC is a training-free, model-agnostic framework for visual token compression in long video understanding. It uses the facility location function and a lazy greedy algorithm to efficiently select a compact, diverse subset of visual tokens, significantly reducing computational cost while maintaining near-optimal performance across diverse benchmarks.
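
To make the selection step concrete, the following is a minimal sketch of facility location maximization with lazy greedy, not the authors' implementation. It assumes the standard form of the objective, f(S) = sum over all tokens i of max over selected tokens j in S of sim(i, j), and it assumes cosine similarity clipped to be non-negative as the similarity measure. The function name `facility_location_lazy_greedy` and its parameters `tokens` and `budget` are hypothetical names chosen for illustration.

```python
import heapq

import numpy as np


def facility_location_lazy_greedy(tokens, budget):
    """Pick up to `budget` row indices of `tokens` that greedily maximize
    f(S) = sum_i max_{j in S} sim(i, j).

    Assumption: sim is cosine similarity clipped at zero, which keeps the
    objective monotone and submodular. The similarity used by FLoC itself
    may differ.
    """
    tokens = np.asarray(tokens, dtype=np.float64)
    norms = np.linalg.norm(tokens, axis=1, keepdims=True)
    normed = tokens / np.maximum(norms, 1e-12)
    sim = np.clip(normed @ normed.T, 0.0, None)

    n = sim.shape[0]
    budget = min(budget, n)

    # coverage[i] holds max_{j in S} sim(i, j) for the current selection S.
    coverage = np.zeros(n)

    # Max-heap (via negated gains) of (gain, index, stamp). The stamp is the
    # selection size at which the gain was computed; a gain is fresh only if
    # its stamp equals the current selection size.
    heap = [(-float(sim[:, j].sum()), j, 0) for j in range(n)]
    heapq.heapify(heap)

    selected = []
    while len(selected) < budget and heap:
        _, j, stamp = heapq.heappop(heap)
        if stamp == len(selected):
            # Fresh gain at the top of the heap. By submodularity every
            # stale gain below it is an upper bound, so j is the best pick.
            selected.append(j)
            coverage = np.maximum(coverage, sim[:, j])
        else:
            # Stale gain: recompute the marginal gain and reinsert.
            gain = float(np.maximum(sim[:, j] - coverage, 0.0).sum())
            heapq.heappush(heap, (-gain, j, len(selected)))
    return selected


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    visual_tokens = rng.normal(size=(512, 64))  # synthetic stand-in for video tokens
    kept = facility_location_lazy_greedy(visual_tokens, budget=32)
    print(len(kept), "tokens kept out of", visual_tokens.shape[0])
```

The lazy variant returns a selection that a plain greedy pass could also produce, but it avoids re-evaluating every candidate in every round: a candidate's marginal gain can only shrink as the selection grows, so a stale gain is a valid upper bound and most candidates never need to be rescored. This sketch materializes the full pairwise similarity matrix, which is quadratic in the number of tokens; for very long videos a chunked or approximate similarity computation would likely be needed.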