Keeping the Evidence Chain: Semantic Evidence Allocation for Training-Free Token Pruning in Video Temporal Grounding
The paper proposes SemVID, a training-free token pruning framework for Video Temporal Grounding. SemVID allocates token budgets based on query relevance and inter-frame variation, and it selects object, motion, and context tokens so that critical evidence and cross-frame connectivity are preserved. This lets it maintain high accuracy and efficiency.
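As a rough illustration of the budget-allocation idea, the Python sketch below splits a fixed token budget across frames using two per-frame scores, one for query relevance and one for inter-frame variation. It is not SemVID's actual procedure, since the summary does not specify the scoring rule. The function name `allocate_budgets`, the mixing weight `alpha`, the per-frame floor `min_tokens`, and the largest-remainder rounding are all assumptions made for illustration.

```python
from typing import List


def allocate_budgets(relevance: List[float], variation: List[float],
                     total_budget: int, alpha: float = 0.5,
                     min_tokens: int = 1) -> List[int]:
    """Split total_budget tokens across frames.

    Hypothetical rule (not taken from the paper): each frame's share is a
    convex mix of its normalized query relevance and inter-frame variation.
    """
    n = len(relevance)
    if n == 0 or len(variation) != n:
        raise ValueError("relevance and variation must be non-empty and equal length")
    if total_budget < n * min_tokens:
        raise ValueError("total_budget too small for the per-frame minimum")

    def normalize(xs: List[float]) -> List[float]:
        # Scale scores to sum to 1; fall back to uniform if all are zero.
        s = sum(xs)
        return [x / s for x in xs] if s > 0 else [1.0 / n] * n

    rel, var = normalize(relevance), normalize(variation)
    # Assumed mixing weight alpha: 1.0 uses relevance only, 0.0 variation only.
    scores = [alpha * r + (1 - alpha) * v for r, v in zip(rel, var)]

    # Every frame keeps min_tokens so no frame disappears entirely
    # (an assumed stand-in for preserving cross-frame connectivity).
    spare = total_budget - n * min_tokens
    ideal = [s * spare for s in scores]
    budgets = [min_tokens + int(x) for x in ideal]

    # Largest-remainder rounding: hand leftover tokens to the frames
    # with the biggest fractional parts so the total matches exactly.
    leftover = total_budget - sum(budgets)
    order = sorted(range(n), key=lambda i: ideal[i] - int(ideal[i]), reverse=True)
    for i in order[:max(leftover, 0)]:  # guard against float rounding
        budgets[i] += 1
    return budgets
```

For example, `allocate_budgets([1, 6, 3], [2, 2, 6], total_budget=100)` returns `[15, 40, 45]`: the second frame is most relevant to the query, the third changes most relative to its neighbors, and both receive more tokens than the first. Choosing which object, motion, and context tokens fill each frame's budget is a separate step that this sketch does not cover.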