Map the Flow: Revealing Hidden Pathways of Information in VideoLLMs
This paper uses mechanistic interpretability to map the internal information flow of VideoLLMs. It reveals a consistent three-stage pathway: cross-frame interaction, then video-language integration, then answer generation. This pathway enables effective temporal reasoning, and a significant share of attention edges can be pruned without performance loss.
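
To make the idea of removing attention edges concrete, here is a minimal NumPy sketch of blocking selected edges in a single causal attention head. It is an illustration under stated assumptions, not the paper's implementation: the token layout (four video tokens, two question tokens, one final token), the helper names `block_edges` and `attention_with_knockout`, and the single-head, single-layer setting are all hypothetical, and the summary above does not specify which intervention procedure the authors use.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; masked entries (-inf) get weight 0.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def block_edges(n, sources, targets):
    # Hypothetical helper: blocked[i, j] = True means query i may not read key j.
    blocked = np.zeros((n, n), dtype=bool)
    blocked[np.ix_(targets, sources)] = True
    return blocked

def attention_with_knockout(q, k, v, blocked=None):
    # Single-head causal attention with an optional set of removed edges.
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    allowed = np.tril(np.ones((n, n), dtype=bool))  # causal mask
    if blocked is not None:
        allowed = allowed & ~blocked
    # Always keep self-attention so no row is fully masked.
    allowed = allowed | np.eye(n, dtype=bool)
    scores = np.where(allowed, scores, -np.inf)
    return softmax(scores) @ v

# Assumed toy layout: positions 0-3 are video tokens, 4-5 question, 6 final token.
rng = np.random.default_rng(0)
n, d = 7, 8
q, k, v = (rng.normal(size=(n, d)) for _ in range(3))
video, last = [0, 1, 2, 3], [6]

# Remove the direct video -> final-token edges in this one layer.
blocked = block_edges(n, sources=video, targets=last)
out = attention_with_knockout(q, k, v, blocked)
```

With these edges blocked, the final position's output in this layer no longer depends directly on the video tokens; any video information it receives must arrive through other positions, such as the question tokens. Comparing task performance with and without such a mask is one way to test whether a given pathway matters.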