Interpretable Motion-Attentive Maps: Spatio-Temporally Localizing Concepts in Video Diffusion Transformers
This paper introduces GramCol and a motion-feature selection algorithm to generate Interpretable Motion-Attentive Maps (IMAPs) that effectively localize both motion and non-motion concepts in Video Diffusion Transformers without requiring gradient calculations or parameter updates.