EventVGGT: Exploring Cross-Modal Distillation for Consistent Event-based Depth Estimation
EventVGGT is a framework for event-based monocular depth estimation that addresses two problems: the scarcity of depth annotations and temporal inconsistency. It treats event streams as coherent video sequences and distills spatio-temporal and multi-view geometric priors from the Visual Geometry Grounded Transformer (VGGT) through a tri-level distillation strategy. It achieves state-of-the-art performance and robust zero-shot generalization.
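
To make the idea of distilling from a teacher over a whole sequence more concrete, here is a minimal sketch of a three-term distillation loss. It is an illustration under stated assumptions, not the method described here: the summary does not name the three levels, so the feature, depth, and temporal terms below, along with the function name `distillation_loss`, the weights, and the array shapes, are all hypothetical. The teacher arrays stand in for outputs of a frozen image-based model such as VGGT, and the student arrays for outputs of an event-based model.

```python
import numpy as np


def distillation_loss(student_feats, teacher_feats,
                      student_depth, teacher_depth,
                      w_feat=1.0, w_depth=1.0, w_temp=1.0):
    """Hypothetical three-term distillation loss over a sequence of T frames.

    student_feats, teacher_feats: arrays of shape (T, N, C), per-frame tokens.
    student_depth, teacher_depth: arrays of shape (T, H, W), per-frame depth.
    The three terms are assumptions chosen for illustration only.
    """
    # Term 1 (assumed): feature-level matching, mean squared error on tokens.
    feat_term = np.mean((student_feats - teacher_feats) ** 2)

    # Term 2 (assumed): output-level matching, mean absolute error on depth.
    depth_term = np.mean(np.abs(student_depth - teacher_depth))

    # Term 3 (assumed): temporal matching. Frame-to-frame depth changes of
    # the student should follow those of the teacher.
    student_delta = np.diff(student_depth, axis=0)
    teacher_delta = np.diff(teacher_depth, axis=0)
    if student_delta.size:
        temp_term = np.mean(np.abs(student_delta - teacher_delta))
    else:
        temp_term = 0.0  # a single frame has no temporal differences

    return float(w_feat * feat_term + w_depth * depth_term + w_temp * temp_term)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t_feats = rng.normal(size=(4, 16, 8))
    t_depth = rng.uniform(1.0, 10.0, size=(4, 6, 6))
    # A student that reproduces the teacher exactly incurs zero loss.
    print(distillation_loss(t_feats, t_feats, t_depth, t_depth))
```

In this sketch the temporal term penalizes only disagreement in how depth changes between consecutive frames, so a constant depth offset shared by every frame affects the depth term but not the temporal term.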