Scaling Dense Event-Stream Pretraining from Visual Foundation Models
This paper proposes a self-supervised pretraining method that distills structure-aware representations from visual foundation models into an event-stream encoder. By sidestepping the annotation bottleneck and avoiding semantic collapse, the approach enables scalable learning of versatile, fine-grained representations from dense event streams.
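The core mechanism named above, aligning event-stream features to those of a frozen visual foundation model, can be sketched as a simple per-patch distillation objective. This is an illustrative sketch under stated assumptions, not the paper's implementation: the feature shapes, the cosine-distance loss form, and all names here are hypothetical.

```python
import numpy as np

def distillation_loss(student_feats, teacher_feats, eps=1e-8):
    """Mean cosine distance between per-patch student and teacher features.

    student_feats: (N, D) features from the trainable event-stream encoder.
    teacher_feats: (N, D) features from a frozen visual foundation model
                   applied to paired frames (treated as stop-gradient targets).
    The exact loss form is an assumption for illustration only.
    """
    s = student_feats / (np.linalg.norm(student_feats, axis=1, keepdims=True) + eps)
    t = teacher_feats / (np.linalg.norm(teacher_feats, axis=1, keepdims=True) + eps)
    return float(np.mean(1.0 - np.sum(s * t, axis=1)))

# Toy example: 4 patch tokens with 8-dimensional features.
rng = np.random.default_rng(0)
teacher = rng.standard_normal((4, 8))
student = teacher + 0.1 * rng.standard_normal((4, 8))  # nearly aligned student
print(distillation_loss(student, teacher))  # small positive value
print(distillation_loss(teacher, teacher))  # zero when features coincide
```

Minimizing such a loss pulls the student's patch-level features toward the teacher's, transferring dense semantics without any manual labels.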