Holi-Spatial: Evolving Video Streams into Holistic 3D Spatial Intelligence
This paper introduces Holi-Spatial, the first fully automated, large-scale, spatially aware multimodal dataset constructed from raw video streams without human intervention. It provides 4 million high-quality 3D semantic annotations and spatial QA pairs, intended to significantly improve the training and performance of Vision-Language Models on spatial reasoning tasks.