OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models
This paper introduces OmniSpatial, a comprehensive spatial reasoning benchmark grounded in cognitive psychology, with over 8.4K annotated samples across four major categories. Evaluations on the benchmark reveal significant limitations in the spatial reasoning capabilities of current vision-language models, and the paper explores strategies such as PointGraph and SpatialCoT to address them.