TimeSpot: Benchmarking Geo-Temporal Understanding in Vision-Language Models in Real-World Settings
This paper introduces TimeSpot, a benchmark of 1,455 real-world images from 80 countries for evaluating how well vision-language models predict location, time, and environmental context from visual evidence alone, a geo-temporal reasoning capability that remains limited in current models.