OmniEarth: A Benchmark for Evaluating Vision-Language Models in Geospatial Tasks
This paper introduces OmniEarth, a comprehensive benchmark of 9,275 images and 44,210 verified instructions for evaluating Vision-Language Models across 28 geospatial tasks, with a focus on perception, reasoning, and robustness. The benchmark reveals significant performance gaps in current models for remote sensing applications.