Open-Vocabulary vs Supervised Learning Methods for Post-Disaster Visual Scene Understanding
This paper presents a comparative evaluation of supervised learning and open-vocabulary vision models for post-disaster scene understanding across multiple datasets. It concludes that, while foundation models offer flexibility, supervised training remains the most reliable approach for accurately detecting small objects and delineating boundaries in cluttered disaster scenes when annotations are available.