Cross-view geo-localization, Image retrieval, Multiscale geometric modeling, Frequency domain enhancement

This paper proposes the Spatial and Frequency Domain Enhancement Network (SFDE), a lightweight three-branch architecture that combines complementary spatial-domain and frequency-domain representations to address geometric asymmetry and texture inconsistencies in cross-view geo-localization. It achieves state-of-the-art performance through multiscale structural modeling and frequency invariance.

Hongying Zhang, ShuaiShuai Ma · 2026-03-04 · 💻 cs

Designing UNICORN: a Unified Benchmark for Imaging in Computational Pathology, Radiology, and Natural Language

The paper introduces UNICORN, a unified public benchmark featuring a standardized two-step evaluation framework and a novel aggregate metric to systematically assess the cross-modality and cross-task generalization of medical foundation models across diverse imaging and natural language data from multiple institutions.

Michelle Stegeman, Lena Philipp, Fennie van der Graaf + 19 more · 2026-03-04 · 💻 cs

Structure-Aware Text Recognition for Ancient Greek Critical Editions

This paper addresses the limitations of visual language models in recognizing the complex layouts of Ancient Greek critical editions by introducing a large-scale synthetic corpus and a real-world benchmark. It demonstrates that zero-shot performance lags behind traditional tools, while fine-tuned models such as Qwen3VL-8B can achieve state-of-the-art accuracy.

Nicolas Angleraud, Antonia Karamolegkou, Benoît Sagot + 1 more · 2026-03-04 · 💻 cs