Topological Alignment of Shared Vision-Language Embedding Space
This paper introduces ToMCLIP, a topology-aware framework for multilingual vision-language alignment. ToMCLIP applies persistent homology to preserve the global geometric structure of the shared embedding space, and it improves zero-shot accuracy and retrieval performance over existing instance-level methods.
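To make the idea of comparing global structure concrete, here is a minimal sketch of a topological discrepancy between two embedding point clouds. It is not ToMCLIP's actual loss, which this summary does not specify. It relies on a standard fact: the death times of 0-dimensional persistent homology of a Vietoris-Rips filtration equal the edge lengths of a Euclidean minimum spanning tree. The function names `h0_death_times` and `h0_discrepancy`, and the choice to compare sorted death times with an L2 norm, are illustrative assumptions.

```python
import numpy as np


def h0_death_times(points: np.ndarray) -> np.ndarray:
    """Sorted death times of 0-dim persistent homology (Vietoris-Rips).

    Computed as the edge lengths of a Euclidean minimum spanning tree,
    built here with Prim's algorithm on the dense distance matrix.
    """
    n = points.shape[0]
    # Pairwise Euclidean distances, shape (n, n).
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = dist[0].copy()  # cheapest known distance from the tree to each vertex
    deaths = []
    for _ in range(n - 1):
        # Ignore vertices already in the tree when picking the next edge.
        candidates = np.where(in_tree, np.inf, best)
        j = int(np.argmin(candidates))
        deaths.append(candidates[j])
        in_tree[j] = True
        best = np.minimum(best, dist[j])
    return np.sort(np.array(deaths))


def h0_discrepancy(a: np.ndarray, b: np.ndarray) -> float:
    """Hypothetical surrogate: L2 distance between sorted H0 death times.

    Assumes both clouds contain the same number of points.
    """
    if a.shape[0] != b.shape[0]:
        raise ValueError("point clouds must have the same number of points")
    da, db = h0_death_times(a), h0_death_times(b)
    return float(np.sqrt(((da - db) ** 2).sum()))
```

Under these assumptions, `h0_discrepancy` is zero when one cloud is a rigid rotation of the other and grows when the cluster-merging scales differ. This is the kind of global, set-level signal that an instance-level pairwise loss does not measure directly.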