Composed Vision-Language Retrieval for Skin Cancer Case Search via Joint Alignment of Global and Local Representations
This paper proposes a transformer-based framework for skin cancer case retrieval in which each query combines a reference image with textual descriptors. The framework learns hierarchical representations and performs joint global-local alignment, achieving state-of-the-art performance on the Derm7pt dataset. The intended application is retrieving similar cases to support clinical decision-making.
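As a rough illustration of what joint global-local alignment could look like at retrieval time, the sketch below scores a candidate case by mixing a global cosine similarity with a local token-level similarity. This is not the paper's formulation: the function names (`joint_score`, `rank_cases`), the mixing weight `alpha`, and the max-over-tokens local matching rule are all assumptions made for illustration, and the embeddings are taken as precomputed arrays.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    # Scale vectors to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def joint_score(q_global, q_local, c_global, c_local, alpha=0.5):
    """Hypothetical joint similarity between a composed query and a candidate.

    q_global: (d,) global query embedding (image and text already fused).
    q_local:  (m, d) local query embeddings (e.g. patch or word tokens).
    c_global: (d,) global candidate embedding.
    c_local:  (n, d) local candidate embeddings.
    alpha:    assumed weight between the global and local terms.
    """
    # Global term: cosine similarity of the two pooled embeddings.
    global_sim = float(l2_normalize(q_global) @ l2_normalize(c_global))
    # Local term: each query token is matched to its most similar
    # candidate token, and the matches are averaged.
    token_sim = l2_normalize(q_local) @ l2_normalize(c_local).T  # (m, n)
    local_sim = float(token_sim.max(axis=1).mean())
    return alpha * global_sim + (1.0 - alpha) * local_sim

def rank_cases(q_global, q_local, candidates, alpha=0.5):
    # candidates: list of (c_global, c_local) pairs.
    # Returns candidate indices ordered from most to least similar.
    scores = [joint_score(q_global, q_local, cg, cl, alpha)
              for cg, cl in candidates]
    return sorted(range(len(scores)), key=lambda i: -scores[i])
```

In this sketch, `alpha=1.0` reduces the score to global-only matching and `alpha=0.0` to local-only matching, which makes it easy to see how much each level contributes to a ranking.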