VocSegMRI: Multimodal Learning for Precise Vocal Tract Segmentation in Real-time MRI
The paper introduces VocSegMRI, a multimodal framework for vocal tract segmentation in real-time MRI. It integrates video, audio, and phonological signals through cross-attention fusion and contrastive learning, reaching a state-of-the-art Dice score of 0.95 and remaining robust when audio is unavailable.
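
For context on the reported 0.95, the Dice score measures the overlap between a predicted segmentation mask A and a reference mask B as 2|A ∩ B| / (|A| + |B|), ranging from 0 (no overlap) to 1 (identical masks). The sketch below is a generic illustration of that standard formula, not the paper's evaluation code; the function name `dice_score`, the convention of returning 1.0 for two empty masks, and the toy masks are all assumptions made for the example.

```python
from typing import Sequence


def dice_score(pred: Sequence[Sequence[int]], target: Sequence[Sequence[int]]) -> float:
    """Dice coefficient between two binary masks (nested lists of 0/1).

    Hypothetical helper for illustration; not taken from the paper.
    """
    # Both masks must have the same number of rows and columns.
    if len(pred) != len(target) or any(len(p) != len(t) for p, t in zip(pred, target)):
        raise ValueError("masks must have the same shape")

    # Count pixels set in both masks, and pixels set in each mask.
    intersection = sum(
        p & t for p_row, t_row in zip(pred, target) for p, t in zip(p_row, t_row)
    )
    pred_area = sum(sum(row) for row in pred)
    target_area = sum(sum(row) for row in target)

    # Assumed convention: two empty masks count as a perfect match.
    if pred_area + target_area == 0:
        return 1.0
    return 2.0 * intersection / (pred_area + target_area)


# Toy 2x3 masks: 3 predicted pixels, 2 reference pixels, 2 shared.
pred_mask = [[0, 1, 1],
             [0, 1, 0]]
ref_mask = [[0, 1, 1],
            [0, 0, 0]]
print(dice_score(pred_mask, ref_mask))  # 2*2 / (3+2) = 0.8
```

In multi-class settings such as vocal tract segmentation, a score like this is typically computed per structure and then averaged; how the paper aggregates its scores is not stated in this summary.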