Multimodal Optimal Transport for Unsupervised Temporal Segmentation in Surgical Robotics
This paper introduces TASOT, an unsupervised multimodal optimal transport framework that leverages visual and text-based cues to achieve state-of-the-art performance on surgical phase and step segmentation without relying on costly large-scale pre-training.