LEMON: A Large Endoscopic MONocular Dataset and Foundation Model for Perception in Surgical Settings
This paper introduces LEMON, a large-scale endoscopic monocular dataset comprising 938 hours of high-resolution surgical footage, together with LemonFM, a foundation model pretrained on this data through self-supervised augmented knowledge distillation. LemonFM significantly outperforms existing models across multiple surgical perception tasks.
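To make the pretraining objective concrete, the sketch below illustrates one common form of self-supervised knowledge distillation (a DINO-style teacher-student setup with an exponential-moving-average teacher). This is a generic, minimal NumPy illustration, not the actual LemonFM architecture or training recipe; the tiny linear "encoder", the noise-based augmentation, the temperatures, and the momentum value are all hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, t):
    # temperature-scaled softmax, numerically stabilized
    z = x / t
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# hypothetical tiny encoder: a single linear projection (real models use deep networks)
D_IN, D_OUT = 16, 8
w_student = rng.normal(size=(D_IN, D_OUT)) * 0.1
w_teacher = w_student.copy()  # teacher starts as a copy of the student

def augment(x):
    # stand-in augmentation: additive noise (real pipelines use crops, jitter, ...)
    return x + rng.normal(scale=0.05, size=x.shape)

x = rng.normal(size=(4, D_IN))            # a mini-batch of 4 "frames"
view_s, view_t = augment(x), augment(x)   # two augmented views of the same data

p_teacher = softmax(view_t @ w_teacher, t=0.04)  # sharper teacher targets
p_student = softmax(view_s @ w_student, t=0.1)   # softer student predictions

# distillation loss: cross-entropy between teacher targets and student predictions
loss = -(p_teacher * np.log(p_student + 1e-9)).sum(axis=-1).mean()

# teacher follows the student via an exponential moving average (no gradient updates)
momentum = 0.996
w_teacher = momentum * w_teacher + (1 - momentum) * w_student

print(float(loss) > 0.0)
```

In a full training loop the loss would be backpropagated through the student only, while the teacher is updated solely by the moving average, which is what makes the objective self-supervised: no labels are needed beyond the two augmented views.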