Glass Segmentation with Fusion of Learned and General Visual Features
This paper introduces a dual-backbone architecture that fuses general visual features from a frozen DINOv3 model with task-specific features from a supervised Swin model. It reports state-of-the-art glass segmentation performance across multiple datasets while maintaining competitive inference speed.
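To make the idea of fusing two feature streams concrete, the following is a minimal sketch of one common fusion scheme: per-location channel concatenation followed by a linear projection (the per-pixel equivalent of a 1x1 convolution). The summary above does not specify the paper's fusion mechanism, so the scheme itself, the function name `fuse_features`, and the toy values are all assumptions made for illustration. The two backbones and the freezing of the DINOv3 weights are not modeled here; plain nested lists stand in for their output feature maps.

```python
from typing import List

Vector = List[float]
FeatureMap = List[List[Vector]]  # indexed as [row][col][channel]


def fuse_features(general: FeatureMap, task: FeatureMap,
                  weight: List[Vector], bias: Vector) -> FeatureMap:
    """Hypothetical fusion: concatenate channels, then project linearly.

    `weight` has shape [out_channels][c_general + c_task], which makes the
    projection the per-pixel equivalent of a 1x1 convolution.
    """
    if len(general) != len(task) or len(general[0]) != len(task[0]):
        raise ValueError("feature maps must share the same spatial size")
    fused: FeatureMap = []
    for row_g, row_t in zip(general, task):
        fused_row = []
        for vec_g, vec_t in zip(row_g, row_t):
            concat = vec_g + vec_t  # channel-wise concatenation
            fused_row.append([
                sum(w * x for w, x in zip(w_row, concat)) + b
                for w_row, b in zip(weight, bias)
            ])
        fused.append(fused_row)
    return fused


# Toy 1x2 feature maps (assumed values): 2 "general" channels, 1 "task" channel.
general = [[[1.0, 2.0], [0.0, 1.0]]]
task = [[[3.0], [4.0]]]
weight = [[1.0, 0.0, 1.0], [0.0, 1.0, -1.0]]  # projects 3 channels to 2
bias = [0.0, 0.5]
fused = fuse_features(general, task, weight, bias)
```

In a real model the two maps would first need to be resampled to a common spatial resolution, and the projection weights would be learned jointly with the segmentation head while the general-purpose backbone stays fixed.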