DINOv3 Visual Representations for Blueberry Perception Toward Robotic Harvesting
This paper evaluates DINOv3 as a frozen backbone for perception tasks in robotic blueberry harvesting. DINOv3 excels at segmentation, where its patch-level representations are stable, but its detection performance is limited by scale variation and by the difficulty of spatial aggregation. These results suggest that DINOv3 functions best as a semantic backbone paired with downstream spatial modeling tailored to fruit structures.
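One plausible way to picture the scale-variation issue is to count how many patch tokens a single berry occupies at different apparent sizes. The sketch below is illustrative arithmetic only, not the paper's method. The 16-pixel patch size, the 224-pixel input, and the berry sizes are all assumptions chosen for the example, and `patch_grid` and `token_equivalent` are hypothetical helper names.

```python
def patch_grid(image_size: int, patch_size: int = 16) -> int:
    """Number of patch tokens along one side of a square input image.

    patch_size=16 is an assumed value for illustration.
    """
    return image_size // patch_size


def token_equivalent(object_px: float, patch_size: int = 16) -> float:
    """Area of a square object expressed in units of one patch.

    Values below 1.0 mean the object is smaller than a single patch token,
    so its evidence is mixed with background inside that token.
    """
    return (object_px / patch_size) ** 2


if __name__ == "__main__":
    side = patch_grid(224)
    print(f"224 px input -> {side} x {side} patch tokens")
    # Hypothetical berry sizes in pixels: far, mid-range, and close-up views.
    for berry_px in (8, 24, 96):
        print(f"berry {berry_px:>3} px wide ~ {token_equivalent(berry_px):.2f} patches")
```

Under these assumptions a distant berry covers a fraction of one token while a close-up berry covers dozens, so a head that aggregates tokens into boxes has to cope with a wide range of spatial extents. Per-patch labeling, as in segmentation, does not require that aggregation step.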