SMR-Net: Robot Snap Detection Based on Multi-Scale Features and Self-Attention Network

To address the limitations of traditional vision methods in robotic automated assembly, this paper proposes SMR-Net, a self-attention-based multi-scale detection algorithm paired with a dedicated sensor. By integrating attention-enhanced feature extraction, parallel multi-scale processing, and adaptive reweighting, it significantly improves the precision and robustness of snap localization in complex scenarios.

Kuanxu Hou | 2026-03-03 | 💻 cs

SHIELD8-UAV: Sequential 8-bit Hardware Implementation of a Precision-Aware 1D-F-CNN for Low-Energy UAV Acoustic Detection and Temporal Tracking

This paper presents SHIELD8-UAV, a low-energy, sequential 8-bit hardware accelerator for UAV acoustic detection that achieves real-time, precision-aware inference on resource-constrained edge devices through a shared multi-precision datapath, layer-sensitivity quantization, and structured channel pruning.

Susmita Ghanta, Karan Nathwani, Rohit Chaurasiya | 2026-03-03 | ⚡ eess

Unified Vision-Language Modeling via Concept Space Alignment

This paper introduces V-SONAR, a unified vision-language embedding space aligned with the multilingual SONAR text space, and leverages it to develop V-LCM, a model that achieves state-of-the-art performance in video captioning and significantly outperforms existing vision-language models across 61 diverse languages through concept space alignment and latent diffusion training.

Yifu Qiu, Paul-Ambroise Duquenne, Holger Schwenk | 2026-03-03 | 💬 cs.CL

Egocentric Co-Pilot: Web-Native Smart-Glasses Agents for Assistive Egocentric AI

This paper introduces Egocentric Co-Pilot, a web-native neuro-symbolic framework for smart glasses that combines an LLM-orchestrated toolset with temporal reasoning and multimodal intent mapping to deliver always-on assistive AI for navigation and daily tasks. In both cloud and local deployment evaluations, it reports state-of-the-art performance and higher user satisfaction than commercial baselines.

Sicheng Yang, Yukai Huang, Weitong Cai + 8 more | 2026-03-03 | 🤖 cs.AI

GroundedSurg: A Multi-Procedure Benchmark for Language-Conditioned Surgical Tool Segmentation

This paper introduces GroundedSurg, the first multi-procedure benchmark designed to evaluate language-conditioned, instance-level surgical tool segmentation. It pairs surgical images with natural language descriptions and precise spatial annotations to address the limitations of existing category-level evaluation paradigms in clinical AI.

Tajamul Ashraf, Abrar Ul Riyaz, Wasif Tak + 4 more | 2026-03-03 | 💻 cs

Teacher-Guided Causal Interventions for Image Denoising: Orthogonal Content-Noise Disentanglement in Vision Transformers

This paper proposes TCD-Net, a Vision Transformer-based image denoising framework that uses teacher-guided causal interventions, including environmental bias adjustment and orthogonal content-noise disentanglement, to eliminate spurious correlations and achieve state-of-the-art fidelity and real-time performance.

Kuai Jiang, Zhaoyan Ding, Guijuan Zhang + 2 more | 2026-03-03 | 💻 cs

TC-SSA: Token Compression via Semantic Slot Aggregation for Gigapixel Pathology Reasoning

This paper proposes TC-SSA, a learnable token compression framework that uses gated semantic slot aggregation to process gigapixel whole slide images efficiently. It reduces visual tokens to 1.7% of the original sequence while preserving diagnostically critical information, and it outperforms existing sampling-based methods in both reasoning and classification tasks.

Zhuo Chen, Shawn Young, Lijian Xu | 2026-03-03 | 🤖 cs.AI