GroundedSurg: A Multi-Procedure Benchmark for Language-Conditioned Surgical Tool Segmentation

This paper introduces GroundedSurg, the first multi-procedure benchmark for language-conditioned, instance-level surgical tool segmentation. By pairing surgical images with natural language descriptions and precise spatial annotations, it addresses the limitations of existing category-level evaluation paradigms in clinical AI.

Tajamul Ashraf, Abrar Ul Riyaz, Wasif Tak + 4 more · 2026-03-03 · cs

Teacher-Guided Causal Interventions for Image Denoising: Orthogonal Content-Noise Disentanglement in Vision Transformers

The paper proposes TCD-Net, a Vision Transformer-based image denoising framework built on teacher-guided causal interventions. Through environmental bias adjustment and orthogonal content-noise disentanglement, it eliminates spurious correlations and achieves state-of-the-art fidelity with real-time performance.

Kuai Jiang, Zhaoyan Ding, Guijuan Zhang + 2 more · 2026-03-03 · cs

TC-SSA: Token Compression via Semantic Slot Aggregation for Gigapixel Pathology Reasoning

This paper proposes TC-SSA, a learnable token compression framework that uses gated semantic slot aggregation to efficiently process gigapixel whole slide images. It reduces visual tokens to 1.7% of the original sequence while preserving diagnostically critical information, outperforming existing sampling-based methods on both reasoning and classification tasks.

Zhuo Chen, Shawn Young, Lijian Xu · 2026-03-03 · cs.AI

GRAD-Former: Gated Robust Attention-based Differential Transformer for Change Detection

GRAD-Former is a parameter-efficient framework for remote sensing change detection built on a gated robust attention mechanism with Adaptive Feature Relevance and Refinement. It overcomes the limitations of existing models in handling high-resolution imagery and limited training data, achieving state-of-the-art performance across multiple datasets.

Durgesh Ameta, Ujjwal Mishra, Praful Hambarde + 1 more · 2026-03-03 · cs.AI

AgilePruner: An Empirical Study of Attention and Diversity for Adaptive Visual Token Pruning in Large Vision-Language Models

This paper presents AgilePruner, an adaptive visual token pruning framework for Large Vision-Language Models. Drawing on empirical insights into the complementary strengths of attention-based and diversity-based pruning, it reduces computational overhead while mitigating hallucinations across varying image complexities.

Changwoo Baek, Jouwon Song, Sohyeon Kim + 1 more · 2026-03-03 · cs.LG