SurgFed: Language-guided Multi-Task Federated Learning for Surgical Video Understanding

The paper proposes SurgFed, a language-guided multi-task federated learning framework that utilizes Language-guided Channel Selection and Language-guided Hyper Aggregation to overcome tissue and task diversity challenges, thereby improving surgical video segmentation and depth estimation across heterogeneous clinical environments.

Zheng Fang, Ziwei Niu, Ziyue Wang, Zhu Zhuo, Haofeng Liu, Shuyang Qian, Jun Xia, Yueming Jin · 2026-03-11 · 💻 cs

Probing the Reliability of Driving VLMs: From Inconsistent Responses to Grounded Temporal Reasoning

This paper investigates the reliability of Vision-Language Models (VLMs) in autonomous driving by exposing their tendencies toward response inconsistency and weak temporal reasoning, and subsequently proposes the FutureVQA benchmark and a self-supervised chain-of-thought tuning method to enhance grounded future scene reasoning without requiring temporal labels.

Chun-Peng Chang, Chen-Yu Wang, Holger Caesar, Alain Pagani · 2026-03-11 · 💻 cs

DCAU-Net: Differential Cross Attention and Channel-Spatial Feature Fusion for Medical Image Segmentation

This paper proposes DCAU-Net, a novel medical image segmentation framework that combines Differential Cross Attention to efficiently model long-range dependencies while reducing computational complexity, and a Channel-Spatial Feature Fusion strategy to adaptively integrate semantic and spatial details, thereby achieving enhanced segmentation accuracy and robustness.

Yanxin Li, Hui Wan, Libin Lan · 2026-03-11 · 💻 cs

Association of Radiologic PPFE Change with Mortality in Lung Cancer Screening Cohorts

This study demonstrates that the longitudinal progression of radiologic pleuroparenchymal fibroelastosis (PPFE), quantified via automated analysis of low-dose CT scans, independently predicts increased mortality and adverse respiratory outcomes in large lung cancer screening cohorts.

Shahab Aslani, Mehran Azimbagirad, Daryl Cheng, Daisuke Yamada, Ryoko Egashira, Adam Szmul, Justine Chan-Fook, Robert Chapman, Alfred Chung Pui So, Shanshan Wang, John McCabe, Tianqi Yang, Jose M Brenes, Eyjolfur Gudmundsson, The SUMMIT Consortium, Susan M. Astley, Daniel C. Alexander, Sam M. Janes, Joseph Jacob · 2026-03-11 · 🧬 q-bio

A comprehensive study of time-of-flight non-line-of-sight imaging

This paper presents a comprehensive study of Time-of-Flight non-line-of-sight imaging methods by unifying their theoretical formulations and hardware implementations to establish a common framework for analysis and demonstrate that, under equal constraints, existing techniques share similar performance limitations despite method-specific differences.

Julio Marco, Adrian Jarabo, Ji Hyun Nam, Alberto Tosi, Diego Gutierrez, Andreas Velten · 2026-03-11 · 💻 cs

GeoSolver: Scaling Test-Time Reasoning in Remote Sensing with Fine-Grained Process Supervision

The paper introduces GeoSolver, a framework that enhances remote sensing reasoning by leveraging a large-scale process supervision dataset (Geo-PRM-2M) and a novel Process-Aware Tree-GRPO algorithm to train a token-level reward model (GeoPRM), thereby enabling verifiable, step-by-step reasoning and robust test-time scaling for both specialized and general-purpose Vision-Language Models.

Lang Sun, Ronghao Fu, Zhuoran Duan, Haoran Liu, Xueyan Liu, Bo Yang · 2026-03-11 · 💻 cs

GeoAlignCLIP: Enhancing Fine-Grained Vision-Language Alignment in Remote Sensing via Multi-Granular Consistency Learning

The paper introduces GeoAlignCLIP, a unified framework that enhances fine-grained vision-language alignment in remote sensing by leveraging multi-granular semantic learning and intra-modal consistency, supported by a newly constructed hierarchical dataset (RSFG-100k) to outperform existing methods on diverse benchmarks.

Xiao Yang, Ronghao Fu, Zhuoran Duan, Zhiwen Lin, Xueyan Liu, Bo Yang · 2026-03-11 · 💻 cs

More than the Sum: Panorama-Language Models for Adverse Omni-Scenes

This paper introduces the Panorama-Language Modeling (PLM) paradigm and the PanoVQA dataset to enable holistic $360^\circ$ vision-language reasoning in adverse omni-scenes, demonstrating that a unified panoramic approach yields superior understanding compared to stitching multiple narrow-field-of-view inputs.

Weijia Fan, Ruiping Liu, Jiale Wei, Yufan Chen, Junwei Zheng, Zichao Zeng, Jiaming Zhang, Qiufu Li, Linlin Shen, Rainer Stiefelhagen · 2026-03-11 · 💻 cs

A saccade-inspired approach to image classification using vision transformer attention maps

This paper proposes a saccade-inspired image classification method that leverages DINO's Vision Transformer attention maps to selectively focus processing on task-relevant regions, achieving performance comparable to or better than full-image analysis while offering a biologically plausible approach to efficient visual processing.

Matthis Dallain, Laurent Rodriguez, Laurent Udo Perrinet, Benoît Miramond · 2026-03-11 · 💻 cs
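The core idea behind the saccade-style pipeline — using a Vision Transformer's attention map to pick a few "fixation" regions for focused processing — can be sketched generically. This is a minimal illustration, not the paper's implementation: the attention grid is synthetic (in the paper it would come from DINO's CLS-token attention), and the `select_fixations` helper and its parameters are assumptions for demonstration.

```python
import numpy as np

def select_fixations(attn, k=3, patch=16):
    """Pick the k highest-attention patches as saccade targets.

    attn  : (H, W) grid of attention weights over image patches
            (in practice produced by a ViT such as DINO; synthetic here).
    patch : patch size in pixels, used to map grid cells to pixel centers.
    Returns a list of (row_px, col_px) fixation centers, strongest first.
    """
    flat = attn.ravel()
    top = np.argsort(flat)[::-1][:k]                # indices of strongest patches
    rows, cols = np.unravel_index(top, attn.shape)  # back to grid coordinates
    # Convert patch-grid coordinates to pixel-space centers.
    return [(int(r * patch + patch // 2), int(c * patch + patch // 2))
            for r, c in zip(rows, cols)]

# Toy 14x14 attention grid (a 224px image with 16px patches) with two peaks.
attn = np.zeros((14, 14))
attn[3, 5] = 0.9
attn[10, 2] = 0.7
fixations = select_fixations(attn, k=2)  # → [(56, 88), (168, 40)]
```

A classifier would then crop windows around these centers and process only those regions, rather than the full image.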

OTPL-VIO: Robust Visual-Inertial Odometry with Optimal Transport Line Association and Adaptive Uncertainty

This paper presents OTPL-VIO, a robust stereo visual-inertial odometry system that enhances performance in low-texture and illumination-challenging environments by employing a training-free deep descriptor with entropy-regularized optimal transport for line association and introducing adaptive uncertainty weighting to stabilize estimation.

Zikun Chen, Wentao Zhao, Yihe Niu, Tianchen Deng, Jingchuan Wang · 2026-03-11 · 💻 cs
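Entropy-regularized optimal transport, which OTPL-VIO uses for line association, is commonly solved with Sinkhorn iterations. Below is a minimal generic sketch: the cost matrix, uniform marginals, and argmax-based matching are illustrative assumptions, not the paper's exact formulation (which builds costs from deep line descriptors).

```python
import numpy as np

def sinkhorn(cost, eps=0.1, n_iters=200):
    """Entropy-regularized optimal transport via Sinkhorn iterations.

    cost : (n, m) pairwise descriptor-distance matrix between two line sets.
    eps  : entropy regularization strength (smaller -> sharper plan).
    Returns the (n, m) transport plan, whose rows/columns approximately
    match the uniform source/target marginals.
    """
    n, m = cost.shape
    K = np.exp(-cost / eps)                          # Gibbs kernel
    a = np.full(n, 1.0 / n)                          # uniform source marginal
    b = np.full(m, 1.0 / m)                          # uniform target marginal
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):                         # alternating scaling
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]

# Toy example: two lines per frame; low cost on the diagonal means
# line 0 should match line 0, line 1 should match line 1.
cost = np.array([[0.1, 0.9],
                 [0.8, 0.2]])
plan = sinkhorn(cost)
matches = plan.argmax(axis=1)  # → [0, 1]
```

Thresholding the plan's entries (instead of a hard argmax) gives a natural way to reject ambiguous associations, which is one reason OT-based matching is attractive in low-texture scenes.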