Revisiting Integration of Image and Metadata for DICOM Series Classification: Cross-Attention and Dictionary Learning

This paper proposes an end-to-end multimodal framework for DICOM series classification that combines bi-directional cross-attention with a sparse, missingness-aware dictionary learning encoder. The design handles heterogeneous image content, variable series lengths, and incomplete metadata without requiring imputation, and it outperforms existing baselines in both in-domain and out-of-domain settings.

Tuan Truong, Melanie Dohmen, Sara Lorio + 1 more · 2026-03-02 · eess

Polarization Uncertainty-Guided Diffusion Model for Color Polarization Image Demosaicking

This paper proposes a Polarization Uncertainty-Guided Diffusion Model that leverages image diffusion priors and explicitly models polarization uncertainty to reconstruct high-fidelity color polarization images. It addresses a weakness of existing network-based methods, which, constrained by scarce training data, struggle to recover polarization characteristics accurately.

Chenggong Li, Yidong Luo, Junchao Zhang + 1 more · 2026-03-02 · eess

Open-Vocabulary Semantic Segmentation in Remote Sensing via Hierarchical Attention Masking and Model Composition

This paper introduces ReSeg-CLIP, a training-free open-vocabulary semantic segmentation method for remote sensing (RS). It reports state-of-the-art performance by combining hierarchical attention masking with SAM-generated masks and a model composition strategy that averages multiple RS-specific CLIP variants.

Mohammadreza Heidarianbaei, Mareike Dorozynski, Hubert Kanyamahanga + 2 more · 2026-03-02 · cs

Bandwidth-adaptive Cloud-Assisted 360-Degree 3D Perception for Autonomous Vehicles

This paper proposes a bandwidth-adaptive, cloud-assisted framework for autonomous vehicles that dynamically splits transformer-based 360-degree 3D perception between the vehicle and the cloud, using feature compression and quantization. Under fluctuating network conditions, it achieves a 72% latency reduction and up to a 20% accuracy improvement over static methods.

Faisal Hawlader, Rui Meireles, Gamal Elghazaly + 2 more · 2026-03-02 · cs.LG
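The vehicle/cloud split described above can be illustrated, very loosely, as picking the partition point that minimizes estimated end-to-end latency for the currently measured bandwidth. This is a generic split-computing sketch, not the authors' method: the `SplitOption` fields, the split names, and all timing and payload numbers are hypothetical, and the accuracy effect of compression and quantization is ignored.

```python
from dataclasses import dataclass


@dataclass
class SplitOption:
    name: str             # hypothetical label for a split point
    vehicle_ms: float     # on-vehicle compute time up to the split (ms)
    payload_bits: float   # size of compressed/quantized features to upload (bits); 0 = fully local
    cloud_ms: float       # remaining compute time in the cloud (ms)


def end_to_end_latency_ms(opt: SplitOption, bandwidth_mbps: float, rtt_ms: float) -> float:
    """Estimate latency as local compute + upload time + round trip + cloud compute."""
    if opt.payload_bits == 0:
        # Fully local option: no network transfer and no cloud stage.
        return opt.vehicle_ms + opt.cloud_ms
    transfer_ms = opt.payload_bits / (bandwidth_mbps * 1e6) * 1e3
    return opt.vehicle_ms + transfer_ms + rtt_ms + opt.cloud_ms


def choose_split(options, bandwidth_mbps: float, rtt_ms: float) -> SplitOption:
    """Re-evaluated whenever the measured bandwidth changes (the 'adaptive' part)."""
    return min(options, key=lambda o: end_to_end_latency_ms(o, bandwidth_mbps, rtt_ms))


# Hypothetical profile: earlier splits send larger feature maps but leave less work on the vehicle.
OPTIONS = [
    SplitOption("local", vehicle_ms=120.0, payload_bits=0.0, cloud_ms=0.0),
    SplitOption("early_split_8bit", vehicle_ms=20.0, payload_bits=8e6, cloud_ms=15.0),
    SplitOption("late_split_4bit", vehicle_ms=60.0, payload_bits=1e6, cloud_ms=8.0),
]

if __name__ == "__main__":
    for bw in (5.0, 100.0, 1000.0):
        print(bw, "Mbps ->", choose_split(OPTIONS, bw, rtt_ms=10.0).name)
```

With these made-up numbers the choice moves from `local` at 5 Mbps to `late_split_4bit` at 100 Mbps and `early_split_8bit` at 1000 Mbps, which is the kind of behavior a static split cannot provide.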

Experience-Guided Self-Adaptive Cascaded Agents for Breast Cancer Screening and Diagnosis with Reduced Biopsy Referrals

This paper proposes BUSD-Agent, an experience-guided, self-adaptive cascaded multi-agent framework for breast ultrasound screening and diagnosis. It uses a memory bank of historical decision trajectories to adjust escalation thresholds dynamically, significantly reducing unnecessary biopsy referrals and improving specificity without updating model parameters.

Pramit Saha, Mohammad Alsharid, Joshua Strong + 1 more · 2026-03-02 · cs.AI

Enhancing Vision-Language Navigation with Multimodal Event Knowledge from Real-World Indoor Tour Videos

This paper proposes STE-VLN, an approach that improves Vision-Language Navigation (VLN) in unseen environments. It constructs YE-KG, a large-scale multimodal spatiotemporal knowledge graph derived from real-world indoor tour videos, and integrates it through a Coarse-to-Fine Hierarchical Retrieval mechanism to strengthen long-horizon reasoning and handle coarse-grained instructions.

Haoxuan Xu, Tianfu Li, Wenbo Chen + 4 more · 2026-03-02 · cs

GDA-YOLO11: Amodal Instance Segmentation for Occlusion-Robust Robotic Fruit Harvesting

This paper introduces GDA-YOLO11, an amodal instance segmentation framework for occlusion-robust robotic fruit harvesting that infers complete fruit shapes and estimates picking points accurately. Compared with existing models, it achieves better performance metrics and higher harvesting success rates across varying occlusion levels.

Caner Beldek, Emre Sariyildiz, Son Lam Phung + 1 more · 2026-03-02 · cs