Revisiting Integration of Image and Metadata for DICOM Series Classification: Cross-Attention and Dictionary Learning

This paper proposes an end-to-end multimodal framework for DICOM series classification that combines bi-directional cross-attention with a sparse, missingness-aware dictionary learning encoder. The framework handles heterogeneous image content, variable series lengths, and incomplete metadata without imputation, and it outperforms existing baselines in both in-domain and out-of-domain settings.

Tuan Truong, Melanie Dohmen, Sara Lorio + 1 more | 2026-03-02 | eess

Polarization Uncertainty-Guided Diffusion Model for Color Polarization Image Demosaicking

This paper proposes a Polarization Uncertainty-Guided Diffusion Model that leverages image diffusion priors and explicitly models polarization uncertainty to reconstruct high-fidelity color polarization images. It targets a limitation of existing network-based methods, which struggle to recover polarization characteristics because training data is scarce.

Chenggong Li, Yidong Luo, Junchao Zhang + 1 more | 2026-03-02 | eess

Open-Vocabulary Semantic Segmentation in Remote Sensing via Hierarchical Attention Masking and Model Composition

This paper introduces ReSeg-CLIP, a training-free open-vocabulary semantic segmentation method for remote sensing that achieves state-of-the-art performance by combining hierarchical attention masking with SAM-generated masks and a novel model composition strategy that averages multiple RS-specific CLIP variants.

Mohammadreza Heidarianbaei, Mareike Dorozynski, Hubert Kanyamahanga + 2 more | 2026-03-02 | cs

Bandwidth-adaptive Cloud-Assisted 360-Degree 3D Perception for Autonomous Vehicles

This paper proposes a bandwidth-adaptive, cloud-assisted framework for autonomous vehicles that dynamically splits transformer-based 360-degree 3D perception tasks between the vehicle and the cloud using feature compression and quantization, achieving a 72% latency reduction and up to 20% accuracy improvement over static methods under fluctuating network conditions.

Faisal Hawlader, Rui Meireles, Gamal Elghazaly + 2 more | 2026-03-02 | cs.LG

Experience-Guided Self-Adaptive Cascaded Agents for Breast Cancer Screening and Diagnosis with Reduced Biopsy Referrals

This paper proposes BUSD-Agent, an experience-guided, self-adaptive cascaded multi-agent framework for breast ultrasound screening and diagnosis. It uses a memory bank of historical decision trajectories to dynamically adjust escalation thresholds, which reduces unnecessary biopsy referrals and improves specificity without updating model parameters.

Pramit Saha, Mohammad Alsharid, Joshua Strong + 1 more | 2026-03-02 | cs.AI

Enhancing Vision-Language Navigation with Multimodal Event Knowledge from Real-World Indoor Tour Videos

This paper proposes STE-VLN, an approach that enhances Vision-Language Navigation in unseen environments. It constructs YE-KG, a large-scale multimodal spatiotemporal knowledge graph derived from real-world indoor videos, and integrates it through a Coarse-to-Fine Hierarchical Retrieval mechanism to improve long-horizon reasoning and to handle coarse-grained instructions.

Haoxuan Xu, Tianfu Li, Wenbo Chen + 4 more | 2026-03-02 | cs