DRBD-Mamba for Robust and Efficient Brain Tumor Segmentation with Analytical Insights

This paper introduces DRBD-Mamba, an efficient 3D brain tumor segmentation model that leverages a dual-resolution bi-directional Mamba architecture with space-filling curves and gated fusion to achieve superior accuracy and robustness across diverse BraTS2023 data partitions while significantly reducing computational overhead compared to existing state-of-the-art methods.

Danish Ali, Ajmal Mian, Naveed Akhtar + 1 more · 2026-03-06 · cs
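The gated fusion mentioned above can be illustrated with a minimal sketch. This is a generic elementwise gated-fusion toy, not DRBD-Mamba's actual module; the names `coarse`, `fine`, and the gate parameterization `(w, b)` are assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(coarse, fine, w, b):
    """Fuse two same-shaped feature maps with a learned elementwise gate.

    gate = sigmoid(w0 * coarse + w1 * fine + b) decides, per element, how
    much of the coarse-resolution feature to keep versus the fine one.
    """
    gate = sigmoid(w[0] * coarse + w[1] * fine + b)
    return gate * coarse + (1.0 - gate) * fine

# Toy usage: two 2x2 "feature maps" standing in for dual-resolution features.
coarse = np.array([[1.0, 2.0], [3.0, 4.0]])
fine = np.array([[0.0, 1.0], [1.0, 0.0]])
fused = gated_fusion(coarse, fine, w=(0.5, -0.5), b=0.0)
```

Because the gate is strictly between 0 and 1, each fused element is a convex combination of the two inputs, so the output never leaves the range spanned by the features being fused.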

SceneCOT: Eliciting Grounded Chain-of-Thought Reasoning in 3D Scenes

This paper introduces SceneCOT, a novel framework that achieves grounded question-answering in 3D scenes by decoupling complex reasoning into manageable steps with visual clues, supported by the newly created SCENECOT-185K dataset, which demonstrates state-of-the-art performance and represents the first successful application of Chain-of-Thought reasoning to 3D scene understanding.

Xiongkun Linghu, Jiangyong Huang, Ziyu Zhu + 2 more · 2026-03-06 · cs

Observer-Actor: Active Vision Imitation Learning with Sparse-View Gaussian Splatting

The paper introduces Observer-Actor (ObAct), a novel active vision imitation learning framework for dual-arm robots that dynamically assigns one arm to construct a 3D Gaussian Splatting representation and identify optimal viewing angles for the other arm, thereby significantly enhancing policy robustness and performance by reducing occlusions compared to static-camera setups.

Yilong Wang, Cheng Qian, Ruomeng Fan + 1 more · 2026-03-06 · cs

STAvatar: Soft Binding and Temporal Density Control for Monocular 3D Head Avatars Reconstruction

STAvatar is a novel framework for monocular 3D head avatar reconstruction that overcomes the limitations of rigid skinning and poor occlusion handling by introducing a UV-Adaptive Soft Binding mechanism and a Temporal Adaptive Density Control strategy to achieve state-of-the-art high-fidelity results with enhanced detail in frequently occluded regions.

Jiankuo Zhao, Xiangyu Zhu, Zidu Wang + 1 more · 2026-03-06 · cs

PowerCLIP: Powerset Alignment for Contrastive Pre-Training

PowerCLIP is a novel contrastive pre-training framework that enhances compositional understanding by exhaustively aligning image region powersets with textual parse trees, utilizing efficient non-linear aggregators to overcome the exponential computational cost of naive powerset construction while achieving state-of-the-art performance in zero-shot vision-language tasks.

Masaki Kawamura, Nakamasa Inoue, Rintaro Yanagi + 2 more · 2026-03-06 · cs
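The exponential cost of naive powerset construction that PowerCLIP avoids can be made concrete with a toy identity (this is not the paper's non-linear aggregator, just the linear case): with a sum aggregator, the total over all non-empty region subsets factorizes, because each of the n region embeddings appears in exactly 2^(n-1) subsets.

```python
import itertools
import numpy as np

def powerset_total_bruteforce(vectors):
    """Sum of sum-aggregated embeddings over every non-empty subset: O(2^n)."""
    total = np.zeros_like(vectors[0])
    for r in range(1, len(vectors) + 1):
        for subset in itertools.combinations(vectors, r):
            total += np.sum(subset, axis=0)
    return total

def powerset_total_closed_form(vectors):
    """Same quantity in O(n): each vector appears in 2^(n-1) subsets."""
    return (2 ** (len(vectors) - 1)) * np.sum(vectors, axis=0)

# Three toy "region embeddings".
vecs = [np.array([1.0, 2.0]), np.array([3.0, -1.0]), np.array([0.5, 0.5])]
```

Non-linear aggregators such as max do not factorize this simply, which is presumably why efficient approximations are a contribution of the paper; the identity only shows why exhaustive enumeration is avoidable in principle.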

Fairness-Aware Fine-Tuning of Vision-Language Models for Medical Glaucoma Diagnosis

This paper introduces fairness-aware Low-Rank Adaptation methods, specifically FR-LoRA, GR-LoRA, and Hybrid-LoRA, which utilize a differentiable MaxAccGap loss and inverse frequency weighting to significantly reduce diagnostic accuracy disparities in glaucoma detection across demographic groups while maintaining high overall accuracy with minimal trainable parameters.

Zijian Gu, Yuxi Liu, Zhenhao Zhang + 1 more · 2026-03-06 · cs
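The two ingredients named in the summary, inverse-frequency weighting and a differentiable accuracy-gap penalty, can be sketched generically. This is a toy under assumptions: the paper's MaxAccGap loss is replaced here by a plain soft-accuracy gap, and all function names and the example batch are invented for illustration.

```python
import numpy as np

def soft_group_accuracies(probs, labels, groups):
    """Per-group 'soft accuracy': mean predicted probability of the true class.

    Using probabilities instead of hard argmax keeps the quantity differentiable.
    """
    accs = {}
    for g in np.unique(groups):
        mask = groups == g
        p_true = np.where(labels[mask] == 1, probs[mask], 1.0 - probs[mask])
        accs[g] = p_true.mean()
    return accs

def fairness_penalty(probs, labels, groups):
    """Gap between best- and worst-served demographic group (smaller is fairer)."""
    vals = np.array(list(soft_group_accuracies(probs, labels, groups).values()))
    return vals.max() - vals.min()

def inverse_frequency_weights(groups):
    """Per-sample weights proportional to 1 / group size, normalized to mean 1."""
    _, inverse, counts = np.unique(groups, return_inverse=True, return_counts=True)
    w = 1.0 / counts[inverse]
    return w * len(w) / w.sum()

# Toy batch: binary glaucoma labels, an imbalanced pair of demographic groups.
probs = np.array([0.9, 0.8, 0.3, 0.6])   # predicted P(positive)
labels = np.array([1, 1, 0, 1])
groups = np.array([0, 0, 0, 1])
gap = fairness_penalty(probs, labels, groups)
weights = inverse_frequency_weights(groups)
```

Samples from the minority group receive larger weights, and the gap term can be added to a standard classification loss so that training trades a little overall accuracy for smaller disparities across groups.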