MPFlow: Multi-modal Posterior-Guided Flow Matching for Zero-Shot MRI Reconstruction

MPFlow is a zero-shot multi-modal MRI reconstruction framework that leverages a self-supervised pretraining strategy (PAMRI) to guide rectified flow sampling with auxiliary structural scans, thereby significantly reducing hallucinations and improving anatomical fidelity compared to single-modality baselines while requiring fewer sampling steps.

Seunghoi Kim, Chen Jin, Henry F. J. Tregidgo + 2 more · 2026-03-05 · cs.AI

Order Is Not Layout: Order-to-Space Bias in Image Generation

This paper identifies and quantifies "Order-to-Space Bias" (OTS), a systematic flaw in modern image generation models where the textual order of entities incorrectly dictates their spatial layout, and demonstrates that this data-driven issue can be effectively mitigated through targeted fine-tuning and early-stage interventions without compromising generation quality.

Yongkang Zhang, Zonglin Zhao, Yuechen Zhang + 3 more · 2026-03-05 · cs.AI

QD-PCQA: Quality-Aware Domain Adaptation for Point Cloud Quality Assessment

To address the generalization challenges in No-Reference Point Cloud Quality Assessment caused by data scarcity, this paper proposes QD-PCQA, a novel unsupervised domain adaptation framework that transfers quality priors from images to point clouds through a Rank-weighted Conditional Alignment strategy and a Quality-guided Feature Augmentation module to enhance perceptual quality ranking and feature alignment.

Guohua Zhang, Jian Jin, Meiqin Liu + 2 more · 2026-03-05 · cs

DAGE: Dual-Stream Architecture for Efficient and Fine-Grained Geometry Estimation

DAGE introduces a dual-stream transformer architecture that efficiently estimates accurate, view-consistent geometry and camera poses from uncalibrated multi-view inputs by disentangling global coherence in a low-resolution stream from fine details in a high-resolution stream, achieving state-of-the-art performance while supporting high resolutions and long sequences.

Tuan Duc Ngo, Jiahui Huang, Seoung Wug Oh + 4 more · 2026-03-05 · cs

WSI-INR: Implicit Neural Representations for Lesion Segmentation in Whole-Slide Images

This paper proposes WSI-INR, a novel patch-free framework utilizing Implicit Neural Representations and multi-resolution hash grid encoding to model whole-slide images as continuous functions, thereby overcoming the spatial fragmentation and resolution sensitivity of existing methods to achieve robust and accurate lesion segmentation across varying scales.

Yunheng Wu, Wenqi Huang, Liangyi Wang + 4 more · 2026-03-05 · cs

Small Object Detection in Complex Backgrounds with Multi-Scale Attention and Global Relation Modeling

This paper proposes a novel framework for small object detection in complex backgrounds that integrates Residual Haar Wavelet Downsampling, Global Relation Modeling, Cross-Scale Hybrid Attention, and a Center-Assisted Loss to preserve fine-grained details, suppress noise, and enhance localization accuracy, achieving state-of-the-art performance on the RGBT-Tiny benchmark.

Wenguang Tao, Xiaotian Wang, Tian Yan + 2 more · 2026-03-05 · cs
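
The Residual Haar Wavelet Downsampling component builds on the discrete Haar wavelet transform, which halves spatial resolution without discarding pixels. Below is a minimal sketch of a plain single-level 2D Haar downsampling of one channel in pure Python. The function name `haar_downsample` and the subband sign convention are illustrative choices, and the residual path and learned convolutions of the paper's module are not shown.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def haar_downsample(x: Matrix) -> Tuple[Matrix, Matrix, Matrix, Matrix]:
    """Single-level orthonormal 2D Haar transform of one channel.

    Each non-overlapping 2x2 block [[a, b], [c, d]] yields one coefficient in
    each of four half-resolution subbands, so (unlike strided convolution or
    max pooling) every input value is kept and the input can be reconstructed.
    Illustrative sketch only; not the paper's residual module.
    """
    h, w = len(x), len(x[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    ll, lh, hl, hh = [], [], [], []
    for i in range(0, h, 2):
        rows = ([], [], [], [])
        for j in range(0, w, 2):
            a, b = x[i][j], x[i][j + 1]
            c, d = x[i + 1][j], x[i + 1][j + 1]
            rows[0].append((a + b + c + d) / 2)  # low-pass (scaled average)
            rows[1].append((a - b + c - d) / 2)  # differences across columns
            rows[2].append((a + b - c - d) / 2)  # differences across rows
            rows[3].append((a - b - c + d) / 2)  # diagonal differences
        for band, row in zip((ll, lh, hl, hh), rows):
            band.append(row)
    return ll, lh, hl, hh


ll, lh, hl, hh = haar_downsample([[1, 2], [3, 4]])
print(ll, lh, hl, hh)
```

In a CNN, the four subbands of each channel are typically stacked along the channel axis (C channels become 4C at half resolution) and then mixed by a learned convolution; because the transform is orthonormal, the sum of squared values is preserved, which is one way to see that no fine detail is thrown away at the downsampling step.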

Adaptive Enhancement and Dual-Pooling Sequential Attention for Lightweight Underwater Object Detection with YOLOv10

This paper proposes a lightweight underwater object detection framework based on YOLOv10 that integrates a Multi-Stage Adaptive Enhancement module, a Dual-Pooling Sequential Attention mechanism, and a Focal Generalized IoU loss to significantly improve accuracy and robustness on benchmark datasets while maintaining a compact model size suitable for resource-constrained environments.

Md. Mushibur Rahman, Umme Fawzia Rahim, Enam Ahmed Taufik · 2026-03-05 · cs
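
The Focal Generalized IoU loss extends the standard GIoU box-regression loss. The sketch below computes IoU and GIoU for axis-aligned boxes and returns `1 - GIoU`. The optional `IoU ** gamma` reweighting is an assumption borrowed from the Focal-EIoU style of focal weighting, not necessarily the paper's formulation, and the names `Box`, `iou_and_giou`, and `focal_giou_loss` are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned box with corners (x1, y1) and (x2, y2), where x2 >= x1 and y2 >= y1."""
    x1: float
    y1: float
    x2: float
    y2: float

    def area(self) -> float:
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def iou_and_giou(p: Box, g: Box) -> Tuple[float, float]:
    """Return (IoU, GIoU) for a predicted box p and a ground-truth box g."""
    # Intersection rectangle, clamped to zero when the boxes do not overlap.
    iw = max(0.0, min(p.x2, g.x2) - max(p.x1, g.x1))
    ih = max(0.0, min(p.y2, g.y2) - max(p.y1, g.y1))
    inter = iw * ih
    union = p.area() + g.area() - inter
    iou = inter / union if union > 0 else 0.0
    # Area of the smallest axis-aligned box enclosing both boxes.
    c = (max(p.x2, g.x2) - min(p.x1, g.x1)) * (max(p.y2, g.y2) - min(p.y1, g.y1))
    # GIoU subtracts the fraction of the enclosing box not covered by the union,
    # so it keeps changing with box position even when the boxes are disjoint.
    giou = iou - (c - union) / c if c > 0 else iou
    return iou, giou


def focal_giou_loss(p: Box, g: Box, gamma: float = 0.0) -> float:
    """1 - GIoU, reweighted by IoU ** gamma (assumed focal form; gamma=0 is plain GIoU loss)."""
    iou, giou = iou_and_giou(p, g)
    return (iou ** gamma) * (1.0 - giou)


print(focal_giou_loss(Box(0, 0, 2, 2), Box(1, 1, 3, 3)))
```

With `gamma = 0` this reduces to the ordinary GIoU loss, which stays informative for non-overlapping boxes; with `gamma > 0`, high-overlap boxes dominate the loss and disjoint boxes receive zero weight, which is the trade-off any focal-style reweighting has to manage.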

From Narrow to Panoramic Vision: Attention-Guided Cold-Start Reshapes Multimodal Reasoning

This paper identifies "Lazy Attention Localization" as a key bottleneck in multimodal cold-start training, where models fail to increase visual attention, and proposes the Attention-Guided Visual Anchoring and Reflection (AVAR) framework to effectively reshape attention distributions, achieving a 7.0% performance gain on multimodal reasoning benchmarks.

Ruilin Luo, Chufan Shi, Yizhen Zhang + 10 more · 2026-03-05 · cs.AI