CAWM-Mamba: A unified model for infrared-visible image fusion and compound adverse weather restoration

The paper proposes CAWM-Mamba, a unified end-to-end framework that jointly performs infrared-visible image fusion and compound adverse weather restoration. Built from a Weather-Aware Preprocess Module, a Cross-modal Feature Interaction Module, and a Wavelet State Space Block, it outperforms existing methods in handling multiple simultaneous degradations while enhancing downstream perception tasks.

Huichun Liu, Xiaosong Li, Zhuangfan Huang + 3 more · 2026-03-04 · cs

Maximizing Generalization: The Effect of Different Augmentation Techniques on Lightweight Vision Transformer for Bengali Character Classification

This study demonstrates that combining Random Affine and Color Jitter augmentation techniques significantly enhances the generalization and accuracy of the lightweight EfficientViT model for Bengali handwritten character recognition on the Ekush and AIBangla datasets, achieving peak accuracies of 97.48% and 97.57%, respectively.

Rafi Hassan Chowdhury, Naimul Haque, Kaniz Fatiha · 2026-03-04 · cs

Towards an Incremental Unified Multimodal Anomaly Detection: Augmenting Multimodal Denoising From an Information Bottleneck Perspective

This paper proposes IB-IUMAD, a novel incremental unified multimodal anomaly detection framework that mitigates catastrophic forgetting by leveraging a Mamba decoder to disentangle inter-object feature coupling and an information bottleneck module to filter redundant features, thereby preserving discriminative information across evolving categories.

Kaifang Long, Lianbo Ma, Jiaqi Liu + 2 more · 2026-03-04 · cs

Evaluating Cross-Modal Reasoning Ability and Problem Characteristics with Multimodal Item Response Theory

This paper introduces M3IRT, a multimodal item response theory framework that decomposes model ability and item difficulty into image-only, text-only, and cross-modal components to filter out shortcut questions, thereby enabling more reliable and cost-effective evaluation of genuine cross-modal reasoning in Multimodal Large Language Models.
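The decomposition described above resembles a compensatory multidimensional IRT model, where an item's log-odds of a correct response sum contributions from each skill dimension. Below is a minimal sketch of that general idea; the function and parameter names are illustrative assumptions, not the paper's formulation:

```python
import math

def p_correct(theta, b, a):
    """Probability of a correct response under a compensatory
    multidimensional 2PL-style IRT model. Each dict maps a skill
    dimension (image, text, cross) to the model's ability (theta),
    the item's difficulty (b), and the item's discrimination (a)."""
    logit = sum(a[k] * (theta[k] - b[k]) for k in theta)
    return 1.0 / (1.0 + math.exp(-logit))

# Two hypothetical models that differ only in cross-modal ability.
strong_cross = {"image": 0.5, "text": 0.5, "cross": 2.0}
weak_cross   = {"image": 0.5, "text": 0.5, "cross": -2.0}
b = {"image": 0.0, "text": 0.0, "cross": 0.0}

# A "shortcut" item: discrimination concentrated on the image-only
# dimension, so cross-modal ability barely moves the probability.
a_shortcut = {"image": 1.5, "text": 0.1, "cross": 0.1}
p_short_strong = p_correct(strong_cross, b, a_shortcut)
p_short_weak = p_correct(weak_cross, b, a_shortcut)

# A genuinely cross-modal item: discrimination concentrated on the
# cross-modal dimension separates the two models sharply.
a_cross = {"image": 0.1, "text": 0.1, "cross": 1.5}
p_cross_strong = p_correct(strong_cross, b, a_cross)
p_cross_weak = p_correct(weak_cross, b, a_cross)
```

Items whose estimated cross-modal discrimination is near zero behave like the shortcut item here, which is the kind of question such a framework would filter out of the evaluation.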

Shunki Uebayashi, Kento Masui, Kyohei Atarashi + 5 more · 2026-03-04 · cs.CL