SGMA: Semantic-Guided Modality-Aware Segmentation for Remote Sensing with Incomplete Multimodal Data

This paper proposes the Semantic-Guided Modality-Aware (SGMA) framework for incomplete multimodal semantic segmentation in remote sensing. Its Semantic-Guided Fusion and Modality-Aware Sampling modules address multimodal imbalance, intra-class variation, and cross-modal heterogeneity, and the framework outperforms state-of-the-art methods.

Lekang Wen, Liang Liao, Jing Xiao + 1 more · 2026-03-04 · cs

Beyond Anatomy: Explainable ASD Classification from rs-fMRI via Functional Parcellation and Graph Attention Networks

This paper demonstrates that replacing rigid anatomical parcellations with functionally derived regions of interest in a Graph Attention Network ensemble improves explainable Autism Spectrum Disorder classification on rs-fMRI data, achieving state-of-the-art accuracy while identifying biologically relevant Default Mode Network hubs (a generic sketch of the graph-attention pipeline follows below).

Syeda Hareem Madani, Noureen Bibi, Adam Rafiq Jeraj + 3 more · 2026-03-04 · cs
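
For the GAT-based ASD entry above, here is a minimal, hypothetical PyTorch sketch of the generic pipeline the summary names: a single-head graph-attention layer over a dense adjacency matrix and a small graph-level classifier. The ROI count, feature dimension, connectivity, and two-layer design are placeholder assumptions, not the paper's ensemble.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseGATLayer(nn.Module):
    """Single-head graph attention over a dense binary adjacency matrix."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)
        self.a_src = nn.Linear(out_dim, 1, bias=False)  # attention term for source nodes
        self.a_dst = nn.Linear(out_dim, 1, bias=False)  # attention term for target nodes

    def forward(self, x, adj):
        # x: (N, in_dim) node features; adj: (N, N) with 1 where an edge exists
        h = self.W(x)                                   # (N, out_dim)
        e = self.a_src(h) + self.a_dst(h).T             # (N, N) pairwise attention logits
        e = F.leaky_relu(e, negative_slope=0.2)
        e = e.masked_fill(adj == 0, float("-inf"))      # attend only along existing edges
        alpha = torch.softmax(e, dim=-1)                # normalize over each node's neighbors
        return F.elu(alpha @ h)                         # attention-weighted neighbor aggregation

class ROIGraphClassifier(nn.Module):
    """Two attention layers + mean pooling -> ASD vs. control logits (toy stand-in)."""
    def __init__(self, in_dim, hidden=64):
        super().__init__()
        self.gat1 = DenseGATLayer(in_dim, hidden)
        self.gat2 = DenseGATLayer(hidden, hidden)
        self.head = nn.Linear(hidden, 2)

    def forward(self, x, adj):
        h = self.gat2(self.gat1(x, adj), adj)
        return self.head(h.mean(dim=0))                 # graph-level prediction

# Toy usage: 100 hypothetical functional ROIs with 32-dim features and random connectivity.
x = torch.randn(100, 32)
adj = (torch.rand(100, 100) > 0.9).float()
adj.fill_diagonal_(1.0)                                 # self-loops keep every row non-empty
logits = ROIGraphClassifier(32)(x, adj)
```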

NeighborMAE: Exploiting Spatial Dependencies between Neighboring Earth Observation Images in Masked Autoencoders Pretraining

NeighborMAE is a self-supervised learning framework that improves Earth Observation image representations by exploiting the spatial dependencies between neighboring images. It reconstructs neighboring images jointly and uses a dynamic heuristic to set mask ratios and loss weights, outperforming existing baselines across various downstream tasks (a generic sketch of the joint-reconstruction idea follows below).

Liang Zeng, Valerio Marsocci, Wufan Zhao + 2 more · 2026-03-04 · cs
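
For the NeighborMAE entry above, the sketch below illustrates the general idea only: a single tiny masked autoencoder is shared across an anchor tile and one spatial neighbor, and their masked-reconstruction losses are summed with per-tile mask ratios and weights. The toy transformer, patch size, mask ratios, and loss weights are illustrative assumptions; the paper's actual architecture and dynamic heuristic are not reproduced.

```python
import torch
import torch.nn as nn

PATCH, DIM = 16, 192

def patchify(img, p=PATCH):
    # (C, H, W) -> (num_patches, C*p*p)
    c = img.shape[0]
    return img.unfold(1, p, p).unfold(2, p, p).permute(1, 2, 0, 3, 4).reshape(-1, c * p * p)

class TinyMAE(nn.Module):
    """Toy masked autoencoder: mask tokens in, per-patch reconstructions out."""
    def __init__(self, patch_dim, dim=DIM, n_patches=196):
        super().__init__()
        self.embed = nn.Linear(patch_dim, dim)
        self.mask_token = nn.Parameter(torch.zeros(1, dim))
        self.pos = nn.Parameter(torch.zeros(n_patches, dim))
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.decoder = nn.Linear(dim, patch_dim)

    def loss(self, patches, mask_ratio):
        mask = torch.rand(patches.size(0)) < mask_ratio            # True = hidden patch
        tokens = self.embed(patches)
        tokens = torch.where(mask.unsqueeze(-1), self.mask_token.expand_as(tokens), tokens)
        tokens = self.encoder((tokens + self.pos).unsqueeze(0)).squeeze(0)
        recon = self.decoder(tokens)
        return ((recon[mask] - patches[mask]) ** 2).mean()         # score masked patches only

anchor, neighbor = torch.rand(3, 224, 224), torch.rand(3, 224, 224)  # anchor tile + spatial neighbor
mae = TinyMAE(patch_dim=3 * PATCH * PATCH)
# Hypothetical heuristic: mask the neighbor more aggressively and down-weight its loss.
loss = 1.0 * mae.loss(patchify(anchor), mask_ratio=0.75) \
     + 0.5 * mae.loss(patchify(neighbor), mask_ratio=0.90)
loss.backward()
```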

On Discriminative vs. Generative classifiers: Rethinking MLLMs for Action Understanding

This paper proposes the Generation-Assisted Discriminative (GAD) classifier, a fine-tuning strategy that keeps the efficiency of discriminative classification while using generative modeling to boost performance. It achieves state-of-the-art accuracy and significantly faster inference for closed-set action understanding with Multimodal Large Language Models.

Zhanzhong Pang, Dibyadip Chatterjee, Fadime Sener + 1 more · 2026-03-04 · cs

Generalizable Knowledge Distillation from Vision Foundation Models for Semantic Segmentation

This paper proposes Generalizable Knowledge Distillation (GKD), a multi-stage framework that decouples representation learning from task adaptation and uses query-based soft distillation to transfer robust, domain-agnostic knowledge from vision foundation models to semantic segmentation. It significantly improves out-of-domain generalization over conventional distillation methods (a generic soft-distillation sketch follows below).

Chonghua Lv, Dong Zhao, Shuang Wang + 4 more · 2026-03-04 · cs
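
For the GKD entry above, query-based soft distillation can be pictured as temperature-scaled KL matching between a frozen teacher's and a student's per-query class distributions. The sketch below shows that generic loss only; the query count, class count, temperature, and loss mix are assumptions, and the paper's multi-stage schedule and query mechanism are not reproduced.

```python
import torch
import torch.nn.functional as F

def soft_distill_loss(student_logits, teacher_logits, tau=2.0):
    # Temperature-scaled KL(teacher || student) over per-query class distributions.
    log_p_student = F.log_softmax(student_logits / tau, dim=-1)
    p_teacher = F.softmax(teacher_logits / tau, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * tau * tau

# Toy usage: 100 hypothetical segmentation queries over 19 classes.
with torch.no_grad():
    teacher_logits = torch.randn(100, 19)               # stand-in for frozen foundation-model predictions
student_logits = torch.randn(100, 19, requires_grad=True)
hard_targets = torch.randint(0, 19, (100,))

loss = soft_distill_loss(student_logits, teacher_logits) \
     + 0.5 * F.cross_entropy(student_logits, hard_targets)  # 0.5 mix weight is an assumption
loss.backward()
```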

CAWM-Mamba: A unified model for infrared-visible image fusion and compound adverse weather restoration

The paper proposes CAWM-Mamba, a unified end-to-end framework that jointly performs infrared-visible image fusion and compound adverse-weather restoration. Its Weather-Aware Preprocess Module, Cross-modal Feature Interaction Module, and Wavelet Space State Block allow it to outperform existing methods under multiple simultaneous degradations while improving downstream perception tasks.

Huichun Liu, Xiaosong Li, Zhuangfan Huang + 3 more · 2026-03-04 · cs

Maximizing Generalization: The Effect of Different Augmentation Techniques on Lightweight Vision Transformer for Bengali Character Classification

This study demonstrates that combining Random Affine and Color Jitter augmentations significantly improves the generalization and accuracy of the lightweight EfficientViT model for Bengali handwritten character recognition, reaching peak accuracies of 97.48% on the Ekush dataset and 97.57% on AIBangla (a generic torchvision sketch of the augmentation pipeline follows below).

Rafi Hassan Chowdhury, Naimul Haque, Kaniz Fatiha · 2026-03-04 · cs
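
For the Bengali character-classification entry above, the named augmentation pair maps onto standard torchvision transforms. The pipeline below is a minimal sketch; the specific degrees, translation range, jitter strengths, image size, and normalization statistics are assumptions rather than the study's reported settings.

```python
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomAffine(degrees=10, translate=(0.1, 0.1), scale=(0.9, 1.1)),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
# Applied to the training split only; validation/test images would use Resize + ToTensor + Normalize.
```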