E2E-GNet: An End-to-End Skeleton-based Geometric Deep Neural Network for Human Motion Recognition

The paper proposes E2E-GNet, an end-to-end geometric deep neural network that utilizes a geometric transformation layer and a distortion-aware optimization layer to effectively project skeleton motion sequences from non-Euclidean to linear space, thereby achieving superior human motion recognition performance with lower computational cost across multiple datasets.

Mubarak Olaoluwa, Hassen Drira · 2026-03-04 · 💻 cs

MUSE: A Run-Centric Platform for Multimodal Unified Safety Evaluation of Large Language Models

MUSE is an open-source, run-centric platform that addresses the gap in multimodal safety evaluation by integrating automatic cross-modal payload generation, multi-turn attack algorithms with inter-turn modality switching, and a dual-metric framework to demonstrate that alignment often fails to generalize across audio, image, and video inputs, revealing significantly higher attack success rates than single-turn text-based evaluations suggest.

Zhongxi Wang, Yueqian Lin, Jingyang Zhang + 2 more · 2026-03-04 · ⚡ eess

Biomechanically Accurate Gait Analysis: A 3D Human Reconstruction Framework for Markerless Estimation of Gait Parameters

This paper introduces a scalable, markerless 3D human reconstruction framework that extracts biomechanically meaningful markers from video to accurately estimate gait parameters, demonstrating strong agreement with reference marker-based data and outperforming conventional pose-estimation methods for clinical and real-world applications.

Akila Pemasiri, Ethan Goan, Glen Lichtwark + 3 more · 2026-03-04 · ⚡ eess

SGMA: Semantic-Guided Modality-Aware Segmentation for Remote Sensing with Incomplete Multimodal Data

This paper proposes the Semantic-Guided Modality-Aware (SGMA) framework, a novel approach for incomplete multimodal semantic segmentation in remote sensing that utilizes Semantic-Guided Fusion and Modality-Aware Sampling modules to effectively address multimodal imbalance, intra-class variation, and cross-modal heterogeneity, thereby outperforming state-of-the-art methods.

Lekang Wen, Liang Liao, Jing Xiao + 1 more · 2026-03-04 · 💻 cs

Beyond Anatomy: Explainable ASD Classification from rs-fMRI via Functional Parcellation and Graph Attention Networks

This paper demonstrates that replacing rigid anatomical parcellations with functionally derived regions of interest within a Graph Attention Network ensemble significantly enhances explainable Autism Spectrum Disorder classification accuracy on rs-fMRI data, achieving state-of-the-art performance while identifying biologically relevant Default Mode Network hubs.

Syeda Hareem Madani, Noureen Bibi, Adam Rafiq Jeraj + 3 more · 2026-03-04 · 💻 cs

NeighborMAE: Exploiting Spatial Dependencies between Neighboring Earth Observation Images in Masked Autoencoders Pretraining

NeighborMAE is a self-supervised learning framework that enhances Earth Observation image representation by leveraging the spatial dependencies between neighboring images through joint reconstruction and a dynamic heuristic strategy for mask ratios and loss weighting, resulting in superior performance across various downstream tasks compared to existing baselines.

Liang Zeng, Valerio Marsocci, Wufan Zhao + 2 more · 2026-03-04 · 💻 cs
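As background for the masked-autoencoder pretraining that NeighborMAE builds on, the core mechanic can be sketched generically: hide a fixed ratio of image patches from the encoder and score reconstruction only on the hidden ones. This is a minimal illustrative sketch of standard MAE masking and loss, not NeighborMAE's own neighbor-joint reconstruction or its dynamic mask-ratio heuristic; all function names here are hypothetical.

```python
import numpy as np

def random_mask(num_patches, mask_ratio, rng):
    """Randomly split patch indices into (masked, visible) sets."""
    n_masked = int(num_patches * mask_ratio)
    perm = rng.permutation(num_patches)
    return perm[:n_masked], perm[n_masked:]

def mae_reconstruction_loss(pred, target, masked_idx):
    """Mean squared error computed only on the masked patches,
    as in standard masked-autoencoder pretraining."""
    diff = pred[masked_idx] - target[masked_idx]
    return float(np.mean(diff ** 2))

rng = np.random.default_rng(0)
patches = rng.normal(size=(16, 8))       # 16 patches, 8-dim embeddings
masked, visible = random_mask(16, 0.75, rng)
pred = np.zeros_like(patches)            # stand-in for a decoder's output
loss = mae_reconstruction_loss(pred, patches, masked)
```

With a 0.75 mask ratio, 12 of the 16 patches are hidden; NeighborMAE's contribution, per the summary above, is to extend this objective across spatially adjacent Earth Observation tiles rather than a single image.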

On Discriminative vs. Generative Classifiers: Rethinking MLLMs for Action Understanding

This paper proposes the Generation-Assisted Discriminative (GAD) classifier, a fine-tuning strategy that leverages the efficiency of discriminative classification while utilizing generative modeling to enhance performance, achieving state-of-the-art accuracy and significantly faster inference for closed-set action understanding in Multimodal Large Language Models.

Zhanzhong Pang, Dibyadip Chatterjee, Fadime Sener + 1 more · 2026-03-04 · 💻 cs

Generalizable Knowledge Distillation from Vision Foundation Models for Semantic Segmentation

This paper proposes Generalizable Knowledge Distillation (GKD), a multi-stage framework that decouples representation learning from task adaptation and employs a query-based soft distillation mechanism to effectively transfer robust, domain-agnostic knowledge from vision foundation models to semantic segmentation tasks, significantly improving out-of-domain generalization compared to conventional methods.

Chonghua Lv, Dong Zhao, Shuang Wang + 4 more · 2026-03-04 · 💻 cs

CAWM-Mamba: A unified model for infrared-visible image fusion and compound adverse weather restoration

The paper proposes CAWM-Mamba, a unified end-to-end framework that jointly performs infrared-visible image fusion and compound adverse weather restoration using a Weather-Aware Preprocess Module, a Cross-modal Feature Interaction Module, and a Wavelet State Space Block, outperforming existing methods in handling multiple simultaneous degradations while enhancing downstream perception tasks.

Huichun Liu, Xiaosong Li, Zhuangfan Huang + 3 more · 2026-03-04 · 💻 cs