MERG3R: A Divide-and-Conquer Approach to Large-Scale Neural Visual Geometry

MERG3R is a training-free, model-agnostic divide-and-conquer framework that lets neural visual geometry models scale to large, unordered image collections. It partitions the images into manageable subsets and merges the local reconstructions into a globally consistent 3D model, overcoming GPU memory limits while improving accuracy and scalability.

Leo Kaixuan Cheng, Abdus Shaikh, Ruofan Liang + 3 more · 2026-03-04 · 💻 cs

Retrieving Patient-Specific Radiomic Feature Sets for Transparent Knee MRI Assessment

This paper proposes a transparent, patient-specific radiomic framework that employs a two-stage retrieval strategy to select compact, complementary feature sets for knee MRI diagnosis, achieving performance competitive with deep learning models while offering enhanced interpretability through auditable links between specific anatomical regions and clinical outcomes.

Yaxi Chen, Simin Ni, Jingjing Zhang + 7 more · 2026-03-04 · 💻 cs

Cultural Counterfactuals: Evaluating Cultural Biases in Large Vision-Language Models with Counterfactual Examples

This paper introduces "Cultural Counterfactuals," a high-quality synthetic dataset of nearly 60,000 images created by placing diverse individuals into varied cultural contexts to enable the precise measurement and evaluation of cultural biases related to religion, nationality, and socioeconomic status in Large Vision-Language Models.

Phillip Howard, Xin Su, Kathleen C. Fraser · 2026-03-04 · 💻 cs

Authenticated Contradictions from Desynchronized Provenance and Watermarking

This paper identifies and empirically demonstrates the "Integrity Clash," a vulnerability in which a digital asset carries both valid C2PA provenance claiming human authorship and an AI-generation watermark, which is possible because the two mechanisms are technically independent. It proposes a cross-layer audit protocol that resolves the contradiction by evaluating both signals jointly, achieving 100% classification accuracy.

Alexander Nemecek, Hengzhi He, Guang Cheng + 1 more · 2026-03-04 · ⚡ eess
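
The idea of evaluating the provenance and watermark signals jointly can be illustrated with a minimal sketch. This is not the paper's protocol: the four verdict labels, the `AssetSignals` fields, and the `audit` function are hypothetical names chosen for illustration, and the two booleans stand in for the output of a real C2PA manifest verifier and a real watermark detector.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    # Hypothetical labels; the paper's own taxonomy may differ.
    HUMAN_CLAIM_UNCONTESTED = "verified human-authorship claim, no watermark"
    AI_GENERATED = "AI watermark, no verified human-authorship claim"
    INTEGRITY_CLASH = "verified human-authorship claim and AI watermark"
    NO_SIGNAL = "neither signal present"


@dataclass(frozen=True)
class AssetSignals:
    # Assumed inputs: results produced by external verifiers.
    human_claim_verified: bool  # provenance validates and claims human authorship
    watermark_detected: bool    # an AI-generation watermark was found


def audit(signals: AssetSignals) -> Verdict:
    """Combine both layers instead of trusting either one alone."""
    if signals.human_claim_verified and signals.watermark_detected:
        return Verdict.INTEGRITY_CLASH
    if signals.watermark_detected:
        return Verdict.AI_GENERATED
    if signals.human_claim_verified:
        return Verdict.HUMAN_CLAIM_UNCONTESTED
    return Verdict.NO_SIGNAL
```

A check that looks at only one layer would report the clashing asset as either authentic or AI-generated; looking at both is what exposes the contradiction.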

ORCA: Orchestrated Reasoning with Collaborative Agents for Document Visual Question Answering

This paper introduces ORCA, a novel multi-agent framework for Document Visual Question Answering that enhances reasoning capabilities by decomposing complex queries, routing them to specialized modality-specific agents, and employing a debate-based adjudication process to ensure reliable and consistent answers, thereby outperforming state-of-the-art methods on multiple benchmarks.

Aymen Lassoued, Mohamed Ali Souibgui, Yousri Kessentini · 2026-03-04 · 💻 cs

Deep Learning Based Wildfire Detection for Peatland Fires Using Transfer Learning

This paper proposes a transfer learning-based deep learning approach that adapts models pretrained on general wildfire imagery to effectively detect distinct peatland fires using limited labeled data, significantly improving detection accuracy and robustness under challenging conditions like low-contrast smoke and variable illumination.

Emadeldeen Hamdan, Ahmad Faiz Tharima, Mohd Zahirasri Mohd Tohir + 4 more · 2026-03-04 · 🤖 cs.AI

E2E-GNet: An End-to-End Skeleton-based Geometric Deep Neural Network for Human Motion Recognition

The paper proposes E2E-GNet, an end-to-end geometric deep neural network that utilizes a geometric transformation layer and a distortion-aware optimization layer to effectively project skeleton motion sequences from non-Euclidean to linear space, thereby achieving superior human motion recognition performance with lower computational cost across multiple datasets.

Mubarak Olaoluwa, Hassen Drira · 2026-03-04 · 💻 cs

MUSE: A Run-Centric Platform for Multimodal Unified Safety Evaluation of Large Language Models

MUSE is an open-source, run-centric platform that addresses the gap in multimodal safety evaluation. It integrates automatic cross-modal payload generation, multi-turn attack algorithms with inter-turn modality switching, and a dual-metric framework. Its results show that alignment often fails to generalize across audio, image, and video inputs, with significantly higher attack success rates than single-turn text-based evaluations suggest.

Zhongxi Wang, Yueqian Lin, Jingyang Zhang + 2 more · 2026-03-04 · ⚡ eess

Biomechanically Accurate Gait Analysis: A 3D Human Reconstruction Framework for Markerless Estimation of Gait Parameters

This paper introduces a scalable, markerless 3D human reconstruction framework that extracts biomechanically meaningful markers from video to accurately estimate gait parameters, demonstrating strong agreement with reference marker-based data and outperforming conventional pose-estimation methods for clinical and real-world applications.

Akila Pemasiri, Ethan Goan, Glen Lichtwark + 3 more · 2026-03-04 · ⚡ eess