cs.CV papers | Gist.Science

MERG3R: A Divide-and-Conquer Approach to Large-Scale Neural Visual Geometry

MERG3R is a training-free, model-agnostic divide-and-conquer framework that enables neural visual geometry models to scale to large, unordered image collections by partitioning data into manageable subsets and merging local reconstructions into a globally consistent 3D model, thereby overcoming GPU memory limitations while improving accuracy and scalability.

Leo Kaixuan Cheng, Abdus Shaikh, Ruofan Liang + 3 more | 2026-03-04 | cs

Beyond Caption-Based Queries for Video Moment Retrieval

This paper investigates the performance degradation of existing Video Moment Retrieval methods when transitioning from caption-based to search queries, identifies language and multi-moment gaps alongside a decoder-query collapse as key causes, and proposes architectural modifications to significantly improve generalization on multi-moment search queries.

David Pujol-Perich, Albert Clapés, Dima Damen + 2 more | 2026-03-04 | cs

Retrieving Patient-Specific Radiomic Feature Sets for Transparent Knee MRI Assessment

This paper proposes a transparent, patient-specific radiomic framework that employs a two-stage retrieval strategy to select compact, complementary feature sets for knee MRI diagnosis, achieving performance competitive with deep learning models while offering enhanced interpretability through auditable links between specific anatomical regions and clinical outcomes.

Yaxi Chen, Simin Ni, Jingjing Zhang + 7 more | 2026-03-04 | cs

Cultural Counterfactuals: Evaluating Cultural Biases in Large Vision-Language Models with Counterfactual Examples

This paper introduces "Cultural Counterfactuals," a high-quality synthetic dataset of nearly 60,000 images created by placing diverse individuals into varied cultural contexts to enable the precise measurement and evaluation of cultural biases related to religion, nationality, and socioeconomic status in Large Vision-Language Models.

Phillip Howard, Xin Su, Kathleen C. Fraser | 2026-03-04 | cs

Aligning Fetal Anatomy with Kinematic Tree Log-Euclidean PolyRigid Transforms

This paper introduces a differentiable volumetric body model driven by a novel Kinematic Tree-based Log-Euclidean PolyRigid (KTPolyRigid) transform that resolves deformation ambiguities to achieve smooth, bijective mappings, thereby enabling robust groupwise registration and label-efficient segmentation of fetal anatomy from MRI data.

Yingcheng Liu, Athena Taymourtash, Yang Liu + 5 more | 2026-03-04 | cs

Authenticated Contradictions from Desynchronized Provenance and Watermarking

This paper identifies and empirically demonstrates the "Integrity Clash," a vulnerability in which a digital asset carries both valid C2PA provenance claiming human authorship and a watermark marking it as AI-generated, because the two mechanisms are technically independent. It proposes a cross-layer audit protocol that evaluates both signals jointly to resolve the contradiction, achieving 100% classification accuracy.

Alexander Nemecek, Hengzhi He, Guang Cheng + 1 more | 2026-03-04 | eess

Advancing Earth Observation Through Machine Learning: A TorchGeo Tutorial

This paper introduces a tutorial for the PyTorch-based library TorchGeo that demonstrates its core abstractions and guides users through an end-to-end workflow for training a semantic segmentation model on Sentinel-2 imagery to perform multispectral water segmentation.

Caleb Robinson, Nils Lehmann, Adam J. Stewart + 4 more | 2026-03-04 | cs

OpenMarcie: Dataset for Multimodal Action Recognition in Industrial Environments

OpenMarcie is a comprehensive multimodal dataset comprising over 37 hours of data from 36 participants performing industrial assembly tasks, designed to advance human activity recognition in smart factories through diverse sensing modalities and benchmarked across classification, captioning, and alignment tasks.

Hymalai Bello, Lala Ray, Joanna Sorysz + 2 more | 2026-03-04 | eess

From Fewer Samples to Fewer Bits: Reframing Dataset Distillation as Joint Optimization of Precision and Compactness

This paper introduces QuADD, a unified framework that jointly optimizes dataset compactness and precision through differentiable quantization, demonstrating that balancing sample count and bit allocation significantly enhances information efficiency in dataset distillation.

My H. Dinh, Aditya Sant, Akshay Malhotra + 2 more | 2026-03-04 | cs.AI

TruckDrive: Long-Range Autonomous Highway Driving Dataset

The paper introduces TruckDrive, a large-scale multimodal dataset for long-range highway autonomous driving, captured with sensors that reach up to 1,000 meters. It shows that current state-of-the-art models fail to generalize beyond 150 meters, exposing a critical gap in long-range perception.

Filippo Ghilotti, Edoardo Palladin, Samuel Brucker + 3 more | 2026-03-04 | cs

MIRAGE: Knowledge Graph-Guided Cross-Cohort MRI Synthesis for Alzheimer's Disease Prediction

MIRAGE is a novel framework that leverages a Biomedical Knowledge Graph and a frozen 3D U-Net decoder to distill EHR data into a latent diagnostic representation, enabling accurate Alzheimer's disease prediction in cohorts lacking MRI scans without performing computationally expensive 3D voxel reconstruction.

Guanchen Wu, Zhe Huang, Yuzhang Xie + 6 more | 2026-03-04 | cs.AI

ORCA: Orchestrated Reasoning with Collaborative Agents for Document Visual Question Answering

This paper introduces ORCA, a novel multi-agent framework for Document Visual Question Answering that enhances reasoning capabilities by decomposing complex queries, routing them to specialized modality-specific agents, and employing a debate-based adjudication process to ensure reliable and consistent answers, thereby outperforming state-of-the-art methods on multiple benchmarks.

Aymen Lassoued, Mohamed Ali Souibgui, Yousri Kessentini | 2026-03-04 | cs

Deep Learning Based Wildfire Detection for Peatland Fires Using Transfer Learning

This paper proposes a transfer learning-based deep learning approach that adapts models pretrained on general wildfire imagery to effectively detect distinct peatland fires using limited labeled data, significantly improving detection accuracy and robustness under challenging conditions like low-contrast smoke and variable illumination.

Emadeldeen Hamdan, Ahmad Faiz Tharima, Mohd Zahirasri Mohd Tohir + 4 more | 2026-03-04 | cs.AI

Large-Scale Dataset and Benchmark for Skin Tone Classification in the Wild

This paper addresses the lack of granular data for skin tone fairness by introducing the large-scale, open-access STW dataset labeled with the 10-tone MST scale, benchmarking deep learning against classic methods, and proposing the SkinToneNet model to achieve state-of-the-art generalization for reliable fairness auditing.

Vitor Pereira Matias, Márcus Vinícius Lobo Costa, João Batista Neto + 1 more | 2026-03-04 | cs.LG

E2E-GNet: An End-to-End Skeleton-based Geometric Deep Neural Network for Human Motion Recognition

The paper proposes E2E-GNet, an end-to-end geometric deep neural network that utilizes a geometric transformation layer and a distortion-aware optimization layer to effectively project skeleton motion sequences from non-Euclidean to linear space, thereby achieving superior human motion recognition performance with lower computational cost across multiple datasets.

Mubarak Olaoluwa, Hassen Drira | 2026-03-04 | cs

ModalPatch: A Plug-and-Play Module for Robust Multi-Modal 3D Object Detection under Modality Drop

ModalPatch is a plug-and-play module that enhances the robustness of multi-modal 3D object detection under arbitrary modality-drop scenarios by leveraging temporal history to predict missing features and employing an uncertainty-guided fusion strategy to ensure reliable compensation without requiring architectural changes or retraining.

Shuangzhi Li, Lei Ma, Xingyu Li | 2026-03-04 | cs

MUSE: A Run-Centric Platform for Multimodal Unified Safety Evaluation of Large Language Models

MUSE is an open-source, run-centric platform for multimodal safety evaluation that integrates automatic cross-modal payload generation, multi-turn attack algorithms with inter-turn modality switching, and a dual-metric framework. Its results show that alignment often fails to generalize across audio, image, and video inputs, with attack success rates significantly higher than single-turn text-based evaluations suggest.

Zhongxi Wang, Yueqian Lin, Jingyang Zhang + 2 more | 2026-03-04 | eess

Geometric structures and deviations on James' symmetric positive-definite matrix bicone domain

This paper introduces two new geometric structures on the symmetric positive-definite matrix cone derived from James' bicone reparameterization, which ensure geodesics correspond to straight lines, generalize the Hilbert simplex distance, and offer new tools for analyzing dissimilarities across various scientific disciplines.

Jacek Karwowski, Frank Nielsen | 2026-03-04 | stat

WTHaar-Net: a Hybrid Quantum-Classical Approach

This paper introduces WTHaar-Net, a hybrid quantum-classical convolutional neural network that replaces the Hadamard Transform with the spatially localized Haar Wavelet Transform to achieve significant parameter reduction and competitive accuracy on image classification tasks while demonstrating compatibility with near-term quantum hardware.

Vittorio Palladino, Tsai Idden, Ahmet Enis Cetin | 2026-03-04 | cs

Biomechanically Accurate Gait Analysis: A 3D Human Reconstruction Framework for Markerless Estimation of Gait Parameters

This paper introduces a scalable, markerless 3D human reconstruction framework that extracts biomechanically meaningful markers from video to accurately estimate gait parameters, demonstrating strong agreement with reference marker-based data and outperforming conventional pose-estimation methods for clinical and real-world applications.

Akila Pemasiri, Ethan Goan, Glen Lichtwark + 3 more | 2026-03-04 | eess
