cs.CV papers | Gist.Science

Beyond Prompt Degradation: Prototype-guided Dual-pool Prompting for Incremental Object Detection

This paper proposes PDP, a novel prompt-decoupled framework for Incremental Object Detection that utilizes a dual-pool prompting paradigm to separate task-general and task-specific knowledge while employing a prototypical pseudo-label generation module to mitigate prompt drift, thereby achieving state-of-the-art performance on MS-COCO and PASCAL VOC benchmarks.

Yaoteng Zhang, Zhou Qing, Junyu Gao + 1 more | 2026-03-04 | 🤖 cs.AI

AutoFFS: Adversarial Deformations for Facial Feminization Surgery Planning

The paper introduces AutoFFS, a novel data-driven framework that utilizes adversarial free-form deformations to generate quantitative, counterfactual skull morphologies for objective and reproducible preoperative planning in Facial Feminization Surgery.

Paul Friedrich, Florentin Bieder, Florian M. Thieringer + 1 more | 2026-03-04 | ⚡ eess

Loss Design and Architecture Selection for Long-Tailed Multi-Label Chest X-Ray Classification

This paper presents a systematic evaluation of loss functions, architectures, and post-training strategies for long-tailed multi-label chest X-ray classification on the CXR-LT 2026 benchmark, demonstrating that LDAM-DRW combined with a ConvNeXt-Large backbone and classifier re-training achieves a top-5 ranking with 0.3950 mAP while offering practical insights into the development-to-test performance gap.

Nikhileswara Rao Sulake | 2026-03-04 | ⚡ eess
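
For readers unfamiliar with LDAM, the following is a minimal sketch of the loss in its commonly used single-label form: each class gets a margin proportional to n_j^(-1/4), so rarer classes receive larger margins, and that margin is subtracted from the true-class logit before a scaled softmax cross-entropy. The function names, the `max_margin=0.5` and `s=30.0` defaults, and the single-label setting are assumptions for illustration; the summary above does not say how the paper adapts the loss to multi-label data, and the deferred re-weighting (DRW) stage is not shown.

```python
import math


def ldam_margins(class_counts, max_margin=0.5):
    """Per-class margins proportional to n_j^(-1/4).

    Margins are rescaled so the rarest class receives max_margin
    (an assumed default, not a value taken from the paper).
    """
    raw = [n ** -0.25 for n in class_counts]
    scale = max_margin / max(raw)
    return [r * scale for r in raw]


def ldam_loss(logits, target, margins, s=30.0):
    """Single-label LDAM loss for one sample.

    The true-class logit is reduced by that class's margin, all logits
    are multiplied by the scale s, and softmax cross-entropy is applied.
    """
    adjusted = list(logits)
    adjusted[target] -= margins[target]
    scaled = [s * z for z in adjusted]
    # Log-sum-exp with the max subtracted for numerical stability.
    m = max(scaled)
    log_sum = m + math.log(sum(math.exp(z - m) for z in scaled))
    return log_sum - scaled[target]


# Hypothetical class counts: a frequent class and a rare class.
margins = ldam_margins([1000, 10])
loss = ldam_loss([2.0, 1.0], target=0, margins=margins)
```

Because the margin only lowers the true-class logit, the loss for a given sample is never smaller than plain cross-entropy on the same logits, which is what pushes the decision boundary away from rare classes.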

HAMMER: Harnessing MLLM via Cross-Modal Integration for Intention-Driven 3D Affordance Grounding

HAMMER is a novel framework that leverages multimodal large language models to achieve intention-driven 3D affordance grounding by aggregating interaction intentions into contact-aware embeddings and employing hierarchical cross-modal integration with multi-granular geometry lifting for accurate 3D localization.

Lei Yao, Yong Chen, Yuejiao Su + 3 more | 2026-03-04 | 💻 cs

Preconditioned Score and Flow Matching

This paper identifies that the ill-conditioned covariance of intermediate distributions in flow matching and score-based diffusion causes optimization bias and stagnation, and proposes reversible preconditioning maps to reshape this geometry, thereby enabling continued progress along suppressed directions and yielding better-trained models.

Shadab Ahamed, Eshed Gal, Simon Ghyselincks + 3 more | 2026-03-04 | 🤖 cs.AI

MERG3R: A Divide-and-Conquer Approach to Large-Scale Neural Visual Geometry

MERG3R is a training-free, model-agnostic divide-and-conquer framework that enables neural visual geometry models to scale to large, unordered image collections by partitioning data into manageable subsets and merging local reconstructions into a globally consistent 3D model, thereby overcoming GPU memory limitations while improving accuracy and scalability.

Leo Kaixuan Cheng, Abdus Shaikh, Ruofan Liang + 3 more | 2026-03-04 | 💻 cs

Beyond Caption-Based Queries for Video Moment Retrieval

This paper investigates the performance degradation of existing Video Moment Retrieval methods when transitioning from caption-based to search queries, identifies language and multi-moment gaps alongside a decoder-query collapse as key causes, and proposes architectural modifications to significantly improve generalization on multi-moment search queries.

David Pujol-Perich, Albert Clapés, Dima Damen + 2 more | 2026-03-04 | 💻 cs

Retrieving Patient-Specific Radiomic Feature Sets for Transparent Knee MRI Assessment

This paper proposes a transparent, patient-specific radiomic framework that employs a two-stage retrieval strategy to select compact, complementary feature sets for knee MRI diagnosis, achieving performance competitive with deep learning models while offering enhanced interpretability through auditable links between specific anatomical regions and clinical outcomes.

Yaxi Chen, Simin Ni, Jingjing Zhang + 7 more | 2026-03-04 | 💻 cs

Cultural Counterfactuals: Evaluating Cultural Biases in Large Vision-Language Models with Counterfactual Examples

This paper introduces "Cultural Counterfactuals," a high-quality synthetic dataset of nearly 60,000 images created by placing diverse individuals into varied cultural contexts to enable the precise measurement and evaluation of cultural biases related to religion, nationality, and socioeconomic status in Large Vision-Language Models.

Phillip Howard, Xin Su, Kathleen C. Fraser | 2026-03-04 | 💻 cs

Aligning Fetal Anatomy with Kinematic Tree Log-Euclidean PolyRigid Transforms

This paper introduces a differentiable volumetric body model driven by a novel Kinematic Tree-based Log-Euclidean PolyRigid (KTPolyRigid) transform that resolves deformation ambiguities to achieve smooth, bijective mappings, thereby enabling robust groupwise registration and label-efficient segmentation of fetal anatomy from MRI data.

Yingcheng Liu, Athena Taymourtash, Yang Liu + 5 more | 2026-03-04 | 💻 cs

Authenticated Contradictions from Desynchronized Provenance and Watermarking

This paper identifies and empirically demonstrates the "Integrity Clash," a vulnerability where digital assets can simultaneously possess valid C2PA provenance claiming human authorship and AI-generated watermarks due to their technical independence, and proposes a cross-layer audit protocol that resolves this contradiction by jointly evaluating both signals to achieve 100% classification accuracy.

Alexander Nemecek, Hengzhi He, Guang Cheng + 1 more | 2026-03-04 | ⚡ eess
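
The idea of jointly evaluating provenance and watermark signals can be illustrated with a small decision function. This is only a sketch under stated assumptions: the `AssetSignals` fields, the `audit` function, and the four verdict labels are hypothetical names chosen here, not the paper's protocol, and a real system would obtain these booleans from a C2PA manifest validator and a watermark detector.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssetSignals:
    # Hypothetical inputs; each would come from a separate verification layer.
    provenance_valid: bool          # the provenance manifest verifies
    claims_human_authorship: bool   # the manifest asserts a human creator
    watermark_detected: bool        # an AI-generation watermark was found


def audit(signals: AssetSignals) -> str:
    """Evaluate both layers together instead of trusting either alone."""
    human_claim = signals.provenance_valid and signals.claims_human_authorship
    if human_claim and signals.watermark_detected:
        # Both layers pass their own checks yet contradict each other.
        return "conflict"
    if signals.watermark_detected:
        return "ai_generated"
    if human_claim:
        return "human_attested"
    return "unverified"


verdict = audit(AssetSignals(True, True, True))
```

Checked independently, each layer would accept the asset in the first case; only the joint check surfaces the contradiction.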

Advancing Earth Observation Through Machine Learning: A TorchGeo Tutorial

This paper introduces a tutorial for the PyTorch-based library TorchGeo that demonstrates its core abstractions and guides users through an end-to-end workflow for training a semantic segmentation model on Sentinel-2 imagery to perform multispectral water segmentation.

Caleb Robinson, Nils Lehmann, Adam J. Stewart + 4 more | 2026-03-04 | 💻 cs

OpenMarcie: Dataset for Multimodal Action Recognition in Industrial Environments

OpenMarcie is a comprehensive multimodal dataset comprising over 37 hours of data from 36 participants performing industrial assembly tasks, designed to advance human activity recognition in smart factories through diverse sensing modalities and benchmarked across classification, captioning, and alignment tasks.

Hymalai Bello, Lala Ray, Joanna Sorysz + 2 more | 2026-03-04 | ⚡ eess

From Fewer Samples to Fewer Bits: Reframing Dataset Distillation as Joint Optimization of Precision and Compactness

This paper introduces QuADD, a unified framework that jointly optimizes dataset compactness and precision through differentiable quantization, demonstrating that balancing sample count and bit allocation significantly enhances information efficiency in dataset distillation.

My H. Dinh, Aditya Sant, Akshay Malhotra + 2 more | 2026-03-04 | 🤖 cs.AI

TruckDrive: Long-Range Autonomous Highway Driving Dataset

The paper introduces TruckDrive, a large-scale multimodal dataset specifically designed for long-range highway autonomous driving with sensors capable of sensing up to 1,000 meters, revealing that current state-of-the-art models fail to generalize beyond 150 meters and highlighting a critical gap in long-range perception capabilities.

Filippo Ghilotti, Edoardo Palladin, Samuel Brucker + 3 more | 2026-03-04 | 💻 cs

MIRAGE: Knowledge Graph-Guided Cross-Cohort MRI Synthesis for Alzheimer's Disease Prediction

MIRAGE is a novel framework that leverages a Biomedical Knowledge Graph and a frozen 3D U-Net decoder to distill EHR data into a latent diagnostic representation, enabling accurate Alzheimer's disease prediction in cohorts lacking MRI scans without performing computationally expensive 3D voxel reconstruction.

Guanchen Wu, Zhe Huang, Yuzhang Xie + 6 more | 2026-03-04 | 🤖 cs.AI

ORCA: Orchestrated Reasoning with Collaborative Agents for Document Visual Question Answering

This paper introduces ORCA, a novel multi-agent framework for Document Visual Question Answering that enhances reasoning capabilities by decomposing complex queries, routing them to specialized modality-specific agents, and employing a debate-based adjudication process to ensure reliable and consistent answers, thereby outperforming state-of-the-art methods on multiple benchmarks.

Aymen Lassoued, Mohamed Ali Souibgui, Yousri Kessentini | 2026-03-04 | 💻 cs

Deep Learning Based Wildfire Detection for Peatland Fires Using Transfer Learning

This paper proposes a transfer learning-based deep learning approach that adapts models pretrained on general wildfire imagery to effectively detect distinct peatland fires using limited labeled data, significantly improving detection accuracy and robustness under challenging conditions like low-contrast smoke and variable illumination.

Emadeldeen Hamdan, Ahmad Faiz Tharima, Mohd Zahirasri Mohd Tohir + 4 more | 2026-03-04 | 🤖 cs.AI

Large-Scale Dataset and Benchmark for Skin Tone Classification in the Wild

This paper addresses the lack of granular data for skin tone fairness by introducing the large-scale, open-access STW dataset labeled with the 10-tone MST scale, benchmarking deep learning against classic methods, and proposing the SkinToneNet model to achieve state-of-the-art generalization for reliable fairness auditing.

Vitor Pereira Matias, Márcus Vinícius Lobo Costa, João Batista Neto + 1 more | 2026-03-04 | 🤖 cs.LG

E2E-GNet: An End-to-End Skeleton-based Geometric Deep Neural Network for Human Motion Recognition

The paper proposes E2E-GNet, an end-to-end geometric deep neural network that utilizes a geometric transformation layer and a distortion-aware optimization layer to effectively project skeleton motion sequences from non-Euclidean to linear space, thereby achieving superior human motion recognition performance with lower computational cost across multiple datasets.

Mubarak Olaoluwa, Hassen Drira | 2026-03-04 | 💻 cs
