Beyond Prompt Degradation: Prototype-guided Dual-pool Prompting for Incremental Object Detection

This paper proposes PDP, a novel prompt-decoupled framework for Incremental Object Detection that utilizes a dual-pool prompting paradigm to separate task-general and task-specific knowledge while employing a prototypical pseudo-label generation module to mitigate prompt drift, thereby achieving state-of-the-art performance on MS-COCO and PASCAL VOC benchmarks.

Yaoteng Zhang, Zhou Qing, Junyu Gao + 1 more | 2026-03-04 | 🤖 cs.AI

Loss Design and Architecture Selection for Long-Tailed Multi-Label Chest X-Ray Classification

This paper presents a systematic evaluation of loss functions, architectures, and post-training strategies for long-tailed multi-label chest X-ray classification on the CXR-LT 2026 benchmark, demonstrating that LDAM-DRW combined with a ConvNeXt-Large backbone and classifier re-training achieves a top-5 ranking with 0.3950 mAP while offering practical insights into the development-to-test performance gap.

Nikhileswara Rao Sulake | 2026-03-04 | ⚡ eess

MERG3R: A Divide-and-Conquer Approach to Large-Scale Neural Visual Geometry

MERG3R is a training-free, model-agnostic divide-and-conquer framework that enables neural visual geometry models to scale to large, unordered image collections by partitioning data into manageable subsets and merging local reconstructions into a globally consistent 3D model, thereby overcoming GPU memory limitations while improving accuracy and scalability.

Leo Kaixuan Cheng, Abdus Shaikh, Ruofan Liang + 3 more | 2026-03-04 | 💻 cs

Retrieving Patient-Specific Radiomic Feature Sets for Transparent Knee MRI Assessment

This paper proposes a transparent, patient-specific radiomic framework that employs a two-stage retrieval strategy to select compact, complementary feature sets for knee MRI diagnosis, achieving performance competitive with deep learning models while offering enhanced interpretability through auditable links between specific anatomical regions and clinical outcomes.

Yaxi Chen, Simin Ni, Jingjing Zhang + 7 more | 2026-03-04 | 💻 cs

Cultural Counterfactuals: Evaluating Cultural Biases in Large Vision-Language Models with Counterfactual Examples

This paper introduces "Cultural Counterfactuals," a high-quality synthetic dataset of nearly 60,000 images created by placing diverse individuals into varied cultural contexts to enable the precise measurement and evaluation of cultural biases related to religion, nationality, and socioeconomic status in Large Vision-Language Models.

Phillip Howard, Xin Su, Kathleen C. Fraser | 2026-03-04 | 💻 cs

Authenticated Contradictions from Desynchronized Provenance and Watermarking

This paper identifies and empirically demonstrates the "Integrity Clash," a vulnerability where digital assets can simultaneously possess valid C2PA provenance claiming human authorship and AI-generated watermarks due to their technical independence, and proposes a cross-layer audit protocol that resolves this contradiction by jointly evaluating both signals to achieve 100% classification accuracy.

Alexander Nemecek, Hengzhi He, Guang Cheng + 1 more | 2026-03-04 | ⚡ eess

ORCA: Orchestrated Reasoning with Collaborative Agents for Document Visual Question Answering

This paper introduces ORCA, a novel multi-agent framework for Document Visual Question Answering that enhances reasoning capabilities by decomposing complex queries, routing them to specialized modality-specific agents, and employing a debate-based adjudication process to ensure reliable and consistent answers, thereby outperforming state-of-the-art methods on multiple benchmarks.

Aymen Lassoued, Mohamed Ali Souibgui, Yousri Kessentini | 2026-03-04 | 💻 cs

Deep Learning Based Wildfire Detection for Peatland Fires Using Transfer Learning

This paper proposes a transfer learning-based deep learning approach that adapts models pretrained on general wildfire imagery to effectively detect distinct peatland fires using limited labeled data, significantly improving detection accuracy and robustness under challenging conditions like low-contrast smoke and variable illumination.

Emadeldeen Hamdan, Ahmad Faiz Tharima, Mohd Zahirasri Mohd Tohir + 4 more | 2026-03-04 | 🤖 cs.AI

E2E-GNet: An End-to-End Skeleton-based Geometric Deep Neural Network for Human Motion Recognition

This paper proposes E2E-GNet, an end-to-end geometric deep neural network that uses a geometric transformation layer and a distortion-aware optimization layer to project skeleton motion sequences from a non-Euclidean space to a linear space, thereby achieving superior human motion recognition performance with lower computational cost across multiple datasets.

Mubarak Olaoluwa, Hassen Drira | 2026-03-04 | 💻 cs