When Visual Evidence is Ambiguous: Pareidolia as a Diagnostic Probe for Vision Models

This paper introduces a diagnostic framework using face pareidolia to reveal that vision models' behavior under visual ambiguity is primarily governed by their representational architecture, with vision-language models exhibiting semantic overactivation, pure vision models adopting uncertainty-based abstention, and detection models relying on conservative priors to suppress false positives.

Qianpu Chen, Derya Soydaner, Rob Saunders · 2026-03-05 · 🤖 cs.AI

Rethinking the Efficiency and Effectiveness of Reinforcement Learning for Radiology Report Generation

This paper proposes a novel framework for radiology report generation that enhances reinforcement learning efficiency through a diagnostic diversity-based data sampling strategy and a Diagnostic Token-weighted Policy Optimization (DiTPO) method, achieving state-of-the-art clinical accuracy with significantly fewer training samples by prioritizing diagnostically critical content.

Zilin Lu, Ruifeng Yuan, Weiwei Cao + 6 more · 2026-03-05 · 💻 cs

Volumetric Directional Diffusion: Anchoring Uncertainty Quantification in Anatomical Consensus for Ambiguous Medical Image Segmentation

The paper proposes Volumetric Directional Diffusion (VDD), a novel framework that anchors generative trajectories to a deterministic consensus prior in order to predict 3D boundary residuals, thereby achieving state-of-the-art, anatomically coherent uncertainty quantification for ambiguous medical image segmentation while avoiding the topological fractures common in standard diffusion models.

Chao Wu, Kangxian Xie, Mingchen Gao · 2026-03-05 · 🤖 cs.AI

DQE-CIR: Distinctive Query Embeddings through Learnable Attribute Weights and Target Relative Negative Sampling in Composed Image Retrieval

The paper proposes DQE-CIR, a novel composed image retrieval method that enhances query discriminativeness and fine-grained retrieval accuracy by integrating learnable attribute weights for precise vision-language alignment and a target relative negative sampling strategy to mitigate relevance suppression and semantic confusion.

Geon Park, Ji-Hoon Park, Seong-Whan Lee · 2026-03-05 · 🤖 cs.AI

Long-Term Visual Localization in Dynamic Benthic Environments: A Dataset, Footprint-Based Ground Truth, and Visual Place Recognition Benchmark

This paper addresses the lack of benchmarks for long-term visual localization in dynamic benthic environments by introducing a curated multi-year underwater dataset, a novel footprint-based ground-truthing method that outperforms traditional distance-threshold approaches, and a benchmark evaluation demonstrating that state-of-the-art visual place recognition methods struggle significantly in these challenging underwater settings.

Martin Kvisvik Larsen, Oscar Pizarro · 2026-03-05 · 💻 cs

Revisiting the Role of Foundation Models in Cell-Level Histopathological Image Analysis under Small-Patch Constraints -- Effects of Training Data Scale and Blur Perturbations on CNNs and Vision Transformers

This study demonstrates that for cell-level histopathological image analysis under extreme spatial constraints, task-specific architectures trained on sufficient data outperform foundation models in both accuracy and efficiency, while offering comparable robustness to blur perturbations.

Hiroki Kagiyama, Toru Nagasaka, Yukari Adachi + 5 more · 2026-03-05 · 💻 cs

Real Eyes Realize Faster: Gaze Stability and Pupil Novelty for Efficient Egocentric Learning

This paper introduces a training-free, capture-time frame curation method for always-on egocentric cameras that leverages gaze stability and pupil-derived novelty as complementary criteria to efficiently select high-quality, informative frames, achieving full-stream classification performance with only 10% of the data while respecting wearable device constraints.

Ajan Subramanian, Sumukh Bettadapura, Rohan Sathish · 2026-03-05 · 💻 cs

Understanding Sources of Demographic Predictability in Brain MRI via Disentangling Anatomy and Contrast

This paper proposes a disentangled representation learning framework for brain MRI and uses it to demonstrate that demographic predictability stems primarily from anatomical variation rather than acquisition-dependent contrast, highlighting the need for targeted mitigation strategies that address these distinct sources to ensure robust bias reduction.

Mehmet Yigit Avci, Akshit Achara, Andrew King + 1 more · 2026-03-05 · 🤖 cs.AI