cs.CV papers | Gist.Science

Automatic Map Density Selection for Locally-Performant Visual Place Recognition

This paper proposes a dynamic Visual Place Recognition mapping approach that automatically selects the optimal reference map density to guarantee that a user-specified local recall performance level is met across a defined proportion of the environment, thereby ensuring reliable long-term deployment without unnecessary over-densification.

Somayeh Hussaini, Tobias Fischer, Michael Milford | 2026-03-05 | 💻 cs

Beyond Dominant Patches: Spatial Credit Redistribution For Grounded Vision-Language Models

This paper introduces Spatial Credit Redistribution (SCR), a training-free inference-time method that mitigates hallucinations in Vision-Language Models by redistributing suppressed visual attention from dominant patches to their spatial neighbors, thereby significantly reducing hallucination rates across multiple benchmarks while preserving generation quality and maintaining negligible latency.

Niamul Hassan Samin, Md Arifur Rahman, Abdullah Ibne Hanif Arean + 2 more | 2026-03-05 | 🤖 cs.AI

EvalMVX: A Unified Benchmarking for Neural 3D Reconstruction under Diverse Multiview Setups

This paper introduces EvalMVX, a comprehensive real-world dataset featuring 25 objects with aligned ground-truth meshes captured under diverse lighting and view conditions, to establish a unified benchmark for quantitatively evaluating and comparing neural multiview stereo, photometric stereo, and shape-from-polarization reconstruction methods.

Zaiyan Yang, Jieji Ren, Xiangyi Wang + 5 more | 2026-03-05 | 💻 cs

Improved MambaBDA Framework for Robust Building Damage Assessment Across Disaster Domains

This paper proposes an improved MambaBDA framework that integrates Focal Loss, lightweight Attention Gates, and a compact Alignment Module to significantly enhance building damage assessment accuracy and generalization across diverse disaster domains by addressing class imbalance, background clutter, and domain shift.

Alp Eren Gençoğlu, Hazım Kemal Ekenel | 2026-03-05 | 💻 cs
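
The Focal Loss mentioned in the summary above is a standard remedy for class imbalance. A minimal sketch of the usual per-sample form, -alpha * (1 - p_t)^gamma * log(p_t), is shown here for orientation; the function name and the default `gamma` and `alpha` values are illustrative assumptions, not settings taken from the paper.

```python
import math


def focal_loss(prob_true_class, gamma=2.0, alpha=1.0):
    """Per-sample focal loss for the probability assigned to the true class.

    gamma and alpha are illustrative defaults (assumptions), not the
    paper's settings. With gamma = 0 and alpha = 1 this reduces to
    ordinary cross-entropy.
    """
    # (1 - p_t)^gamma shrinks the loss of well-classified samples,
    # so rare or hard classes contribute relatively more.
    modulating_factor = (1.0 - prob_true_class) ** gamma
    return -alpha * modulating_factor * math.log(prob_true_class)


if __name__ == "__main__":
    for p in (0.9, 0.5, 0.1):
        print(p, focal_loss(p), -math.log(p))
```

The printed pairs compare the focal loss with plain cross-entropy: the confidently correct sample (p = 0.9) is down-weighted far more strongly than the hard one (p = 0.1).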

A Unified Revisit of Temperature in Classification-Based Knowledge Distillation

This paper presents a unified study that systematically investigates the interactions between the temperature parameter and various training components in knowledge distillation, offering practical guidance for selecting optimal temperature values to improve student performance.

Logan Frank, Jim Davis | 2026-03-05 | 🤖 cs.LG
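
For readers unfamiliar with the temperature parameter discussed above, the following is a minimal sketch of the classic temperature-scaled distillation loss (KL divergence between softened teacher and student distributions, scaled by T^2). It illustrates the general technique only; the function names and example logits are assumptions and do not reproduce the paper's experimental setup.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Softmax of logits / temperature; a higher temperature flattens the output."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature):
    """KL(teacher || student) on temperature-softened distributions, times T^2."""
    p_teacher = softmax_with_temperature(teacher_logits, temperature)
    p_student = softmax_with_temperature(student_logits, temperature)
    kl = sum(
        pt * math.log(pt / ps)
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0.0
    )
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return temperature ** 2 * kl


if __name__ == "__main__":
    teacher = [2.0, 1.0, 0.1]  # hypothetical logits
    student = [1.5, 1.2, 0.3]  # hypothetical logits
    for t in (1.0, 2.0, 4.0):
        print(t, distillation_loss(student, teacher, t))
```

Raising the temperature exposes more of the teacher's relative ranking among non-target classes, which is why its value interacts with other training choices.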

ITO: Images and Texts as One via Synergizing Multiple Alignment and Training-Time Fusion

The paper proposes ITO, a framework that enhances image-text contrastive pretraining by synergizing multimodal multiple alignment with a lightweight, inference-free training-time fusion module to eliminate modality gaps and outperform existing baselines across various benchmarks.

HanZpeng Liu, Yaqian Li, Zidan Wang + 6 more | 2026-03-05 | 🤖 cs.AI

Toward Early Quality Assessment of Text-to-Image Diffusion Models

This paper introduces Probe-Select, a plug-in module that predicts final image quality from early denoising activations to enable efficient early termination of unpromising seeds, thereby reducing sampling costs by over 60% while improving the quality of retained images in text-to-image generation.

Huanlei Guo, Hongxin Wei, Bingyi Jing | 2026-03-05 | 🤖 cs.LG

Generalized non-exponential Gaussian splatting

This paper generalizes 3D Gaussian splatting to non-exponential radiative transfer regimes by introducing quadratic transmittance variants that achieve rendering quality comparable to the original method while significantly reducing overdraws and accelerating rendering speeds by up to 4x.

Sébastien Speierer, Adrian Jarabo | 2026-03-05 | 💻 cs

TRACE: Task-Adaptive Reasoning and Representation Learning for Universal Multimodal Retrieval

TRACE introduces a novel framework for universal multimodal retrieval that unifies generative reasoning with discriminative representation learning by generating and compressing Chain-of-Thought traces, thereby achieving state-of-the-art performance through autonomous task-adaptive reasoning and strong zero-shot transferability.

Xiangzhao Hao, Shijie Wang, Tianyu Yang + 3 more | 2026-03-05 | 💻 cs

MoECLIP: Patch-Specialized Experts for Zero-shot Anomaly Detection

MoECLIP addresses the limitations of patch-agnostic designs in Zero-Shot Anomaly Detection by introducing a Mixture-of-Experts architecture that dynamically routes image patches to specialized LoRA experts, enhanced by Frozen Orthogonal Feature Separation and an ETF loss to ensure distinct and maximally equiangular representations, thereby achieving state-of-the-art performance across diverse industrial and medical benchmarks.

Jun Yeong Park, JunYoung Seo, Minji Kang + 1 more | 2026-03-05 | 🤖 cs.AI

ProSMA-UNet: Decoder Conditioning for Proximal-Sparse Skip Feature Selection

ProSMA-UNet introduces a novel decoder-conditioned sparse feature selection mechanism that employs multi-scale compatibility fields and an $\ell_1$ proximal operator to explicitly filter irrelevant skip-connection noise, thereby achieving state-of-the-art performance in challenging medical image segmentation tasks.

Chun-Wun Cheng, Yanqi Cheng, Peiyuan Jing + 4 more | 2026-03-05 | 💻 cs
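
The $\ell_1$ proximal operator named in the summary above has a well-known closed form, soft-thresholding: values with magnitude at most a threshold become exactly zero and the rest shrink toward zero. The sketch below shows only that generic operator; the function name, the threshold `lam`, and the example values are assumptions, and how ProSMA-UNet applies the operator to skip features is not shown.

```python
def soft_threshold(values, lam):
    """Proximal operator of lam * ||x||_1, applied element-wise.

    Entries with |v| <= lam become exactly 0.0; larger entries move
    toward zero by lam. `lam` is a hypothetical threshold value.
    """
    result = []
    for v in values:
        if v > lam:
            result.append(v - lam)
        elif v < -lam:
            result.append(v + lam)
        else:
            result.append(0.0)
    return result


if __name__ == "__main__":
    responses = [3.0, -0.2, 0.5, -2.0]  # hypothetical feature responses
    print(soft_threshold(responses, 0.5))
```

Because small responses are zeroed exactly rather than merely reduced, the operator produces sparse outputs, which matches the filtering role the summary describes.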

Specificity-aware reinforcement learning for fine-grained open-world classification

This paper proposes SpeciaRL, a specificity-aware reinforcement learning framework that fine-tunes reasoning Large Multimodal Models to achieve an optimal balance between correctness and specificity in open-world fine-grained image classification by employing a dynamic, verifier-based reward signal.

Samuele Angheben, Davide Berasi, Alessandro Conti + 2 more | 2026-03-05 | 💻 cs

Deep Sketch-Based 3D Modeling: A Survey

This paper presents a comprehensive survey of Deep Sketch-Based 3D Modeling (DS-3DM) by introducing the novel MORPHEUS design space, which categorizes recent advancements within an Input-Model-Output framework to highlight current limitations and identify future interdisciplinary opportunities for enhancing user-centered, controllable, and information-rich 3D creation.

Alberto Tono, Jiajun Wu, Gordon Wetzstein + 4 more | 2026-03-05 | 💻 cs

The Influence of Iconicity in Transfer Learning for Sign Language Recognition

This study demonstrates that leveraging the iconicity of signs in transfer learning, from Chinese to Arabic and from Greek to Flemish sign language, significantly improves recognition performance, with a 7.02% gain for Arabic, using MediaPipe-extracted spatial and temporal features processed by MLP and GRU architectures.

Keren Artiaga, Conor Lynch, Haithem Afli + 1 more | 2026-03-05 | 🤖 cs.AI

mHC-HSI: Clustering-Guided Hyper-Connection Mamba for Hyperspectral Image Classification

This paper introduces mHC-HSI, a clustering-guided Hyper-Connection Mamba model that enhances hyperspectral image classification accuracy and interpretability by integrating spatial-spectral feature learning, soft cluster-based residual matrices, and physically-meaningful spectral band grouping.

Yimin Zhu, Zack Dewis, Quinn Ledingham + 6 more | 2026-03-05 | 💻 cs

Beyond Accuracy: Evaluating Visual Grounding In Multimodal Medical Reasoning

This paper introduces a counterfactual evaluation framework revealing that while reinforcement learning with verifiable rewards improves accuracy on medical VQA benchmarks, it often degrades genuine visual grounding by enabling models to rely on text shortcuts and hallucinate visual reasoning, necessitating new evaluation metrics and training objectives that explicitly enforce visual dependence.

Anas Zafar, Leema Krishna Murali, Ashish Vashist | 2026-03-05 | 💻 cs

Proact-VL: A Proactive VideoLLM for Real-Time AI Companions

This paper introduces Proact-VL, a general framework designed to transform multimodal language models into proactive, real-time AI companions that overcome latency and decision-making challenges, validated through the new Live Gaming Benchmark across commentary and guidance scenarios.

Weicai Yan, Yuhong Dai, Qi Ran + 6 more | 2026-03-05 | 💻 cs

Impact of Localization Errors on Label Quality for Online HD Map Construction

This paper investigates how various localization errors degrade label quality in online HD map construction, revealing that heading angle errors have a more significant impact than position errors and that model performance decreases non-linearly with increasing noise, while also proposing a distance-based metric to better evaluate these effects.

Alexander Blumberg, Jonas Merkert, Richard Fehler + 4 more | 2026-03-05 | 💻 cs

Beyond Pixel Histories: World Models with Persistent 3D State

The paper introduces PERSIST, a novel world model paradigm that simulates the evolution of a latent 3D scene to overcome the spatial memory and consistency limitations of existing video generation methods, thereby enabling coherent, long-horizon interactive experiences with persistent 3D state and geometry-aware control.

Samuel Garcin, Thomas Walker, Steven McDonagh + 5 more | 2026-03-05 | 🤖 cs.AI

Phys4D: Fine-Grained Physics-Consistent 4D Modeling from Video Diffusion

This paper introduces Phys4D, a three-stage training pipeline that transforms appearance-driven video diffusion models into physics-consistent 4D world representations by combining pseudo-supervised pretraining, simulation-grounded fine-tuning, and reinforcement learning to achieve fine-grained spatiotemporal and physical consistency.

Haoran Lu, Shang Wu, Jianshu Zhang + 9 more | 2026-03-05 | 🤖 cs.AI
