Keeping the Evidence Chain: Semantic Evidence Allocation for Training-Free Token Pruning in Video Temporal Grounding

The paper proposes SemVID, a training-free token pruning framework for Video Temporal Grounding. SemVID allocates token budgets based on query relevance and inter-frame variation, and selects object, motion, and context tokens to preserve critical evidence and cross-frame connectivity while maintaining high accuracy and efficiency.

Jiaqi Li, Shuntian Zheng, Yixian Shen, Jia-Hong Huang, Xiaoman Lu, Minzhe Ni, Yu Guan | 2026-03-09 | cs

Gabor Primitives for Accelerated Cardiac Cine MRI Reconstruction

This paper proposes a cardiac cine MRI reconstruction method using Gabor primitives, which combine Gaussian envelopes with complex exponentials to enable flexible k-space coverage and a low-rank spatiotemporal decomposition, achieving superior performance over compressed sensing, Gaussian primitives, and implicit neural representations while offering physically interpretable parameters.

Wenqi Huang, Veronika Spieker, Nil Stolt-Ansó, Natascha Niessen, Maik Dannecker, Sevgi Gokce Kafali, Sila Kurugol, Julia A. Schnabel, Daniel Rueckert | 2026-03-09 | cs

Longitudinal Lesion Inpainting in Brain MRI via 3D Region Aware Diffusion

This paper introduces a novel pseudo-3D longitudinal inpainting framework based on Denoising Diffusion Probabilistic Models and Region-Aware Diffusion that significantly outperforms state-of-the-art baselines in perceptual fidelity, temporal stability, and processing speed for removing evolving lesions from brain MRI scans.

Zahra Karimaghaloo, Dumitru Fetco, Haz-Edine Assemlal, Hassan Rivaz, Douglas L. Arnold | 2026-03-09 | cs.AI

MultiHaystack: Benchmarking Multimodal Retrieval and Reasoning over 40K Images, Videos, and Documents

The paper introduces MultiHaystack, a new benchmark comprising over 46,000 multimodal documents, images, and videos to evaluate the critical gap between retrieval and reasoning in multimodal large language models, revealing that current systems struggle significantly when required to locate evidence within large-scale, heterogeneous corpora rather than being provided with it directly.

Dannong Xu, Zhongyu Yang, Jun Chen, Yingfang Yuan, Ming Hu, Lei Sun, Luc Van Gool, Danda Pani Paudel, Chun-Mei Feng | 2026-03-09 | cs

Any to Full: Prompting Depth Anything for Depth Completion in One Stage

Any2Full is a one-stage, domain-general framework that reformulates depth completion as a scale-prompting adaptation of pretrained monocular depth estimation models via a Scale-Aware Prompt Encoder, achieving superior robustness and efficiency by eliminating the computational overhead and distortions of traditional two-stage alignment methods.

Zhiyuan Zhou, Ruofeng Liu, Taichi Liu, Weijian Zuo, Shanshan Wang, Zhiqing Hong, Desheng Zhang | 2026-03-09 | cs

Uni-LVC: A Unified Method for Intra- and Inter-Mode Learned Video Compression

Uni-LVC is a unified learned video compression framework that integrates intra and inter coding into a single model by conditioning inter-coding on temporal cues via a cross-attention module and a reliability-aware classifier, thereby achieving superior rate-distortion performance across low-delay and random-access scenarios while maintaining computational efficiency.

Yichi Zhang, Ruoyu Yang, Fengqing Zhu | 2026-03-09 | cs

Layer-wise Instance Binding for Regional and Occlusion Control in Text-to-Image Diffusion Transformers

This paper introduces LayerBind, a training-free and plug-and-play method for Diffusion Transformers that achieves precise regional and occlusion control in text-to-image generation by modeling distinct object instances as separate layers during early denoising stages and fusing them through a semantic nursing mechanism.

Ruidong Chen, Yancheng Bai, Xuanpu Zhang, Jianhao Zeng, Lanjun Wang, Dan Song, Lei Sun, Xiangxiang Chu, Anan Liu | 2026-03-09 | cs

Spectral Probing of Feature Upsamplers in 2D-to-3D Scene Reconstruction

This paper introduces a spectral diagnostic framework for 2D-to-3D pipelines. It shows that preserving spectral structure, rather than merely enhancing spatial details, is the critical factor for high-quality 3D reconstruction, and that structural spectral consistency is the strongest predictor of novel view synthesis performance.

Ling Xiao, Yuliang Xiu, Yue Chen, Guoming Wang, Toshihiko Yamasaki | 2026-03-09 | cs

Architectural Unification for Polarimetric Imaging Across Multiple Degradations

This paper proposes a unified, single-stage architectural framework that jointly processes image and Stokes domains to achieve state-of-the-art performance in recovering polarimetric parameters from various degraded observations, including low-light noise, motion blur, and mosaicing artifacts, while ensuring physical consistency and avoiding error accumulation.

Chu Zhou, Yufei Han, Junda Liao, Linrui Dai, Wangze Xu, Art Subpa-Asa, Heng Guo, Boxin Shi, Imari Sato | 2026-03-09 | cs