Word-Anchored Temporal Forgery Localization

This paper introduces Word-Anchored Temporal Forgery Localization (WAFL), a novel paradigm that recasts forgery localization from continuous boundary regression as discrete word-level classification aligned with linguistic boundaries. WAFL combines a forensic feature realignment module for efficient feature mapping with an artifact-centric asymmetric loss that counters class imbalance, achieving superior localization performance at significantly reduced computational cost.

Tianyi Wang, Xi Shao, Harry Cheng, Yinglong Wang, Mohan Kankanhalli · 2026-03-09 · cs
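The paper's exact loss is not reproduced in the summary above. As a generic, purely illustrative sketch of an artifact-centric asymmetric loss for class imbalance, one can up-weight the rare positive (forged-word) class in a binary cross-entropy; the `pos_weight` hyperparameter here is a hypothetical placeholder, not a value from WAFL:

```python
import math

def asymmetric_bce(p, y, pos_weight=4.0):
    """Toy asymmetric binary cross-entropy for a single word.
    p: predicted probability the word is forged; y: label (1 = forged).
    pos_weight up-weights the rare forged class (hypothetical value)."""
    eps = 1e-7
    p = min(max(p, eps), 1 - eps)
    if y == 1:
        return -pos_weight * math.log(p)   # forged word: heavier penalty
    return -math.log(1 - p)                # genuine word: standard penalty

# A confident miss on the rare forged class costs more than an
# equally confident false alarm on the abundant genuine class:
loss_missed_artifact = asymmetric_bce(0.1, 1)
loss_false_alarm = asymmetric_bce(0.9, 0)
```

The asymmetry shifts the classifier's operating point toward recall on the minority class, which is the usual motivation for such weighting when artifacts occupy only a small fraction of words.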

Low-latency Event-based Object Detection with Spatially-Sparse Linear Attention

This paper introduces Spatially-Sparse Linear Attention (SSLA) and its application in the SSLA-Det model, an end-to-end asynchronous framework for event-based object detection that achieves state-of-the-art accuracy while significantly reducing per-event computation by leveraging state-level sparsity and parallel training.

Haiqing Hao, Zhipeng Sui, Rong Zou, Zijia Dai, Nikola Zubic, Davide Scaramuzza, Wenhui Wang · 2026-03-09 · cs
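SSLA-Det's exact formulation is not given in the summary, but the general linear-attention recurrence it builds on can be sketched: each incoming event updates a fixed-size state, so per-event cost stays constant regardless of how many events came before. The feature map, dimensions, and update below are common linear-attention choices used for illustration, not SSLA's specific design:

```python
import math

def elu_plus_one(x):
    # Positive feature map phi(x) = elu(x) + 1, a common linear-attention choice
    return x + 1.0 if x > 0 else math.exp(x)

def process_event(state, norm, k, v, q, dim=4):
    """One asynchronous linear-attention update (illustrative).
    state: dim x dim accumulator S += phi(k) v^T
    norm:  length-dim accumulator z += phi(k)
    Returns output phi(q)^T S / (phi(q)^T z); cost is O(dim^2) per event,
    independent of sequence length."""
    phi_k = [elu_plus_one(x) for x in k]
    phi_q = [elu_plus_one(x) for x in q]
    for i in range(dim):
        norm[i] += phi_k[i]
        for j in range(dim):
            state[i][j] += phi_k[i] * v[j]
    num = [sum(phi_q[i] * state[i][j] for i in range(dim)) for j in range(dim)]
    den = sum(phi_q[i] * norm[i] for i in range(dim))
    return [n / den for n in num]

# First event on an empty state: the output is that event's value vector.
S = [[0.0] * 4 for _ in range(4)]
z = [0.0] * 4
out = process_event(S, z, [1.0, 0, 0, 0], [1.0, 2.0, 3.0, 4.0], [1.0, 0, 0, 0])
```

Because the state is a fixed-size matrix rather than a growing key/value cache, this recurrence is what makes low-latency per-event processing possible; the paper's sparsity contribution would additionally skip updates for near-inactive state entries.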

TaPD: Temporal-adaptive Progressive Distillation for Observation-Adaptive Trajectory Forecasting in Autonomous Driving

TaPD is a unified, plug-and-play framework that employs temporal-adaptive progressive distillation and a temporal backfilling module to enable robust trajectory forecasting under variable and extremely short observation histories by transferring knowledge from long-horizon teachers and reconstructing missing past context.

Mingyu Fan, Yi Liu, Hao Zhou, Deheng Qian, Mohammad Haziq Khan, Matthias Raetsch · 2026-03-09 · cs.AI

GazeMoE: Perception of Gaze Target with Mixture-of-Experts

GazeMoE is a novel end-to-end framework that leverages Mixture-of-Experts modules to adaptively fuse multi-modal cues from a frozen vision foundation model, achieving state-of-the-art performance in human gaze target estimation by addressing class imbalance and enhancing robustness through specialized loss functions and data augmentation.

Zhuangzhuang Dai, Zhongxi Lu, Vincent G. Zakka, Luis J. Manso, Jose M Alcaraz Calero, Chen Li · 2026-03-09 · cs.AI
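The adaptive multi-modal fusion in a Mixture-of-Experts layer can be sketched generically: a learned gate scores each expert (here, one per modality) and the outputs are combined by softmax weights. The modalities, shapes, and fixed gate logits below are illustrative assumptions; GazeMoE's actual routing and expert design may differ:

```python
import math

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [x / s for x in e]

def moe_fuse(expert_outputs, gate_logits):
    """Gate-weighted fusion of per-modality expert outputs (illustrative).
    expert_outputs: list of equal-length feature vectors, one per expert.
    gate_logits: one learned score per expert (fixed here for the demo)."""
    w = softmax(gate_logits)
    dim = len(expert_outputs[0])
    return [sum(w[e] * expert_outputs[e][j] for e in range(len(w)))
            for j in range(dim)]

# Equal gate logits reduce to a plain average of the two experts.
fused = moe_fuse([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

In practice the gate logits are themselves predicted from the input, so the mixture shifts toward whichever modality (e.g. head crop vs. scene context) is most informative for a given sample.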

Spectral and Trajectory Regularization for Diffusion Transformer Super-Resolution

The paper proposes StrSR, a novel one-step adversarial distillation framework that employs asymmetric discriminative distillation and frequency distribution matching to overcome trajectory mismatches and periodic artifacts, thereby achieving state-of-the-art performance in real-world image super-resolution using Diffusion Transformers.

Jingkai Wang, Yixin Tang, Jue Gong, Jiatong Li, Shu Li, Libo Liu, Jianliang Lan, Yutong Liu, Yulun Zhang · 2026-03-09 · cs

Can We Trust Unreliable Voxels? Exploring 3D Semantic Occupancy Prediction under Label Noise

This paper introduces OccNL, the first benchmark for 3D semantic occupancy prediction under label noise, and proposes DPR-Occ, a novel framework that leverages dual-source partial label reasoning to achieve robust performance and prevent catastrophic collapse in noisy 3D voxel spaces where existing 2D noise-robust strategies fail.

Wenxin Li, Kunyu Peng, Di Wen, Junwei Zheng, Jiale Wei, Mengfei Duan, Yuheng Zhang, Rui Fan, Kailun Yang · 2026-03-09 · cs

Attribute Distribution Modeling and Semantic-Visual Alignment for Generative Zero-shot Learning

This paper proposes ADiVA, a generative zero-shot learning framework that addresses the class-instance and semantic-visual domain gaps by jointly modeling attribute distributions to capture instance-specific variability and employing visual-guided alignment to refine semantic representations, thereby significantly outperforming state-of-the-art methods on benchmark datasets.

Haojie Pu, Zhuoming Li, Yongbiao Gao, Yuheng Jia · 2026-03-09 · cs

3D CBCT Artefact Removal Using Perpendicular Score-Based Diffusion Models

This paper proposes a novel 3D dental implant inpainting method using perpendicular score-based diffusion models that operate in the projection domain to capture inter-projection correlations, thereby generating high-quality, artifact-reduced CBCT images with improved consistency compared to existing 2D-based approaches.

Susanne Schaub, Florentin Bieder, Matheus L. Oliveira, Yulan Wang, Dorothea Dagassan-Berndt, Michael M. Bornstein, Philippe C. Cattin · 2026-03-09 · cs.LG

DEX-AR: A Dynamic Explainability Method for Autoregressive Vision-Language Models

The paper introduces DEX-AR, a novel dynamic explainability method that generates per-token and sequence-level heatmaps for autoregressive Vision-Language Models by computing layer-wise gradients and employing dynamic filtering to distinguish visually-grounded from linguistic tokens, thereby improving interpretability and performance across multiple benchmarks.

Walid Bousselham, Angie Boggust, Hendrik Strobelt, Hilde Kuehne · 2026-03-09 · cs.AI