Learning Latent Transmission and Glare Maps for Lens Veiling Glare Removal

This paper proposes VeilGen, an unsupervised generative model that learns latent transmission and glare maps to synthesize realistic veiling glare datasets, and DeVeiler, a restoration network that leverages these maps to remove veiling glare from images captured by simplified optical systems.

Xiaolong Qian, Qi Jiang, Lei Sun, Zongxi Yu, Kailun Yang, Peixuan Wu, Jiacheng Zhou, Yao Gao, Yaoguang Ma, Ming-Hsuan Yang, Kaiwei Wang | 2026-03-09 | 🔬 physics.optics

SyncMV4D: Synchronized Multi-view Joint Diffusion of Appearance and Motion for Hand-Object Interaction Synthesis

SyncMV4D is a framework that overcomes the limitations of single-view and data-hungry 3D methods by combining a Multi-view Joint Diffusion model with a Diffusion Points Aligner. Through a closed-loop coupling of visual appearance and dynamic geometry, it simultaneously generates synchronized, realistic multi-view hand-object interaction videos and globally aligned 4D metric motions.

Lingwei Dang, Zonghan Li, Juntong Li, Hongwen Zhang, Liang An, Yebin Liu, Qingyao Wu | 2026-03-09 | 💻 cs

UniTS: Unified Spatio-Temporal Generative Model for Remote Sensing

This paper introduces UniTS, a unified spatio-temporal generative model based on flow matching and diffusion transformers that integrates tasks like cloud removal, change detection, and forecasting into a single framework, significantly outperforming specialized models under challenging conditions.

Yuxiang Zhang, Shunlin Liang, Wenyuan Li, Han Ma, Jianglei Xu, Yichuan Ma, Jiangwei Xie, Wei Li, Mengmeng Zhang, Ran Tao, Xiang-Gen Xia | 2026-03-09 | 💻 cs

Exploiting Spatiotemporal Properties for Efficient Event-Driven Human Pose Estimation

This paper proposes a point cloud-based framework for event-driven human pose estimation that leverages spatiotemporal properties through novel temporal slicing and sequencing modules alongside an edge-enhanced representation, achieving improved accuracy and efficiency on the DHP19 dataset without converting event streams into dense frames.

Haoxian Zhou, Chuanzhi Xu, Langyi Chen, Pengfei Ye, Haodong Chen, Yuk Ying Chung, Qiang Qu | 2026-03-09 | 🤖 cs.AI

DFIR-DETR: Frequency-Domain Iterative Refinement and Dynamic Feature Aggregation for Small Object Detection

DFIR-DETR is a transformer-based small object detector that addresses key limitations in standard architectures by introducing Dynamic Content-Feature Aggregation for adaptive attention, a norm-preserving Dynamic Feature Pyramid Network for detail recovery, and a Frequency-domain Iterative Refinement module to preserve high-frequency boundaries, achieving state-of-the-art performance on NEU-DET and VisDrone benchmarks with high efficiency.

Bo Gao, Jingcheng Tong, Xingsheng Chen, Han Yu, Zichen Li | 2026-03-09 | 🤖 cs.LG

A Novel Patch-Based TDA Approach for Computed Tomography Imaging

This paper introduces a novel patch-based Topological Data Analysis approach for 3D CT imaging that significantly outperforms traditional 3D cubical complex methods and radiomic features in both classification accuracy and computational efficiency, accompanied by the release of a Python package to facilitate its adoption.

Dashti A. Ali, Aras T. Asaad, Jacob J. Peoples, Mohammad Hamghalam, Natalie Gangai, Richard K. G. Do, Alice C. Wei, Amber L. Simpson | 2026-03-09 | 🤖 cs.LG

Towards Scalable Pre-training of Visual Tokenizers for Generation

This paper introduces VTP, a unified pre-training framework that optimizes visual tokenizers through joint image-text contrastive, self-supervised, and reconstruction losses to shift the latent space focus from low-level pixel accuracy to high-level semantics, thereby solving the "pre-training scaling problem" and enabling significantly improved, compute-efficient generative performance.

Jingfeng Yao, Yuda Song, Yucong Zhou, Xinggang Wang | 2026-03-09 | 💻 cs

Spatial4D-Bench: A Versatile 4D Spatial Intelligence Benchmark

This paper introduces Spatial4D-Bench, a large-scale, multi-task benchmark comprising approximately 40,000 question-answer pairs across 18 tasks and six cognitive categories, designed to comprehensively evaluate and reveal the current limitations of Multimodal Large Language Models in achieving human-level 4D spatial intelligence.

Pan Wang, Yang Liu, Guile Wu, Eduardo R. Corral-Soto, Chengjie Huang, Binbin Xu, Dongfeng Bai, Xu Yan, Yuan Ren, Xingxin Chen, Yizhe Wu, Tao Huang, Wenjun Wan, Xin Wu, Pei Zhou, Xuyang Dai, Kangbo Lv, Hongbo Zhang, Yosef Fried, Aixue Ye, Bailan Feng, Zhenyu Chen, Zhen Li, Yingcong Chen, Yiyi Liao, Bingbing Liu | 2026-03-09 | 💻 cs