CityGuard: Graph-Aware Private Descriptors for Bias-Resilient Identity Search Across Urban Cameras

CityGuard is a privacy-preserving, graph-aware transformer framework that enables robust, bias-resilient person re-identification across distributed urban cameras by integrating dispersion-adaptive metric learning, spatially conditioned attention for coarse geometric alignment, and differentially private embeddings to balance retrieval accuracy with data protection.

Rong Fu, Yibo Meng, Jia Yee Tan + 5 more · 2026-03-06 · cs

Learning to Drive is a Free Gift: Large-Scale Label-Free Autonomy Pretraining from Unposed In-The-Wild Videos

This paper proposes LFG, a label-free, teacher-guided framework that leverages unposed, in-the-wild ego-centric videos to pretrain a unified pseudo-4D representation for autonomous driving, achieving state-of-the-art planning performance on the NAVSIM benchmark using only a single monocular camera without relying on poses, labels, or LiDAR.

Matthew Strong, Wei-Jer Chang, Quentin Herau + 4 more · 2026-03-06 · cs

DiffusionHarmonizer: Bridging Neural Reconstruction and Photorealistic Simulation with Online Diffusion Enhancer

DiffusionHarmonizer is an online, single-step generative framework that leverages a custom data curation pipeline to transform imperfect neural reconstruction renderings into temporally consistent, photorealistic simulations, effectively resolving artifacts and harmonizing inserted dynamic objects for autonomous robot development.

Yuxuan Zhang, Katarína Tóthová, Zian Wang + 7 more · 2026-03-06 · cs

AlignVAR: Towards Globally Consistent Visual Autoregression for Image Super-Resolution

This paper proposes AlignVAR, a globally consistent visual autoregressive framework for image super-resolution that overcomes locality bias and error accumulation through Spatial Consistency Autoregression and Hierarchical Consistency Constraint, achieving superior structural coherence and perceptual fidelity with significantly faster inference and fewer parameters than diffusion-based methods.

Cencen Liu, Dongyang Zhang, Wen Yin + 6 more · 2026-03-06 · cs

Dr.Occ: Depth- and Region-Guided 3D Occupancy from Surround-View Cameras for Autonomous Driving

Dr.Occ is a 3D semantic occupancy prediction framework for autonomous driving that uses a depth-guided view transformer for precise geometric alignment and a region-guided expert transformer to address spatial class imbalance, achieving significant performance improvements over existing vision-only baselines on the Occ3D-nuScenes benchmark.

Xubo Zhu, Haoyang Zhang, Fei He + 4 more · 2026-03-06 · cs

Gated Differential Linear Attention: A Linear-Time Decoder for High-Fidelity Medical Segmentation

The paper introduces PVT-GDLA, a linear-time decoder architecture featuring Gated Differential Linear Attention that combines noise-canceling kernel paths, adaptive gating, and local token mixing to achieve state-of-the-art, high-fidelity medical image segmentation with superior efficiency compared to existing CNN and Transformer baselines.

Hongbo Zheng, Afshin Bozorgpour, Dorit Merhof + 1 more · 2026-03-06 · cs