Spatial Autoregressive Modeling of DINOv3 Embeddings for Unsupervised Anomaly Detection

This paper proposes a memory-efficient unsupervised anomaly detection framework that leverages a 2D autoregressive CNN to explicitly model spatial dependencies in DINOv3 patch embeddings, achieving competitive performance on medical imaging benchmarks while significantly reducing inference time and memory overhead compared to existing prototype-based methods.

Ertunc Erdil, Nico Schulthess, Guney Tombak + 1 more | 2026-03-04 | cs

The Dresden Dataset for 4D Reconstruction of Non-Rigid Abdominal Surgical Scenes

The Dresden Dataset (D4D) is a comprehensive benchmark comprising over 300,000 frames and 369 point clouds from porcine cadaver surgeries, providing paired endoscopic video and high-quality structured-light geometry to enable quantitative evaluation of non-rigid 4D reconstruction, SLAM, and depth estimation methods in realistic abdominal surgical scenes.

Reuben Docea, Rayan Younis, Yonghao Long + 10 more | 2026-03-04 | cs

MoD-DPO: Towards Mitigating Cross-modal Hallucinations in Omni LLMs using Modality Decoupled Preference Optimization

This paper proposes MoD-DPO, a Modality-Decoupled Direct Preference Optimization framework that mitigates cross-modal hallucinations in omni-modal LLMs by enforcing modality-specific invariance and sensitivity through regularization and language-prior debiasing, thereby significantly improving perception accuracy and hallucination resistance.

Ashutosh Chaubey, Jiacheng Pang, Mohammad Soleymani | 2026-03-04 | cs.CL

ACE-Brain-0: Spatial Intelligence as a Shared Scaffold for Universal Embodiments

This paper introduces ACE-Brain-0, a generalist foundation brain that leverages spatial intelligence as a universal scaffold and employs a Scaffold-Specialize-Reconcile (SSR) paradigm to unify diverse embodied tasks like autonomous driving and robotics within a single multimodal large language model, achieving state-of-the-art performance across 24 benchmarks.

Ziyang Gong, Zehang Luo, Anke Tang + 21 more | 2026-03-04 | cs.CL

COP-GEN: Latent Diffusion Transformer for Copernicus Earth Observation Data -- Generation Stochastic by Design

COP-GEN is a multimodal latent diffusion transformer designed for Earth observation that addresses the inherent non-injectivity of cross-sensor relationships by modeling conditional distributions to generate diverse, physically consistent, and uncertainty-aware realizations across optical, radar, and elevation modalities without task-specific retraining.

Miguel Espinosa, Eva Gmelich Meijling, Valerio Marsocci + 2 more | 2026-03-04 | cs