MoD-DPO: Towards Mitigating Cross-modal Hallucinations in Omni LLMs using Modality Decoupled Preference Optimization

This paper proposes MoD-DPO, a Modality-Decoupled Direct Preference Optimization framework that mitigates cross-modal hallucinations in omni-modal LLMs. It enforces modality-specific invariance and sensitivity through regularization and language-prior debiasing, which the authors report significantly improves perception accuracy and hallucination resistance.

Ashutosh Chaubey, Jiacheng Pang, Mohammad Soleymani · 2026-03-04 · cs.CL

ACE-Brain-0: Spatial Intelligence as a Shared Scaffold for Universal Embodiments

This paper introduces ACE-Brain-0, a generalist foundation brain that leverages spatial intelligence as a universal scaffold and employs a Scaffold-Specialize-Reconcile (SSR) paradigm to unify diverse embodied tasks like autonomous driving and robotics within a single multimodal large language model, achieving state-of-the-art performance across 24 benchmarks.

Ziyang Gong, Zehang Luo, Anke Tang + 21 more · 2026-03-04 · cs.CL

COP-GEN: Latent Diffusion Transformer for Copernicus Earth Observation Data -- Generation Stochastic by Design

COP-GEN is a multimodal latent diffusion transformer designed for Earth observation that addresses the inherent non-injectivity of cross-sensor relationships by modeling conditional distributions to generate diverse, physically consistent, and uncertainty-aware realizations across optical, radar, and elevation modalities without task-specific retraining.

Miguel Espinosa, Eva Gmelich Meijling, Valerio Marsocci + 2 more · 2026-03-04 · cs