FunnyNodules: A Customizable Medical Dataset Tailored for Evaluating Explainable AI

The paper introduces FunnyNodules, a fully parameterized synthetic dataset of lung nodule-like shapes with controllable visual attributes and known decision rules, designed to systematically evaluate and benchmark explainable AI models by verifying whether they learn correct attribute-target relations and align their attention with relevant diagnostic features.

Luisa Gallée, Yiheng Xiong, Meinrad Beer, Michael Götz · 2026-03-09 · 💻 cs

EchoVLA: Synergistic Declarative Memory for VLA-Driven Mobile Manipulation

EchoVLA is a memory-enhanced Vision-Language-Action model for mobile manipulation that synergizes scene and episodic declarative memories to improve navigation and task performance, validated by the new MoMani benchmark and demonstrating significant gains over existing baselines in both simulation and real-world settings.

Min Lin, Xiwen Liang, Bingqian Lin, Liu Jingzhi, Zijian Jiao, Kehan Li, Yu Sun, Weijia Liufu, Yuhan Ma, Yuecheng Liu, Shen Zhao, Yuzheng Zhuang, Xiaodan Liang · 2026-03-09 · 💻 cs

SyncMV4D: Synchronized Multi-view Joint Diffusion of Appearance and Motion for Hand-Object Interaction Synthesis

SyncMV4D is a novel framework that overcomes the limitations of single-view and data-hungry 3D methods by introducing a Multi-view Joint Diffusion model and a Diffusion Points Aligner to simultaneously generate synchronized, realistic multi-view hand-object interaction videos and globally aligned 4D metric motions through a closed-loop coupling of visual appearance and dynamic geometry.

Lingwei Dang, Zonghan Li, Juntong Li, Hongwen Zhang, Liang An, Yebin Liu, Qingyao Wu · 2026-03-09 · 💻 cs

UniTS: Unified Spatio-Temporal Generative Model for Remote Sensing

This paper introduces UniTS, a unified spatio-temporal generative model based on flow matching and diffusion transformers that integrates tasks like cloud removal, change detection, and forecasting into a single framework, significantly outperforming specialized models under challenging conditions.

Yuxiang Zhang, Shunlin Liang, Wenyuan Li, Han Ma, Jianglei Xu, Yichuan Ma, Jiangwei Xie, Wei Li, Mengmeng Zhang, Ran Tao, Xiang-Gen Xia · 2026-03-09 · 💻 cs

Safe Model Predictive Diffusion with Shielding

This paper introduces Safe Model Predictive Diffusion (Safe MPD), a training-free planning framework that integrates a safety shield directly into the diffusion denoising process to generate kinodynamically feasible and safe trajectories in real time, outperforming existing methods in success rate and safety without requiring post-processing corrections.

Taekyung Kim, Keyvan Majd, Hideki Okamoto, Bardh Hoxha, Dimitra Panagou, Georgios Fainekos · 2026-03-09 · 💻 cs

UniCoR: Modality Collaboration for Robust Cross-Language Hybrid Code Retrieval

UniCoR is a novel self-supervised framework that addresses the challenges of insufficient semantic understanding, inefficient modality fusion, and weak cross-language generalization in hybrid code retrieval by employing multi-perspective supervised contrastive learning and representation distribution consistency, thereby achieving state-of-the-art performance on both empirical and large-scale benchmarks.

Yang Yang, Li Kuang, Jiakun Liu, Zhongxin Liu, Yingjie Xia, David Lo · 2026-03-09 · 💻 cs

Towards Scalable Pre-training of Visual Tokenizers for Generation

This paper introduces VTP, a unified pre-training framework that optimizes visual tokenizers through joint image-text contrastive, self-supervised, and reconstruction losses to shift the latent space focus from low-level pixel accuracy to high-level semantics, thereby solving the "pre-training scaling problem" and enabling significantly improved, compute-efficient generative performance.

Jingfeng Yao, Yuda Song, Yucong Zhou, Xinggang Wang · 2026-03-09 · 💻 cs