Interactive World Simulator for Robot Policy Training and Evaluation

This paper presents the Interactive World Simulator, a fast and physically consistent framework leveraging consistency models to generate high-fidelity long-horizon video predictions that enable scalable robot policy training and reliable real-world evaluation using solely simulated data.

Yixuan Wang, Rhythm Syed, Fangyu Wu, Mengchao Zhang, Aykut Onol, Jose Barreiros, Hooshang Nayyeri, Tony Dear, Huan Zhang, Yunzhu Li | Tue, 10 Ma | 🤖 cs.LG

vS-Graphs: Tightly Coupling Visual SLAM and 3D Scene Graphs Exploiting Hierarchical Scene Understanding

This paper introduces vS-Graphs, a real-time Visual SLAM framework that tightly couples vision-based scene understanding with 3D scene graphs to generate semantically rich, hierarchical maps, achieving a 15.22% accuracy improvement over state-of-the-art methods while matching LiDAR-based semantic detection performance using only visual features.

Ali Tourani, Saad Ejaz, Hriday Bavle, Miguel Fernandez-Cortizas, David Morilla-Cabello, Jose Luis Sanchez-Lopez, Holger Voos | Thu, 12 Ma | 💻 cs

Score Matching Diffusion Based Feedback Control and Planning of Nonlinear Systems

This paper proposes a deterministic diffusion-based framework for controlling the probability density of nonlinear control-affine systems by leveraging a forward noise-excitation process and a reverse denoising feedback law to steer state distributions toward desired targets, with theoretical guarantees for drift-free and linear time-invariant dynamics.

Karthik Elamvazhuthi, Darshan Gadginmath, Fabio Pasqualetti | Thu, 12 Ma | ⚡ eess

Scalable Multi-Task Learning through Spiking Neural Networks with Adaptive Task-Switching Policy for Intelligent Autonomous Agents

This paper proposes SwitchMT, a novel methodology for scalable multi-task learning in resource-constrained autonomous agents that combines a Deep Spiking Q-Network with active dendrites and an adaptive task-switching policy to mitigate task interference and outperform state-of-the-art methods on Atari games.

Rachmad Vidya Wicaksana Putra, Avaneesh Devkota, Muhammad Shafique | Thu, 12 Ma | 🤖 cs.AI

Safe and Optimal Learning from Preferences via Weighted Temporal Logic with Applications in Robotics and Formula 1

This paper proposes a safety-guaranteed and optimal learning framework for autonomous systems that utilizes Weighted Signal Temporal Logic (WSTL) with structural pruning and log-transform techniques to efficiently solve preference-based learning problems as Mixed-Integer Linear Programs, validated through experiments in robotic navigation and Formula 1 racing.

Ruya Karagulle, Cristian-Ioan Vasile, Necmiye Ozay | Thu, 12 Ma | ⚡ eess

MergeVLA: Cross-Skill Model Merging Toward a Generalist Vision-Language-Action Agent

MergeVLA is a merging-oriented Vision-Language-Action architecture that overcomes the non-mergeability of existing VLA experts by introducing task-masked sparse LoRA adapters and cross-attention-only action experts, enabling a single generalist agent to robustly handle diverse tasks and embodiments without performance degradation.

Yuxia Fu, Zhizhen Zhang, Yuqi Zhang, Zijian Wang, Zi Huang, Yadan Luo | Thu, 12 Ma | 💻 cs

CostNav: A Navigation Benchmark for Real-World Economic-Cost Evaluation of Physical AI Agents

This paper introduces CostNav, the first physics-grounded navigation benchmark that evaluates autonomous agents using real-world economic data. It finds that current methods, despite differing in hardware and architecture, all fail to achieve economic viability because their contribution margins are negative.

Haebin Seong, Sungmin Kim, Yongjun Cho, Myunchul Joe, Geunwoo Kim, Yubeen Park, Sunhoo Kim, Yoonshik Kim, Suhwan Choi, Jaeyoon Jung, Jiyong Youn, Jinmyung Kwak, Sunghee Ahn, Jaemin Lee, Younggil Do, Seungyeop Yi, Woojin Cheong, Minhyeok Oh, Minchan Kim, Seongjae Kang, Samwoo Seong, Youngjae Yu, Yunsung Lee | Thu, 12 Ma | 🤖 cs.AI

Partially Equivariant Reinforcement Learning in Symmetry-Breaking Environments

This paper introduces Partially Equivariant Reinforcement Learning, a framework that mitigates error propagation in symmetry-breaking environments by selectively applying group-invariant or standard Bellman backups based on local symmetry, thereby achieving superior sample efficiency and generalization compared to existing methods.

Junwoo Chang, Minwoo Park, Joohwan Seo, Roberto Horowitz, Jongmin Lee, Jongeun Choi | Thu, 12 Ma | 🤖 cs.LG

World Models That Know When They Don't Know - Controllable Video Generation with Calibrated Uncertainty

This paper proposes C3, a novel uncertainty quantification method that trains controllable video models to generate high-resolution, calibrated confidence heatmaps at the subpatch level by estimating uncertainty in latent space and using strictly proper scoring rules, thereby enabling reliable hallucination detection and out-of-distribution identification for robotics applications.

Zhiting Mei, Tenny Yin, Micah Baker, Ola Shorinwa, Anirudha Majumdar | Thu, 12 Ma | 🤖 cs.AI

PvP: Data-Efficient Humanoid Robot Learning with Proprioceptive-Privileged Contrastive Representations

This paper introduces PvP, a proprioceptive-privileged contrastive learning framework that enhances data efficiency and robustness in humanoid robot whole-body control by learning compact task-relevant representations without hand-crafted augmentations, supported by the new SRL4Humanoid evaluation framework.

Mingqi Yuan, Tao Yu, Haolin Song, Bo Li, Xin Jin, Hua Chen, Wenjun Zeng | Thu, 12 Ma | 🤖 cs.LG

Cosmos-H-Surgical: Learning Surgical Robot Policies from Videos via World Modeling

This paper addresses the scarcity of labeled surgical robot data by introducing Cosmos-H-Surgical, a world model that generates realistic surgical videos and infers synthetic kinematics via an inverse dynamics model, enabling the training of superior surgical policies that outperform those trained solely on limited real-world demonstrations.

Yufan He, Pengfei Guo, Mengya Xu, Zhaoshuo Li, Andriy Myronenko, Dillan Imans, Bingjie Liu, Dongren Yang, Mingxue Gu, Yongnan Ji, Yueming Jin, Ren Zhao, Baiyong Shen, Daguang Xu | Thu, 12 Ma | 💻 cs