cs.RO papers | Gist.Science

Interactive World Simulator for Robot Policy Training and Evaluation

This paper presents the Interactive World Simulator, a fast and physically consistent framework leveraging consistency models to generate high-fidelity long-horizon video predictions that enable scalable robot policy training and reliable real-world evaluation using solely simulated data.

Yixuan Wang, Rhythm Syed, Fangyu Wu, Mengchao Zhang, Aykut Onol, Jose Barreiros, Hooshang Nayyeri, Tony Dear, Huan Zhang, Yunzhu Li | Tue, 10 Ma | 🤖 cs.LG

Automated Layout and Control Co-Design of Robust Multi-UAV Transportation Systems

This paper presents a novel co-design approach that simultaneously optimizes the physical arrangement and control strategies of multiple rigidly connected quadcopters transporting a payload, using a new H2-inspired robustness metric to maximize disturbance rejection. The approach is experimentally validated with diverse multi-UAV fleets and payload shapes.

Carlo Bosio, Mark W. Mueller | Thu, 12 Ma | ⚡ eess

vS-Graphs: Tightly Coupling Visual SLAM and 3D Scene Graphs Exploiting Hierarchical Scene Understanding

This paper introduces vS-Graphs, a real-time Visual SLAM framework that tightly couples vision-based scene understanding with 3D scene graphs to generate semantically rich, hierarchical maps, achieving a 15.22% accuracy improvement over state-of-the-art methods while matching LiDAR-based semantic detection performance using only visual features.

Ali Tourani, Saad Ejaz, Hriday Bavle, Miguel Fernandez-Cortizas, David Morilla-Cabello, Jose Luis Sanchez-Lopez, Holger Voos | Thu, 12 Ma | 💻 cs

A Chain-Driven, Sandwich-Legged Quadruped Robot: Design and Experimental Analysis

This paper presents the design, fabrication, and experimental validation of a cost-effective, open-source, chain-driven mid-size quadruped robot featuring a sandwich-legged architecture and quasi-direct-drive actuators to achieve agile locomotion with improved reliability and safety.

Aman Singh, Bhavya Giri Goswami, Ketan Nehete, Shishir N. Y. Kolathaya | Thu, 12 Ma | 💻 cs

Score Matching Diffusion Based Feedback Control and Planning of Nonlinear Systems

This paper proposes a deterministic diffusion-based framework for controlling the probability density of nonlinear control-affine systems by leveraging a forward noise-excitation process and a reverse denoising feedback law to steer state distributions toward desired targets, with theoretical guarantees for drift-free and linear time-invariant dynamics.

Karthik Elamvazhuthi, Darshan Gadginmath, Fabio Pasqualetti | Thu, 12 Ma | ⚡ eess

Scalable Multi-Task Learning through Spiking Neural Networks with Adaptive Task-Switching Policy for Intelligent Autonomous Agents

The paper proposes SwitchMT, a novel methodology for scalable multi-task learning in resource-constrained autonomous agents that combines a Deep Spiking Q-Network with active dendrites and an adaptive task-switching policy to effectively mitigate task interference and outperform state-of-the-art methods in Atari games.

Rachmad Vidya Wicaksana Putra, Avaneesh Devkota, Muhammad Shafique | Thu, 12 Ma | 🤖 cs.AI

REI-Bench: Can Embodied Agents Understand Vague Human Instructions in Task Planning?

This paper introduces REI-Bench, the first benchmark for evaluating robot task planning under vague referring expressions, revealing that such vagueness significantly degrades performance and demonstrating that a task-oriented context cognition approach effectively mitigates this issue to improve accessibility for non-expert users.

Chenxi Jiang, Chuhao Zhou, Jianfei Yang | Thu, 12 Ma | 💬 cs.CL

Self-Improving Loops for Visual Robotic Planning

This paper proposes SILVR, a self-improving framework that enables visual robotic planners to iteratively enhance their performance on novel tasks by continuously updating an in-domain video model using self-collected trajectories, achieving robust results without requiring ground-truth reward functions or expert demonstrations.

Calvin Luo, Zilai Zeng, Mingxi Jia, Yilun Du, Chen Sun | Thu, 12 Ma | 🤖 cs.AI

Pixel Motion Diffusion is What We Need for Robot Control

DAWN is a unified, end-to-end diffusion-based framework that bridges high-level intent and low-level robot actions through structured pixel motion representations, achieving state-of-the-art performance on benchmarks like CALVIN and MetaWorld while demonstrating robust real-world transfer with minimal finetuning.

E-Ro Nguyen, Yichi Zhang, Kanchana Ranasinghe, Xiang Li, Michael S. Ryoo | Thu, 12 Ma | 💻 cs

Symskill: Symbol and Skill Co-Invention for Data-Efficient and Reactive Long-Horizon Manipulation

Symskill is a unified framework that jointly learns symbolic abstractions and goal-oriented skills from unlabeled demonstrations to enable data-efficient, compositional, and real-time reactive long-horizon manipulation in dynamic environments.

Yifei Simon Shao, Yuchen Zheng, Sunan Sun, Pratik Chaudhari, Vijay Kumar, Nadia Figueroa | Thu, 12 Ma | 💻 cs

CompassNav: Steering From Path Imitation To Decision Understanding In Navigation

CompassNav introduces a new navigation paradigm that shifts from path imitation to decision understanding by leveraging a novel dataset with geodesic distance annotations and a gap-aware hybrid reward function, enabling a 7B model to achieve state-of-the-art performance on both simulated benchmarks and physical robots.

LinFeng Li, Jian Zhao, Yuan Xie, Xin Tan, Xuelong Li | Thu, 12 Ma | 💻 cs

Safe and Optimal Learning from Preferences via Weighted Temporal Logic with Applications in Robotics and Formula 1

This paper proposes a safety-guaranteed and optimal learning framework for autonomous systems that utilizes Weighted Signal Temporal Logic (WSTL) with structural pruning and log-transform techniques to efficiently solve preference-based learning problems as Mixed-Integer Linear Programs, validated through experiments in robotic navigation and Formula 1 racing.

Ruya Karagulle, Cristian-Ioan Vasile, Necmiye Ozay | Thu, 12 Ma | ⚡ eess

MergeVLA: Cross-Skill Model Merging Toward a Generalist Vision-Language-Action Agent

MergeVLA is a merging-oriented Vision-Language-Action architecture that overcomes the non-mergeability of existing VLA experts by introducing task-masked sparse LoRA adapters and cross-attention-only action experts, enabling a single generalist agent to robustly handle diverse tasks and embodiments without performance degradation.

Yuxia Fu, Zhizhen Zhang, Yuqi Zhang, Zijian Wang, Zi Huang, Yadan Luo | Thu, 12 Ma | 💻 cs

CostNav: A Navigation Benchmark for Real-World Economic-Cost Evaluation of Physical AI Agents

This paper introduces CostNav, the first physics-grounded navigation benchmark that evaluates autonomous agents using real-world economic data. It reveals that current methods, despite varying in hardware and architecture, all fail to achieve economic viability because their contribution margins are negative.

Haebin Seong, Sungmin Kim, Yongjun Cho, Myunchul Joe, Geunwoo Kim, Yubeen Park, Sunhoo Kim, Yoonshik Kim, Suhwan Choi, Jaeyoon Jung, Jiyong Youn, Jinmyung Kwak, Sunghee Ahn, Jaemin Lee, Younggil Do, Seungyeop Yi, Woojin Cheong, Minhyeok Oh, Minchan Kim, Seongjae Kang, Samwoo Seong, Youngjae Yu, Yunsung Lee | Thu, 12 Ma | 🤖 cs.AI

Partially Equivariant Reinforcement Learning in Symmetry-Breaking Environments

This paper introduces Partially Equivariant Reinforcement Learning, a framework that mitigates error propagation in symmetry-breaking environments by selectively applying group-invariant or standard Bellman backups based on local symmetry, thereby achieving superior sample efficiency and generalization compared to existing methods.

Junwoo Chang, Minwoo Park, Joohwan Seo, Roberto Horowitz, Jongmin Lee, Jongeun Choi | Thu, 12 Ma | 🤖 cs.LG

Cross-embodied Co-design for Dexterous Hands

This paper presents a co-design framework that simultaneously optimizes robotic hand morphology and control policies to enable the end-to-end design, fabrication, and deployment of task-specific dexterous hands in under 24 hours.

Kehlani Fay, Darin Anthony Djapri, Anya Zorin, James Clinton, Ali El Lahib, Hao Su, Michael T. Tolley, Sha Yi, Xiaolong Wang | Thu, 12 Ma | 🤖 cs.LG

World Models That Know When They Don't Know - Controllable Video Generation with Calibrated Uncertainty

This paper proposes C3, a novel uncertainty quantification method that trains controllable video models to generate high-resolution, calibrated confidence heatmaps at the subpatch level by estimating uncertainty in latent space and using strictly proper scoring rules, thereby enabling reliable hallucination detection and out-of-distribution identification for robotics applications.

Zhiting Mei, Tenny Yin, Micah Baker, Ola Shorinwa, Anirudha Majumdar | Thu, 12 Ma | 🤖 cs.AI

PvP: Data-Efficient Humanoid Robot Learning with Proprioceptive-Privileged Contrastive Representations

This paper introduces PvP, a proprioceptive-privileged contrastive learning framework that enhances data efficiency and robustness in humanoid robot whole-body control by learning compact task-relevant representations without hand-crafted augmentations, supported by the new SRL4Humanoid evaluation framework.

Mingqi Yuan, Tao Yu, Haolin Song, Bo Li, Xin Jin, Hua Chen, Wenjun Zeng | Thu, 12 Ma | 🤖 cs.LG

Global End-Effector Pose Control of an Underactuated Aerial Manipulator via Reinforcement Learning

This paper presents a reinforcement learning-based control framework that enables a lightweight, underactuated aerial manipulator with a differential 2-DoF arm to achieve precise six-DoF end-effector pose control and robust contact-rich manipulation, as demonstrated in flight experiments.

Shlok Deshmukh, Javier Alonso-Mora, Sihao Sun | Thu, 12 Ma | 💻 cs

Cosmos-H-Surgical: Learning Surgical Robot Policies from Videos via World Modeling

This paper addresses the scarcity of labeled surgical robot data by introducing Cosmos-H-Surgical, a world model that generates realistic surgical videos and infers synthetic kinematics via an inverse dynamics model, enabling the training of superior surgical policies that outperform those trained solely on limited real-world demonstrations.

Yufan He, Pengfei Guo, Mengya Xu, Zhaoshuo Li, Andriy Myronenko, Dillan Imans, Bingjie Liu, Dongren Yang, Mingxue Gu, Yongnan Ji, Yueming Jin, Ren Zhao, Baiyong Shen, Daguang Xu | Thu, 12 Ma | 💻 cs
