cs.RO papers | Gist.Science

From Flow to One Step: Real-Time Multi-Modal Trajectory Policies via Implicit Maximum Likelihood Estimation-based Distribution Distillation

This paper proposes a real-time multi-modal trajectory policy framework that distills a Conditional Flow Matching expert into a single-step student using Implicit Maximum Likelihood Estimation and a bi-directional Chamfer distance, thereby eliminating the latency of iterative ODE integration while preserving multi-modal action diversity for high-frequency robotic control.

Ju Dong, Liding Zhang, Lei Zhang, Yu Fu, Kaixin Bai, Zoltan-Csaba Marton, Zhenshan Bing, Zhaopeng Chen, Alois Christian Knoll, Jianwei Zhang | Wed, 11 Ma | 🤖 cs.AI

SPAARS: Safer RL Policy Alignment through Abstract Exploration and Refined Exploitation of Action Space

SPAARS is a curriculum learning framework for offline-to-online reinforcement learning that safely improves policies by initially exploring a low-dimensional latent space to ensure sample efficiency and stability, then seamlessly transitioning to raw action space to bypass decoder-induced performance ceilings, thereby achieving superior results over state-of-the-art baselines on both robotic manipulation and locomotion tasks.

Swaminathan S K, Aritra Hazra | Wed, 11 Ma | 🤖 cs.AI

Embodied Human Simulation for Quantitative Design and Analysis of Interactive Robotics

This paper presents a scalable, reinforcement learning-driven simulation framework featuring a full-body musculoskeletal model that enables the quantitative co-optimization of robotic structural design and control policies by providing direct access to internal human biomechanical metrics for interactive robotics.

Chenhui Zuo, Jinhao Xu, Michael Qian Vergnolle, Yanan Sui | Wed, 11 Ma | 🤖 cs.AI

ZeroWBC: Learning Natural Visuomotor Humanoid Control Directly from Human Egocentric Video

ZeroWBC is a novel framework that enables natural, versatile whole-body control for humanoid robots by learning visuomotor policies directly from human egocentric videos, thereby eliminating the need for expensive and time-consuming teleoperation data collection.

Haoran Yang, Jiacheng Bao, Yucheng Xin, Haoming Song, Yuyang Tian, Bin Zhao, Dong Wang, Xuelong Li | Wed, 11 Ma | 🤖 cs.AI

DexHiL: A Human-in-the-Loop Framework for Vision-Language-Action Model Post-Training in Dexterous Manipulation

DexHiL is the first integrated human-in-the-loop framework for dexterous Vision-Language-Action models that combines coordinated arm-hand teleoperation with intervention-aware data sampling to significantly improve post-training performance and reliability in complex manipulation tasks.

Yifan Han, Zhongxi Chen, Yuxuan Zhao, Congsheng Xu, Yanming Shao, Yichuan Peng, Yao Mu, Wenzhao Lian | Wed, 11 Ma | 🤖 cs.AI

PM-Nav: Priori-Map Guided Embodied Navigation in Functional Buildings

The paper introduces PM-Nav, a novel framework that leverages priori-semantic maps and hierarchical chain-of-thought prompting to overcome the challenges of language-driven navigation in functional buildings with highly similar features, achieving substantial performance improvements over existing methods in both simulation and real-world environments.

Jiang Gao, Xiangyu Dong, Haozhou Li, Haoran Zhao, Yaoming Zhou, Xiaoguang Ma | Wed, 11 Ma | 🤖 cs.AI

Latent World Models for Automated Driving: A Unified Taxonomy, Evaluation Framework, and Open Challenges

This paper proposes a unified taxonomy and evaluation framework for latent world models in automated driving, organizing design choices by latent representations and structural priors while identifying key internal mechanics and research directions to enhance robustness, generalization, and deployability.

Rongxiang Zeng, Yongqi Dong | Wed, 11 Ma | 🤖 cs.AI

GST-VLA: Structured Gaussian Spatial Tokens for 3D Depth-Aware Vision-Language-Action Models

GST-VLA introduces a novel framework that enhances Vision-Language-Action models by converting visual observations into anisotropic 3D Gaussian spatial tokens and employing 3D Depth-Aware Chain-of-Thought reasoning to achieve state-of-the-art performance on precision-demanding robotic manipulation tasks.

Md Selim Sarowar, Omer Tariq, Sungho Kim | Wed, 11 Ma | 🤖 cs.AI

PlayWorld: Learning Robot World Models from Autonomous Play

PlayWorld introduces a fully autonomous pipeline that trains high-fidelity, physically consistent video world models from unsupervised robot self-play, outperforming human-collected data in predicting complex interactions and significantly boosting real-world reinforcement learning success rates.

Tenny Yin, Zhiting Mei, Zhonghe Zheng, Miyu Yamane, David Wang, Jade Sceats, Samuel M. Bateman, Lihan Zha, Apurva Badithela, Ola Shorinwa, Anirudha Majumdar | Wed, 11 Ma | 🤖 cs.AI

Improving through Interaction: Searching Behavioral Representation Spaces with CMA-ES-IG

This paper introduces CMA-ES-IG, an algorithm that enhances robot preference learning by generating perceptually distinct and informative queries, thereby improving scalability, robustness, and user experience compared to existing state-of-the-art methods.

Nathaniel Dennler, Zhonghao Shi, Yiran Tao, Andreea Bobu, Stefanos Nikolaidis, Maja Mataric | Wed, 11 Ma | 🤖 cs.AI

Scale-Plan: Scalable Language-Enabled Task Planning for Heterogeneous Multi-Robot Teams

Scale-Plan is a scalable framework that leverages large language models to filter irrelevant perceptual information and construct compact, task-relevant representations from natural language instructions, thereby enabling efficient and reliable long-horizon planning for heterogeneous multi-robot teams while outperforming existing baselines on the new MAT2-THOR benchmark.

Piyush Gupta, Sangjae Bae, Jiachen Li, David Isele | Wed, 11 Ma | 🤖 cs.AI

Magnetically Driven Elastic Microswimmers: Exploiting Hysteretic Collapse for Autonomous Propulsion and Independent Control

This paper proposes and optimizes a magnetically driven elastic microswimmer that achieves autonomous, nonreciprocal propulsion through hysteretic collapse of its segments, enabling the simultaneous independent control of multiple microrobots via a single oscillating magnetic field for potential medical applications.

Theo Lequy, Andreas M. Menzel | Wed, 11 Ma | 🔬 physics.app-ph

Receptogenesis in a Vascularized Robotic Embodiment

This paper presents a vascularized robotic system capable of *in situ* photopolymerization to dynamically grow functional sensors from internal fluid reserves, thereby enabling real-time physical adaptation and closed-loop control in complex environments.

Kadri-Ann Pankratov, Leonid Zinatullin, Hans Priks, Adele Metsniit, Urmas Johanson, Tarmo Tamm, Alvo Aabloo, Edoardo Sinibaldi, Indrek Must | Wed, 11 Ma | 🔬 cond-mat.mtrl-sci

Vectorized Online POMDP Planning

This paper introduces VOPP, a novel vectorized online POMDP planner that eliminates synchronization bottlenecks by representing all planning data as tensors and performing fully parallelized expectation estimations, achieving a 20-fold efficiency gain over existing parallel solvers and outperforming state-of-the-art sequential methods with a 1000-fold reduction in planning budget.

Marcus Hoerger, Muhammad Sudrajat, Hanna Kurniawati | Tue, 10 Ma | 💻 cs

ViLAM: Distilling Vision-Language Reasoning into Attention Maps for Social Robot Navigation

ViLAM is a novel method that distills vision-language reasoning from large Vision-Language Models into spatial attention maps to guide socially compliant robot navigation, achieving significant improvements in success rates through real-world validation.

Mohamed Elnoor, Kasun Weerakoon, Gershom Seneviratne, Jing Liang, Vignesh Rajagopal, Dinesh Manocha | Tue, 10 Ma | 💻 cs

Influence-Based Reward Modulation for Implicit Communication in Human-Robot Interaction

This paper proposes a method to foster implicit communication in human-robot interaction by modulating inter-agent influence through Transfer Entropy within a reward framework, demonstrating that enhancing influence improves collaboration while resisting it promotes independence, as validated through simulations and real-world experiments.

Haoyang Jiang, Elizabeth A. Croft, Michael G. Burke | Tue, 10 Ma | 💻 cs

Utility Theory based Cognitive Modeling in the Application of Robotics: A Survey

This survey reviews the application of utility theory to cognitive modeling in robotics, tracing its evolution from behavior-based approaches to value systems that guide decision-making, learning, and cooperation in single and multi-agent environments, while identifying current limitations and proposing future research directions.

Qin Yang | Tue, 10 Ma | 💻 cs

An Open-Source Robotics Research Platform for Autonomous Laparoscopic Surgery

This paper introduces an open-source, robot-agnostic surgical robotics platform featuring a deterministic, closed-form RCM controller and full-stack ROS integration, which achieves sub-millimeter precision and expert-level trajectory smoothness in autonomous laparoscopic tasks across phantom, ex vivo, and in vivo porcine models.

Ariel Rodriguez, Lorenzo Mazza, Martin Lelis, Rayan Younis, Sebastian Bodenstedt, Martin Wagner, Stefanie Speidel | Tue, 10 Ma | 💻 cs

3PoinTr: 3D Point Tracks for Robot Manipulation Pretraining from Casual Videos

3PoinTr is a novel method that pretrains robust robot manipulation policies from casual, unconstrained human videos by using a transformer architecture to predict 3D point tracks as an embodiment-agnostic intermediate representation, enabling sample-efficient learning with minimal robot demonstrations.

Adam Hung, Bardienus Pieter Duisterhof, Jeffrey Ichnowski | Tue, 10 Ma | 💻 cs

LAR-MoE: Latent-Aligned Routing for Mixture of Experts in Robotic Imitation Learning

LAR-MoE is a two-stage framework that decouples unsupervised skill discovery from policy learning by regularizing expert routing to align with a learned latent representation, enabling robots to achieve high success rates in heterogeneous manipulation tasks without requiring manual skill annotations.

Ariel Rodriguez, Chenpan Li, Lorenzo Mazza, Rayan Younis, Ortrun Hellig, Sebastian Bodenstedt, Martin Wagner, Stefanie Speidel | Tue, 10 Ma | 💻 cs
