cs.RO papers | Gist.Science

Seed2Scale: A Self-Evolving Data Engine for Embodied AI via Small to Large Model Synergy and Multimodal Evaluation

Seed2Scale is a self-evolving data engine that overcomes data bottlenecks in embodied AI by synergizing a lightweight "SuperTiny" model for robust data collection with a large Vision-Language Model for autonomous quality verification, enabling a target model to achieve a 131.2% performance improvement starting from just four seed demonstrations.

Cong Tai, Zhaoyu Zheng, Haixu Long, Hansheng Wu, Zhengbin Long, Haodong Xiang, Rong Shi, Zhuo Cui, Shizhuang Zhang, Gang Qiu, He Wang, Ruifeng Li, Biao Liu, Zhenzhe Sun, Tao Shen | Tue, 10 Ma | 💻 cs

SAIL: Test-Time Scaling for In-Context Imitation Learning with VLM

SAIL is a test-time scaling framework that enhances one-shot robot imitation learning by reframing trajectory generation as an iterative refinement process guided by Monte Carlo Tree Search, an automated retrieval archive, and a vision-language model-based scoring mechanism, thereby significantly improving success rates across diverse manipulation tasks.

Makoto Sato, Yusuke Iwasawa, Yujin Tang, So Kuroki | Tue, 10 Ma | 💻 cs

Less is More: Robust Zero-Communication 3D Pursuit-Evasion via Representational Parsimony

This paper demonstrates that explicitly reducing observation dimensionality and implementing locality-aware credit assignment in a communication-free multi-agent system enhances robustness and performance in asymmetric 3D pursuit-evasion tasks within cluttered environments.

Jialin Ying, Zhihao Li, Zicheng Dong, Guohua Wu, Yihuan Liao | Tue, 10 Ma | 💻 cs

EndoSERV: A Vision-based Endoluminal Robot Navigation System

EndoSERV is a novel vision-based navigation system for endoluminal robots that overcomes challenges like tissue deformation and label scarcity by combining segment-to-structure odometry with real-to-virtual transfer learning to achieve accurate localization without requiring real-world pose labels.

Junyang Wu, Fangfang Xie, Minghui Zhang, Hanxiao Zhang, Jiayuan Sun, Yun Gu, Guang-Zhong Yang | Tue, 10 Ma | 💻 cs

Hierarchical Multi-Modal Planning for Fixed-Altitude Sparse Target Search and Sampling

This paper introduces HIMoS, a hierarchical multi-modal planning framework that enables Autonomous Underwater Vehicles to efficiently search for and sample sparse benthic targets like coral colonies at a fixed altitude by integrating a global topological route optimizer with a local differentiable belief propagation planner, thereby outperforming traditional exhaustive and adaptive sampling strategies in high-fidelity simulations.

Lingpeng Chen, Yuchen Zheng, Apple Pui-Yi Chui, Junfeng Wu, Ziyang Hong | Tue, 10 Ma | 💻 cs

PhaForce: Phase-Scheduled Visual-Force Policy Learning with Slow Planning and Fast Correction for Contact-Rich Manipulation

PhaForce is a phase-scheduled visuomotor policy that enhances contact-rich manipulation by coordinating a slow, vision-dominant diffusion planner with a fast, force-driven corrector to enable high-frequency, phase-aware residual corrections, achieving an 86% success rate and superior adaptability compared to existing baselines.

Mingxin Wang, Zhirun Yue, Renhao Lu, Yizhe Li, Zihan Wang, Guoping Pan, Kangkang Dong, Jun Cheng, Yi Cheng, Houde Liu | Tue, 10 Ma | 💻 cs

Perception-Aware Communication-Free Multi-UAV Coordination in the Wild

This paper presents a communication-free multi-UAV coordination framework that leverages onboard anisotropic 3D LiDAR for simultaneous SLAM, obstacle detection, and neighbor tracking, enabling safe and scalable navigation in complex, GNSS-denied environments like dense forests.

Manuel Boldrer, Michal Kamler, Afzal Ahmad, Martin Saska | Tue, 10 Ma | 💻 cs

MoMaStage: Skill-State Graph Guided Planning and Closed-Loop Execution for Long-Horizon Indoor Mobile Manipulation

MoMaStage is a structured vision-language framework that enables robust long-horizon indoor mobile manipulation by guiding task planning through a topology-aware Skill-State Graph and ensuring execution reliability via a closed-loop mechanism that triggers semantic replanning upon detecting physical deviations, all without requiring explicit scene mapping.

Chenxu Li, Zixuan Chen, Yetao Li, Jiapeng Xu, Hongyu Ding, Jieqi Shi, Jing Huo, Yang Gao | Tue, 10 Ma | 💻 cs

StructBiHOI: Structured Articulation Modeling for Long-Horizon Bimanual Hand-Object Interaction Generation

The paper proposes StructBiHOI, a hierarchical framework that combines a jointVAE for long-term planning, a maniVAE for frame-level refinement, and a Mamba-based diffusion denoiser to achieve stable, physically plausible, and semantically aligned long-horizon bimanual hand-object interaction generation.

Zhi Wang, Liu Liu, Ruonan Liu, Dan Guo, Meng Wang | Tue, 10 Ma | 💻 cs

A Recipe for Stable Offline Multi-agent Reinforcement Learning

This paper identifies value-scale amplification as the primary cause of instability in non-linear value decomposition for offline multi-agent reinforcement learning and proposes a scale-invariant value normalization technique to stabilize training, ultimately providing a practical recipe to unlock the full potential of offline MARL.

Dongsu Lee, Daehee Lee, Amy Zhang | Tue, 10 Ma | 🤖 cs.LG

Human-Aware Robot Behaviour in Self-Driving Labs

This paper proposes an AI-driven perception method with hierarchical human intention prediction to enable mobile robot chemists in self-driving laboratories to proactively distinguish between human preparatory actions and transient interactions, thereby overcoming the inefficiencies of passive obstruction detection and streamlining human-robot coordination in shared-access scenarios.

Satheeshkumar Veeramani, Anna Kisil, Abigail Bentley, Hatem Fakhruldeen, Gabriella Pizzuto, Andrew I. Cooper | Tue, 10 Ma | 💻 cs

Tactile Recognition of Both Shapes and Materials with Automatic Feature Optimization-Enabled Meta Learning

This paper proposes the AFOP-ML framework, an automatic feature optimization-enabled prototypical network that achieves rapid few-shot tactile recognition of both shapes and materials with high accuracy and robustness against perturbations, effectively addressing the challenges of data scarcity and time-consuming training in robotic applications.

Hongliang Zhao, Wenhui Yang, Yang Chen, Zhuorui Wang, Baiheng Liu, Longhui Qin | Tue, 10 Ma | 💻 cs
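
The prototypical-network idea named in the summary above can be illustrated with a minimal sketch. This is the generic nearest-prototype rule only, not the AFOP-ML framework itself: the two-dimensional embeddings, the class labels, and the function names `prototypes` and `classify` are hypothetical, and the automatic feature optimization step is not modeled.

```python
import math

def prototypes(support):
    """Average each class's support embeddings into one prototype per class."""
    protos = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        protos[label] = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return protos

def classify(query, protos):
    """Return the label whose prototype is nearest to the query (Euclidean distance)."""
    return min(protos, key=lambda label: math.dist(query, protos[label]))

# Hypothetical few-shot support set: two embeddings per (shape, material) class.
support = {
    "sphere_rubber": [[0.9, 0.1], [1.1, 0.0]],
    "cube_metal": [[0.0, 1.0], [0.2, 0.8]],
}
protos = prototypes(support)
prediction = classify([0.8, 0.2], protos)
print(prediction)
```

In a real system the embeddings would come from a learned encoder over tactile signals; here they are hand-written so the distance rule is easy to follow.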

FoMo: A Multi-Season Dataset for Robot Navigation in Forêt Montmorency

The FoMo dataset presents a comprehensive, multi-season collection of over 64 km of diverse robot navigation data from a boreal forest, featuring significant environmental changes like heavy snow and vegetation growth to challenge and evaluate the robustness of state-of-the-art odometry and SLAM systems.

Matej Boxan, Gabriel Jeanson, Alexander Krawciw, Effie Daum, Xinyuan Qiao, Sven Lilge, Timothy D. Barfoot, François Pomerleau | Tue, 10 Ma | 💻 cs

Adaptive Entropy-Driven Sensor Selection in a Camera-LiDAR Particle Filter for Single-Vessel Tracking

This paper presents an adaptive entropy-driven sensor selection policy within a camera-LiDAR particle filter that dynamically switches between modalities to optimize tracking accuracy and continuity for single-vessel surveillance, validated through real-world maritime deployment.

Andrei Starodubov, Yaqub Aris Prabowo, Andreas Hadjipieris, Ioannis Kyriakides, Roberto Galeazzi | Tue, 10 Ma | 🤖 cs.LG
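
One way to picture entropy-driven sensor selection in a particle filter is sketched below. The summary above does not give the paper's actual policy, so this is an assumed heuristic: compute the Shannon entropy of the particle weights each modality would produce and use the modality with the lower entropy, meaning the more concentrated belief. The function names and the example weights are hypothetical.

```python
import math

def weight_entropy(weights):
    """Shannon entropy (in nats) of particle weights after normalization."""
    total = sum(weights)
    probs = [w / total for w in weights]
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_sensor(camera_weights, lidar_weights):
    """Assumed rule: prefer the modality whose weights are more concentrated."""
    h_cam = weight_entropy(camera_weights)
    h_lidar = weight_entropy(lidar_weights)
    return "camera" if h_cam <= h_lidar else "lidar"

# Hypothetical weights over four particles for each modality's update.
camera_weights = [0.25, 0.25, 0.25, 0.25]  # uninformative: uniform weights
lidar_weights = [0.7, 0.1, 0.1, 0.1]       # informative: one dominant particle
choice = select_sensor(camera_weights, lidar_weights)
print(choice)
```

A deployed policy would likely also account for sensor availability and switching cost, which this sketch leaves out.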

R2F: Repurposing Ray Frontiers for LLM-free Object Navigation

The paper proposes R2F, an LLM-free framework for zero-shot open-vocabulary object navigation that repurposes ray frontiers as direction-conditioned semantic hypotheses to achieve competitive performance with real-time execution, eliminating the latency and computational overhead of iterative large-model queries.

Francesco Argenziano, John Mark Alexis Marcelo, Michele Brienza, Abdel Hakim Drid, Emanuele Musumeci, Daniele Nardi, Domenico D. Bloisi, Vincenzo Suriani | Tue, 10 Ma | 💻 cs

LAR-MoE: Latent-Aligned Routing for Mixture of Experts in Robotic Imitation Learning

LAR-MoE is a two-stage framework that decouples unsupervised skill discovery from policy learning by regularizing expert routing to align with a learned latent representation, enabling robots to achieve high success rates in heterogeneous manipulation tasks without requiring manual skill annotations.

Ariel Rodriguez, Chenpan Li, Lorenzo Mazza, Rayan Younis, Ortrun Hellig, Sebastian Bodenstedt, Martin Wagner, Stefanie Speidel | Tue, 10 Ma | 💻 cs

STRIDE: Structured Lagrangian and Stochastic Residual Dynamics via Flow Matching

The paper proposes STRIDE, a hybrid dynamics learning framework that combines a Lagrangian Neural Network for energy-consistent rigid-body mechanics with Conditional Flow Matching for stochastic residual interaction forces, achieving significant improvements in long-horizon prediction and contact force accuracy for robotic systems in unstructured environments.

Prakrut Kotecha, Ganga Nair B, Shishir Kolathaya | Tue, 10 Ma | 🤖 cs.LG

3PoinTr: 3D Point Tracks for Robot Manipulation Pretraining from Casual Videos

3PoinTr is a novel method that pretrains robust robot manipulation policies from casual, unconstrained human videos by using a transformer architecture to predict 3D point tracks as an embodiment-agnostic intermediate representation, enabling sample-efficient learning with minimal robot demonstrations.

Adam Hung, Bardienus Pieter Duisterhof, Jeffrey Ichnowski | Tue, 10 Ma | 💻 cs

An Open-Source Robotics Research Platform for Autonomous Laparoscopic Surgery

This paper introduces an open-source, robot-agnostic surgical robotics platform featuring a deterministic, closed-form RCM controller and full-stack ROS integration, which achieves sub-millimeter precision and expert-level trajectory smoothness in autonomous laparoscopic tasks across phantom, ex vivo, and in vivo porcine models.

Ariel Rodriguez, Lorenzo Mazza, Martin Lelis, Rayan Younis, Sebastian Bodenstedt, Martin Wagner, Stefanie Speidel | Tue, 10 Ma | 💻 cs

The Neural Compass: Probabilistic Relative Feature Fields for Robotic Search

This paper introduces ProReFF, a feature field model that learns relative object co-occurrence distributions from unlabeled observations to guide robotic search agents, achieving 20% higher efficiency than strong baselines and up to 80% of human performance in the Matterport3D simulator.

Gabriele Somaschini, Adrian Röfer, Abhinav Valada | Tue, 10 Ma | 🤖 cs.LG
