R2F: Repurposing Ray Frontiers for LLM-free Object Navigation

The paper proposes R2F, an LLM-free framework for zero-shot open-vocabulary object navigation that repurposes ray frontiers as direction-conditioned semantic hypotheses to achieve competitive performance with real-time execution, eliminating the latency and computational overhead of iterative large-model queries.

Francesco Argenziano, John Mark Alexis Marcelo, Michele Brienza, Abdel Hakim Drid, Emanuele Musumeci, Daniele Nardi, Domenico D. Bloisi, Vincenzo Suriani · Tue, 10 Ma · cs
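The R2F summary describes scoring ray frontiers as direction-conditioned semantic hypotheses and steering toward the most promising one. A minimal sketch of that idea, where the frontier representation, the `goal_score` function, and the `distance_weight` trade-off are all illustrative assumptions rather than the paper's actual formulation:

```python
def pick_frontier(frontiers, goal_score, distance_weight=0.1):
    """Choose the frontier direction maximizing semantic score minus travel cost."""
    best, best_val = None, float("-inf")
    for direction, distance in frontiers:
        # Each ray frontier is treated as a hypothesis: "the goal lies this way."
        val = goal_score(direction) - distance_weight * distance
        if val > best_val:
            best, best_val = direction, val
    return best

# Toy example: the semantic hypothesis peaks toward the 90-degree ray.
scores = {0: 0.2, 90: 0.9, 180: 0.1, 270: 0.4}
chosen = pick_frontier(
    [(0, 1.0), (90, 3.0), (180, 0.5), (270, 1.0)],
    goal_score=lambda d: scores[d],
)
```

Because the scoring is a single pass over precomputed frontiers rather than an iterative large-model query, a loop like this runs in real time, which matches the latency argument in the summary.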

FoMo: A Multi-Season Dataset for Robot Navigation in Forêt Montmorency

FoMo is a comprehensive, multi-season dataset comprising over 64 km of diverse robot navigation data from a boreal forest, featuring significant environmental changes such as heavy snow and vegetation growth to challenge and evaluate the robustness of state-of-the-art odometry and SLAM systems.

Matej Boxan, Gabriel Jeanson, Alexander Krawciw, Effie Daum, Xinyuan Qiao, Sven Lilge, Timothy D. Barfoot, François Pomerleau

Tactile Recognition of Both Shapes and Materials with Automatic Feature Optimization-Enabled Meta Learning

This paper proposes the AFOP-ML framework, an automatic feature optimization-enabled prototypical network that achieves rapid few-shot tactile recognition of both shapes and materials with high accuracy and robustness against perturbations, effectively addressing the challenges of data scarcity and time-consuming training in robotic applications.

Hongliang Zhao, Wenhui Yang, Yang Chen, Zhuorui Wang, Baiheng Liu, Longhui Qin

Human-Aware Robot Behaviour in Self-Driving Labs

This paper proposes an AI-driven perception method with hierarchical human intention prediction to enable mobile robot chemists in self-driving laboratories to proactively distinguish between human preparatory actions and transient interactions, thereby overcoming the inefficiencies of passive obstruction detection and streamlining human-robot coordination in shared-access scenarios.

Satheeshkumar Veeramani, Anna Kisil, Abigail Bentley, Hatem Fakhruldeen, Gabriella Pizzuto, Andrew I. Cooper

MoMaStage: Skill-State Graph Guided Planning and Closed-Loop Execution for Long-Horizon Indoor Mobile Manipulation

MoMaStage is a structured vision-language framework that enables robust long-horizon indoor mobile manipulation by guiding task planning through a topology-aware Skill-State Graph and ensuring execution reliability via a closed-loop mechanism that triggers semantic replanning upon detecting physical deviations, all without requiring explicit scene mapping.

Chenxu Li, Zixuan Chen, Yetao Li, Jiapeng Xu, Hongyu Ding, Jieqi Shi, Jing Huo, Yang Gao
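The MoMaStage summary centers on a closed-loop mechanism that triggers semantic replanning when execution deviates physically. A minimal sketch of that control flow, assuming hypothetical `execute`, `deviated`, and `replan` callables; the skill names and the restart-on-replan policy are illustrative, not the paper's design:

```python
def execute_with_replanning(plan, execute, deviated, replan, max_replans=3):
    """Run a skill plan step by step; on a detected deviation, request a new plan."""
    replans = 0
    i = 0
    while i < len(plan):
        state = execute(plan[i])
        if deviated(state):
            if replans >= max_replans:
                return False, replans      # give up after too many replans
            plan = replan(state)           # semantic replanning from the observed state
            replans += 1
            i = 0                          # restart on the fresh plan
        else:
            i += 1
    return True, replans

# Toy run: the first skill "fails" once, triggering a single replan.
seen = []
ok, n = execute_with_replanning(
    ["grasp", "place"],
    execute=lambda skill: (seen.append(skill) or skill),
    deviated=lambda state: state == "grasp" and seen.count("grasp") == 1,
    replan=lambda state: ["regrasp", "place"],
)
```

In the paper's framing, `replan` would be driven by the topology-aware Skill-State Graph; here it is a stub to show where that call sits in the loop.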

PhaForce: Phase-Scheduled Visual-Force Policy Learning with Slow Planning and Fast Correction for Contact-Rich Manipulation

PhaForce is a phase-scheduled visuomotor policy that enhances contact-rich manipulation by coordinating a slow, vision-dominant diffusion planner with a fast, force-driven corrector to enable high-frequency, phase-aware residual corrections, achieving an 86% success rate and superior adaptability compared to existing baselines.

Mingxin Wang, Zhirun Yue, Renhao Lu, Yizhe Li, Zihan Wang, Guoping Pan, Kangkang Dong, Jun Cheng, Yi Cheng, Houde Liu
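The PhaForce summary pairs a slow, vision-dominant planner with a fast, force-driven corrector that adds residuals at a high rate. A minimal sketch of that two-rate scheduling, where the update rates, the proportional correction, and all signal values are illustrative assumptions:

```python
def run_phase_scheduled(force_readings, plan_step, correct, plan_every=10):
    """Blend a slowly updated nominal action with fast per-tick residual corrections."""
    actions = []
    nominal = 0.0
    for t, force in enumerate(force_readings):
        if t % plan_every == 0:      # slow tick: vision-dominant planner refreshes the nominal action
            nominal = plan_step(t)
        residual = correct(force)    # fast tick: force-driven residual correction
        actions.append(nominal + residual)
    return actions

# Toy example: a constant nominal plan with a sign-flipping force correction.
acts = run_phase_scheduled(
    [0.0, 0.5, -0.5, 0.0],
    plan_step=lambda t: 1.0,
    correct=lambda f: -f,
    plan_every=2,
)
```

The design point this illustrates is that the corrector runs every control tick regardless of when the planner last updated, so contact transients between planning ticks still get compensated.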

Hierarchical Multi-Modal Planning for Fixed-Altitude Sparse Target Search and Sampling

This paper introduces HIMoS, a hierarchical multi-modal planning framework that enables Autonomous Underwater Vehicles to efficiently search for and sample sparse benthic targets like coral colonies at a fixed altitude by integrating a global topological route optimizer with a local differentiable belief propagation planner, thereby outperforming traditional exhaustive and adaptive sampling strategies in high-fidelity simulations.

Lingpeng Chen, Yuchen Zheng, Apple Pui-Yi Chui, Junfeng Wu, Ziyang Hong

Seed2Scale: A Self-Evolving Data Engine for Embodied AI via Small to Large Model Synergy and Multimodal Evaluation

Seed2Scale is a self-evolving data engine that overcomes data bottlenecks in embodied AI by synergizing a lightweight "SuperTiny" model for robust data collection with a large Vision-Language Model for autonomous quality verification, enabling a target model to achieve a 131.2% performance improvement starting from just four seed demonstrations.

Cong Tai, Zhaoyu Zheng, Haixu Long, Hansheng Wu, Zhengbin Long, Haodong Xiang, Rong Shi, Zhuo Cui, Shizhuang Zhang, Gang Qiu, He Wang, Ruifeng Li, Biao Liu, Zhenzhe Sun, Tao Shen

Multifingered force-aware control for humanoid robots

This paper presents a model-based control framework for humanoid robots that utilizes trained tactile force estimators to dynamically redistribute forces across the torso, arm, wrist, and fingers, thereby maintaining stable contact with objects of varying mass or unstable configurations by minimizing the distance between the Center of Pressure and the contact polygon centroid.

Pasquale Marra, Gabriele M. Caddeo, Ugo Pattacini, Lorenzo Natale
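The objective in the force-aware control summary, minimizing the distance between the Center of Pressure and the contact polygon centroid, has a direct geometric reading. A minimal sketch of that quantity in 2D, assuming planar contact points and scalar normal forces; the function names are illustrative, not from the paper's code:

```python
def center_of_pressure(points, forces):
    """CoP as the normal-force-weighted average of 2D contact points."""
    total = sum(forces)
    x = sum(p[0] * f for p, f in zip(points, forces)) / total
    y = sum(p[1] * f for p, f in zip(points, forces)) / total
    return (x, y)

def polygon_centroid(points):
    """Arithmetic mean of the contact polygon vertices."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def cop_centroid_distance(points, forces):
    """The quantity a force redistribution scheme would drive toward zero."""
    cx, cy = center_of_pressure(points, forces)
    gx, gy = polygon_centroid(points)
    return ((cx - gx) ** 2 + (cy - gy) ** 2) ** 0.5

# Equal forces on a unit square put the CoP exactly at the centroid;
# loading one corner more shifts the CoP away from it.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
balanced = cop_centroid_distance(square, [1, 1, 1, 1])
skewed = cop_centroid_distance(square, [4, 1, 1, 1])
```

Redistributing forces across torso, arm, wrist, and fingers so this distance stays small is what keeps the contact stable under varying object mass.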

POIROT: Investigating Direct Tangible vs. Digitally Mediated Interaction and Attitude Moderation in Multi-party Murder Mystery Games

This study challenges the assumption that physical robot interaction universally enhances user experience by demonstrating that while tangible delivery does not inherently improve engagement, it significantly reduces narrative immersion for individuals with high negative attitudes toward robots, who instead benefit from digitally mediated interfaces as a social buffer.

Wen Chen, Rongxi Chen, Shankai Chen, Huiyang Gong, Minghui Guo, Yingri Xu, Xintong Wu, Xinyi Fu

UniGround: Universal 3D Visual Grounding via Training-Free Scene Parsing

UniGround introduces a novel, training-free framework for universal 3D visual grounding that leverages global candidate filtering and local precision reasoning to achieve state-of-the-art zero-shot performance in localizing arbitrary objects within complex 3D environments without relying on pre-trained models or 3D supervision.

Jiaxi Zhang, Yunheng Wang, Wei Lu, Taowen Wang, Weisheng Xu, Shuning Zhang, Yixiao Feng, Yuetong Fang, Renjing Xu
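The UniGround summary describes a coarse-to-fine pipeline: global candidate filtering followed by local precision reasoning. A minimal sketch of that two-stage structure, where the candidate strings, both scoring functions, and the shortlist size are toy assumptions standing in for the paper's 3D scene parsing:

```python
def ground(candidates, global_score, local_score, keep=3):
    """Two-stage grounding: cheap global filtering, then finer local reasoning."""
    # Stage 1: keep only the top-scoring candidates under the coarse scorer.
    shortlist = sorted(candidates, key=global_score, reverse=True)[:keep]
    # Stage 2: rank the survivors with the more precise (more expensive) scorer.
    return max(shortlist, key=local_score)

# Toy query "the red chair": the global stage keeps chair-like candidates,
# the local stage resolves the fine-grained attribute.
objects = ["chair", "red chair", "table", "lamp"]
result = ground(
    objects,
    global_score=lambda o: 1.0 if "chair" in o else 0.0,
    local_score=lambda o: 1.0 if "red" in o else 0.0,
    keep=2,
)
```

The design choice this illustrates is that the expensive local scorer only ever sees the shortlist, which is what makes a training-free pipeline tractable over complex scenes.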

Towards Human-Like Manipulation through RL-Augmented Teleoperation and Mixture-of-Dexterous-Experts VLA

This paper proposes an integrated framework combining RL-augmented teleoperation via the IMCopilot assistant and a Mixture-of-Dexterous-Experts VLA (MoDE-VLA) architecture to overcome data and learning bottlenecks, enabling robust human-like, contact-rich bimanual in-hand manipulation with significantly improved success rates.

Tutian Tang, Xingyu Ji, Wanli Xing, Ce Hao, Wenqiang Xu, Lin Shao, Cewu Lu, Qiaojun Yu, Jiangmiao Pang, Kaifeng Zhang