cs.RO papers | Gist.Science

Multimodal Adversarial Quality Policy for Safe Grasping

This paper proposes the Multimodal Adversarial Quality Policy (MAQP), a framework that enhances safe robot grasping in human-robot interaction by introducing a Heterogeneous Dual-Patch Optimization Scheme and a Gradient-Level Modality Balancing Strategy to effectively generate multimodal adversarial patches that address distribution discrepancies and optimization imbalances between RGB and depth modalities.

Kunlin Xie, Chenghao Li, Haolan Zhang, Nak Young Chong | Wed, 11 Ma | cs

A 26-Gram Butterfly-Inspired Robot Achieving Autonomous Tailless Flight

This paper introduces AirPulse, a 26-gram butterfly-inspired robot that achieves the first autonomous, closed-loop tailless flight at this scale by replicating low-frequency, high-amplitude biomechanical traits through a hierarchical control architecture featuring Stroke Timing Asymmetry Rhythm (STAR).

Weibin Gu, Chenrui Feng, Lian Liu, Chen Yang, Xingchi Jiao, Yuhe Ding, Xiaofei Shi, Chao Gao, Alessandro Rizzo, Guyue Zhou | Wed, 11 Ma | cs

UniBYD: A Unified Framework for Learning Robotic Manipulation Across Embodiments Beyond Imitation of Human Demonstrations

UniBYD is a unified framework that leverages a unified morphological representation and a dynamic reinforcement learning algorithm with a hybrid shadow engine to bridge the embodiment gap, enabling robotic hands to transcend human imitation and discover manipulation policies optimally adapted to their own physical morphologies.

Tingyu Yuan, Biaoliang Guan, Wen Ye, Ziyan Tian, Yi Yang, Weijie Zhou, Zhaowen Li, Yan Huang, Peng Wang, Chaoyang Zhao, Jinqiao Wang | Wed, 11 Ma | cs

Bootstrap Dynamic-Aware 3D Visual Representation for Scalable Robot Learning

The paper introduces AFRO, a self-supervised framework that learns dynamics-aware 3D visual representations by modeling state-action-state transitions via a generative diffusion process, thereby significantly improving robotic manipulation performance across diverse simulated and real-world tasks without requiring explicit action or reconstruction supervision.

Qiwei Liang, Boyang Cai, Minghao Lai, Sitong Zhuang, Tao Lin, Yan Qin, Yixuan Ye, Jiaming Liang, Renjing Xu | Wed, 11 Ma | cs

EgoMI: Learning Active Vision and Whole-Body Manipulation from Egocentric Human Demonstrations

EgoMI is a framework that bridges the human-robot embodiment gap in imitation learning by capturing synchronized end-effector and active head trajectories from egocentric human demonstrations and utilizing a memory-augmented policy to enable robust whole-body manipulation on semi-humanoid robots.

Justin Yu, Yide Shentu, Di Wu, Pieter Abbeel, Ken Goldberg, Philipp Wu | Wed, 11 Ma | cs

Revisiting Replanning from Scratch: Real-Time Incremental Planning with Fast Almost-Surely Asymptotically Optimal Planners

This paper challenges the conventional assumption that reactive replanning requires updating existing plans by demonstrating that using fast almost-surely asymptotically optimal (ASAO) algorithms to solve a series of independent planning problems offers a more efficient and effective approach for navigating changing environments.

Mitchell E. C. Sabbadini, Andrew H. Liu, Joseph Ruan, Tyler S. Wilson, Zachary Kingston, Jonathan D. Gammell | Wed, 11 Ma | cs

NaviGait: Navigating Dynamically Feasible Gait Libraries using Deep Reinforcement Learning

NaviGait is a hierarchical framework that combines trajectory optimization and deep reinforcement learning to synthesize robust, intuitive bipedal locomotion by selecting and minimally morphing gaits from an offline library, thereby simplifying reward design and accelerating training while maintaining high fidelity to reference motions.

Neil Janwani, Varun Madabushi, Maegan Tucker | Wed, 11 Ma | cs

Asset-Centric Metric-Semantic Maps of Indoor Environments

This paper presents an asset-centric metric-semantic mapping approach that combines detailed object meshes with natural language priors to create accurate, LLM-compatible indoor environment representations, achieving a superior balance between object-level detail and global scene context compared to existing methods.

Christopher D. Hsu, Pratik Chaudhari | Wed, 11 Ma | cs

Connectivity Maintenance and Recovery for Multi-Robot Motion Planning

This paper proposes a real-time MPC-CLF-CBF motion planner based on Bézier curves that enables multi-robot fleets to maintain connectivity, navigate obstacle-rich environments without deadlocks, and recover from connection losses, as validated by simulations and experiments with eight Crazyflie quadrotors.

Yutong Wang, Lishuo Pan, Yichun Qu, Tengxiang Wang, Nora Ayanian | Wed, 11 Ma | cs

Automated Coral Spawn Monitoring for Reef Restoration: The Coral Spawn and Larvae Imaging Camera System (CSLICS)

This paper introduces the Coral Spawn and Larvae Imaging Camera System (CSLICS), an automated, low-cost computer vision solution that significantly reduces labor-intensive manual counting while accurately monitoring coral spawn and larvae to enhance reef restoration efforts.

Dorian Tsai, Christopher A. Brunner, Riki Lamont, F. Mikaela Nordborg, Andrea Severati, Java Terry, Karen Jackel, Matthew Dunbabin, Tobias Fischer, Scarlett Raine | Wed, 11 Ma | cs

Multi-Quadruped Cooperative Object Transport: Learning Decentralized Pinch-Lift-Move

This paper presents a decentralized, communication-free learning framework for teams of quadruped robots to cooperatively transport ungraspable objects through physical contact alone, utilizing a hierarchical policy and a specialized reward formulation to achieve robust, scalable coordination without rigid mechanical coupling.

Bikram Pandit, Aayam Kumar Shrestha, Alan Fern | Wed, 11 Ma | cs

You Only Pose Once: A Minimalist's Detection Transformer for Monocular RGB Category-level 9D Multi-Object Pose Estimation

YOPO is a minimalist, single-stage transformer framework that unifies 2D object detection and category-level 9-DoF pose estimation from monocular RGB images without requiring pseudo-depth or CAD models, achieving state-of-the-art performance on multiple benchmarks.

Hakjin Lee, Junghoon Seo, Jaehoon Sim | Wed, 11 Ma | cs

Physics-Conditioned Grasping for Stable Tool Use

This paper introduces inverse Tool-use Planning (iTuP) and its associated Stable Dynamic Grasp Network (SDG-Net), which enhance robotic tool use success by selecting grasps that minimize predicted task-induced wrench and torque rather than relying solely on perception or static geometry.

Noah Trupin, Zixing Wang, Ahmed H. Qureshi | Wed, 11 Ma | cs

Unveiling the Potential of iMarkers: Invisible Fiducial Markers for Advanced Robotics

This paper introduces iMarkers, a novel class of invisible fiducial markers detectable only by robots and AR devices, which overcome the visual aesthetic limitations of traditional markers while offering customizable production, robust detection algorithms, and proven effectiveness across diverse robotics scenarios.

Ali Tourani, Deniz Isinsu Avsar, Hriday Bavle, Jose Luis Sanchez-Lopez, Jan Lagerwall, Holger Voos | Wed, 11 Ma | cs

Open-World Task and Motion Planning via Vision-Language Model Generated Constraints

The paper introduces OWL-TAMP, a novel framework that integrates Vision-Language Models into Task and Motion Planning systems to generate language-parameterized discrete and continuous constraints, enabling robots to solve complex, long-horizon manipulation tasks specified in natural language within open-world environments.

Nishanth Kumar, William Shen, Fabio Ramos, Dieter Fox, Tomás Lozano-Pérez, Leslie Pack Kaelbling, Caelan Reed Garrett | Wed, 11 Ma | cs

TiPToP: A Modular Open-Vocabulary Planning System for Robotic Manipulation

TiPToP is a modular, open-vocabulary robotic planning system that integrates pretrained vision foundation models with a Task and Motion Planner to solve multi-step manipulation tasks from RGB images and natural language instructions without requiring any robot-specific training data, achieving performance comparable to or better than fine-tuned vision-language-action models while enabling detailed failure mode analysis.

William Shen, Nishanth Kumar, Sahit Chintalapudi, Jie Wang, Christopher Watson, Edward Hu, Jing Cao, Dinesh Jayaraman, Leslie Pack Kaelbling, Tomás Lozano-Pérez | Wed, 11 Ma | cs

Kinodynamic Motion Retargeting for Humanoid Locomotion via Multi-Contact Whole-Body Trajectory Optimization

This paper introduces KDMR, a novel framework that formulates humanoid motion retargeting as a multi-contact whole-body trajectory optimization problem incorporating rigid-body dynamics and ground reaction forces to generate physically consistent, dynamically feasible locomotion trajectories that significantly outperform purely kinematic methods in both motion quality and downstream control policy performance.

Xiaoyu Zhang, Steven Haener, Varun Madabushi, Maegan Tucker | Wed, 11 Ma | cs

Robust Cooperative Localization in Featureless Environments: A Comparative Study of DCL, StCL, CCL, CI, and Standard-CL

This paper presents a comparative study of five cooperative localization algorithms in featureless, GPS-denied environments, revealing that while Sequential and Standard methods offer high accuracy at the cost of filter inconsistency, Covariance Intersection provides the most balanced trade-off between accuracy and robustness for safety-critical applications.

Nivand Khosravi, Meysam Basiri, Rodrigo Ventura | Wed, 11 Ma | cs

TIMID: Time-Dependent Mistake Detection in Videos of Robot Executions

This paper introduces TIMID, a weakly supervised video anomaly detection framework that leverages task and mistake prompts to detect complex, time-dependent errors in robot executions, addressing the limitations of existing models and out-of-the-box VLMs through a novel multi-robot simulation dataset for zero-shot evaluation.

Nerea Gallego (University of Zaragoza), Fernando Salanova (University of Zaragoza), Claudio Mannarano (University of Zaragoza, University of Torino), Cristian Mahulea (University of Zaragoza), Eduardo Montijano (University of Zaragoza) | Wed, 11 Ma | cs

MuxGel: Simultaneous Dual-Modal Visuo-Tactile Sensing via Spatially Multiplexing and Deep Reconstruction

MuxGel is a spatially multiplexed visuo-tactile sensor that overcomes the opacity trade-off in existing GelSight-style devices by using a checkerboard coating to simultaneously capture pre-contact vision and post-contact tactile signals through a single camera, with high-fidelity reconstruction achieved via a deep learning framework.

Zhixian Hu, Zhengtong Xu, Sheeraz Athar, Juan Wachs, Yu She | Wed, 11 Ma | cs