cs.RO papers | Gist.Science

InsSo3D: Inertial Navigation System and 3D Sonar SLAM for turbid environment inspection

This paper presents InsSo3D, a robust SLAM framework that fuses 3D sonar point clouds with Inertial Navigation System data to enable accurate, large-scale 3D mapping and drift correction for underwater inspections in turbid environments.

Simon Archieri, Ahmet Cinar, Shu Pan, Jonatan Scharff Willners, Michele Grimaldi, Ignacio Carlucho, Yvan Petillot | 2026-03-09 | 💻 cs

(MGS)$^2$-Net: Unifying Micro-Geometric Scale and Macro-Geometric Structure for Cross-View Geo-Localization

The paper proposes (MGS)$^2$-Net, a geometry-grounded framework that unifies Micro-Geometric Scale Adaptation and Macro-Geometric Structure Filtering to overcome geometric misalignment and achieve state-of-the-art cross-view geo-localization performance.

Minglei Li, Mengfan He, Chunyu Li, Chao Chen, Xingyu Shao, Ziyang Meng | 2026-03-09 | 💻 cs

APEX: Learning Adaptive High-Platform Traversal for Humanoid Robots

The paper presents APEX, a deep reinforcement learning framework that enables a 29-DoF Unitree G1 humanoid robot to autonomously traverse platforms up to 114% of its leg length by composing perceptive climbing, walking, and reconfiguration skills through a novel ratchet progress reward and robust sim-to-real perception strategies.

Yikai Wang, Tingxuan Leng, Changyi Lin, Shiqi Liu, Shir Simon, Bingqing Chen, Jonathan Francis, Ding Zhao | 2026-03-09 | 💻 cs

MiDAS: A Multimodal Data Acquisition System and Dataset for Robot-Assisted Minimally Invasive Surgery

This paper introduces MiDAS, an open-source, platform-agnostic system for non-invasive, time-synchronized multimodal data acquisition in robot-assisted minimally invasive surgery. Its external sensing approach achieves gesture recognition performance comparable to proprietary telemetry, and the authors release the first annotated dataset for hernia repair suturing.

Keshara Weerasinghe (MD), Seyed Hamid Reza Roodabeh (MD), Andrew Hawkins (MD), Zhaomeng Zhang, Zachary Schrader, Homa Alemzadeh | 2026-03-09 | 🤖 cs.LG

Beyond Imitation: Reinforcement Learning-Based Sim-Real Co-Training for VLA Models

This paper proposes RL-Co, a reinforcement learning-based sim-real co-training framework that combines supervised fine-tuning on mixed real and simulated data with interactive simulation fine-tuning anchored by real-world data, achieving significant improvements in real-world success rates, generalization, and data efficiency for Vision-Language-Action models.

Liangzhi Shi, Shuaihang Chen, Feng Gao, Yinuo Chen, Kang Chen, Tonghe Zhang, Hongzhi Zang, Weinan Zhang, Chao Yu, Yu Wang | 2026-03-09 | 💻 cs

Learning Robust Control Policies for Inverted Pose on Miniature Blimp Robots

This paper presents a novel framework that combines a calibrated 3D simulation environment, a robust TD3-based control policy with domain randomization, and a mapping layer to successfully enable miniature blimp robots to achieve and maintain inverted poses in real-world settings.

Yuanlin Yang, Lin Hong, Fumin Zhang | 2026-03-09 | 💻 cs

ROSER: Few-Shot Robotic Sequence Retrieval for Scalable Robot Learning

The paper introduces ROSER, a lightweight few-shot retrieval framework that extracts reusable, task-centric segments from unlabeled robotic logs using only 3-5 reference examples, thereby overcoming data scarcity by enabling scalable, high-accuracy utilization of large-scale continuous interaction datasets without task-specific training.

Zillur Rahman, Eddison Pham, Alejandro Daniel Noel, Cristian Meo | 2026-03-09 | 💻 cs

An Embodied Companion for Visual Storytelling

This paper introduces "Companion," an embodied AI system that integrates a drawing robot with Large Language Models to facilitate bidirectional, speech-and-sketch-based co-creation, transforming the robot from a passive tool into a playful artistic partner capable of generating professionally recognized visual stories.

Patrick Tresset, Markus Wulfmeier | 2026-03-09 | 🤖 cs.AI

RoboLayout: Differentiable 3D Scene Generation for Embodied Agents

RoboLayout is a differentiable 3D scene generation framework that extends LayoutVLM by integrating explicit reachability constraints and a local refinement stage to create semantically coherent, physically feasible indoor environments tailored to the specific capabilities of diverse embodied agents.

Ali Shamsaddinlou | 2026-03-09 | 🤖 cs.AI

ProFocus: Proactive Perception and Focused Reasoning in Vision-and-Language Navigation

ProFocus is a training-free framework that enhances Vision-and-Language Navigation by unifying proactive perception, which generates targeted visual queries to fill information gaps, and focused reasoning, which uses Branch-Diverse Monte Carlo Tree Search to prioritize high-value historical contexts, thereby achieving state-of-the-art zero-shot performance on the R2R and REVERIE benchmarks.

Wei Xue, Mingcheng Li, Xuecheng Wu, Jingqun Tang, Dingkang Yang, Lihua Zhang | 2026-03-09 | 💻 cs

Digital-Twin Losses for Lane-Compliant Trajectory Prediction at Urban Intersections

This paper presents a digital twin-driven V2X trajectory prediction framework for urban intersections that employs a novel twin loss function alongside standard MSE to enforce traffic rules, collision avoidance, and motion diversity, thereby significantly reducing safety violations while maintaining high prediction accuracy and real-time performance.

Kuo-Yi Chao, Erik Leo Haß, Melina Gegg, Jiajie Zhang, Ralph Raßhofer, Alois Christian Knoll | 2026-03-09 | 💻 cs

TEGA: A Tactile-Enhanced Grasping Assistant for Assistive Robotics via Sensor Fusion and Closed-Loop Haptic Feedback

This paper presents TEGA, a closed-loop assistive teleoperation framework that fuses EMG-based intent inference with visuotactile sensing to deliver real-time vibrotactile feedback via a wearable vest, enabling users with upper limb disabilities to intuitively modulate grasp force and significantly improve manipulation stability.

Hengxu You, Tianyu Zhou, Fang Xu, Kaleb Smith, Eric Jing Du | 2026-03-09 | 💻 cs

PRISM: Personalized Refinement of Imitation Skills for Manipulation via Human Instructions

PRISM is an instruction-conditioned framework that integrates imitation learning with reinforcement learning and human feedback to efficiently refine generic robotic manipulation policies into robust, fine-grained behaviors for new goals and constraints.

Arnau Boix-Granell, Alberto San-Miguel-Tello, Magí Dalmau-Moreno, Néstor García | 2026-03-09 | 🤖 cs.AI

Task Parameter Extrapolation via Learning Inverse Tasks from Forward Demonstrations

This paper proposes a novel joint learning framework that enables robot policies to extrapolate to novel conditions by learning inverse tasks from forward demonstrations, achieving accurate zero-shot generalization and outperforming diffusion-based alternatives in complex manipulation scenarios.

Serdar Bahar, Fatih Dogangun, Matteo Saveriano, Yukie Nagai, Emre Ugur | 2026-03-09 | 💻 cs

From Decoupled to Coupled: Robustness Verification for Learning-based Keypoint Detection with Joint Specifications

This paper introduces the first coupled robustness verification framework for heatmap-based keypoint detectors that uses a mixed-integer linear program to jointly bound deviations across all keypoints, thereby providing sound and less conservative guarantees than prior decoupled methods.

Xusheng Luo, Changliu Liu | 2026-03-09 | 🤖 cs.LG

RACAS: Controlling Diverse Robots With a Single Agentic System

The paper introduces RACAS, a robot-agnostic agentic system that uses natural language communication between LLM/VLM-based modules to control diverse robotic platforms without requiring code modifications or retraining, successfully demonstrating its effectiveness across wheeled, multi-jointed, and underwater robots.

Dylan R. Ashley, Jan Przepióra, Yimeng Chen, Ali Abualsaud, Nurzhan Yesmagambet, Shinkyu Park, Eric Feron, Jürgen Schmidhuber | 2026-03-09 | 🤖 cs.AI

Control Lyapunov Functions for Underactuated Soft Robots

This paper proposes a general control framework that ensures task-space regulation and tracking for underactuated soft robots with bounded inputs by enforcing a rapidly exponentially stabilizing Control Lyapunov function as a convex inequality constraint, demonstrating superior accuracy and stability compared to baseline methods across various simulation platforms.

Huy Pham, Zach J. Patterson | 2026-03-09 | 💻 cs

RFM-HRI: A Multimodal Dataset of Medical Robot Failure, User Reaction and Recovery Preferences for Item Retrieval Tasks

This paper introduces the RFM-HRI dataset, a multimodal collection of human-robot interactions in medical crash-cart settings, and uses it to systematically analyze users' verbal and non-verbal reactions to various communication failures, along with their preferences for recovery strategies, with the aim of improving safety-critical HRI systems.

Yashika Batra, Giuliano Pioldi, Promise Ekpo, Arman Sayatqyzy, Purnjay Maruur, Shalom Otieno, Kevin Ching, Angelique Taylor | 2026-03-09 | 💻 cs

Relational Semantic Reasoning on 3D Scene Graphs for Open World Interactive Object Search

The paper introduces SCOUT, a computationally efficient method for open-world interactive object search that leverages 3D scene graphs and relational heuristics distilled from large language models to outperform embedding-based approaches while matching LLM-level performance in both simulation and real-world environments.

Imen Mahdi, Matteo Cassinelli, Fabien Despinoy, Tim Welschehold, Abhinav Valada | 2026-03-09 | 🤖 cs.AI

TransMASK: Masked State Representation through Learned Transformation

TransMASK is a self-supervised method that learns to mask irrelevant state components by aligning a transformation matrix with the expert policy's Jacobian, thereby improving robot generalization to new environments without requiring additional labels or modifications to existing imitation learning frameworks.

Sagar Parekh, Preston Culbertson, Dylan P. Losey | 2026-03-09 | 💻 cs
