APEX: Learning Adaptive High-Platform Traversal for Humanoid Robots

The paper presents APEX, a deep reinforcement learning framework that enables a 29-DoF Unitree G1 humanoid robot to autonomously traverse platforms up to 114% of its leg length by composing perceptive climbing, walking, and reconfiguration skills through a novel ratchet progress reward and robust sim-to-real perception strategies.

Yikai Wang, Tingxuan Leng, Changyi Lin, Shiqi Liu, Shir Simon, Bingqing Chen, Jonathan Francis, Ding Zhao | 2026-03-09 | 💻 cs

MiDAS: A Multimodal Data Acquisition System and Dataset for Robot-Assisted Minimally Invasive Surgery

This paper introduces MiDAS, an open-source, platform-agnostic system for non-invasive, time-synchronized multimodal data acquisition in robot-assisted minimally invasive surgery. The authors validate it by showing that its external sensing approach achieves gesture recognition performance comparable to proprietary telemetry, and they release the first annotated dataset for hernia repair suturing.

Keshara Weerasinghe (MD), Seyed Hamid Reza Roodabeh (MD), Andrew Hawkins (MD), Zhaomeng Zhang, Zachary Schrader, Homa Alemzadeh | 2026-03-09 | 🤖 cs.LG

Beyond Imitation: Reinforcement Learning-Based Sim-Real Co-Training for VLA Models

This paper proposes RL-Co, a reinforcement learning-based sim-real co-training framework that combines supervised fine-tuning on mixed real and simulated data with interactive simulation fine-tuning anchored by real-world data, achieving significant improvements in real-world success rates, generalization, and data efficiency for Vision-Language-Action models.

Liangzhi Shi, Shuaihang Chen, Feng Gao, Yinuo Chen, Kang Chen, Tonghe Zhang, Hongzhi Zang, Weinan Zhang, Chao Yu, Yu Wang | 2026-03-09 | 💻 cs

ProFocus: Proactive Perception and Focused Reasoning in Vision-and-Language Navigation

ProFocus is a training-free framework that enhances Vision-and-Language Navigation by unifying proactive perception, which generates targeted visual queries to fill information gaps, and focused reasoning, which utilizes Branch-Diverse Monte Carlo Tree Search to prioritize high-value historical contexts, thereby achieving state-of-the-art zero-shot performance on R2R and REVERIE benchmarks.

Wei Xue, Mingcheng Li, Xuecheng Wu, Jingqun Tang, Dingkang Yang, Lihua Zhang | 2026-03-09 | 💻 cs

Digital-Twin Losses for Lane-Compliant Trajectory Prediction at Urban Intersections

This paper presents a digital twin-driven V2X trajectory prediction framework for urban intersections that employs a novel twin loss function alongside standard MSE to enforce traffic rules, collision avoidance, and motion diversity, thereby significantly reducing safety violations while maintaining high prediction accuracy and real-time performance.

Kuo-Yi Chao, Erik Leo Haß, Melina Gegg, Jiajie Zhang, Ralph Raßhofer, Alois Christian Knoll | 2026-03-09 | 💻 cs

TEGA: A Tactile-Enhanced Grasping Assistant for Assistive Robotics via Sensor Fusion and Closed-Loop Haptic Feedback

This paper presents TEGA, a closed-loop assistive teleoperation framework that fuses EMG-based intent inference with visuotactile sensing to deliver real-time vibrotactile feedback via a wearable vest, enabling users with upper limb disabilities to intuitively modulate grasp force and significantly improve manipulation stability.

Hengxu You, Tianyu Zhou, Fang Xu, Kaleb Smith, Eric Jing Du | 2026-03-09 | 💻 cs

RACAS: Controlling Diverse Robots With a Single Agentic System

The paper introduces RACAS, a robot-agnostic agentic system that uses natural language communication between LLM/VLM-based modules to control diverse robotic platforms without requiring code modifications or retraining, successfully demonstrating its effectiveness across wheeled, multi-jointed, and underwater robots.

Dylan R. Ashley, Jan Przepióra, Yimeng Chen, Ali Abualsaud, Nurzhan Yesmagambet, Shinkyu Park, Eric Feron, Jürgen Schmidhuber | 2026-03-09 | 🤖 cs.AI

RFM-HRI: A Multimodal Dataset of Medical Robot Failure, User Reaction and Recovery Preferences for Item Retrieval Tasks

This paper introduces the RFM-HRI dataset, a multimodal collection of human-robot interactions in medical crash-cart settings, and uses it to systematically analyze users' verbal and non-verbal reactions to various communication failures, along with their preferences for recovery strategies, with the aim of improving safety-critical HRI systems.

Yashika Batra, Giuliano Pioldi, Promise Ekpo, Arman Sayatqyzy, Purnjay Maruur, Shalom Otieno, Kevin Ching, Angelique Taylor | 2026-03-09 | 💻 cs

Relational Semantic Reasoning on 3D Scene Graphs for Open World Interactive Object Search

The paper introduces SCOUT, a computationally efficient method for open-world interactive object search that leverages 3D scene graphs and relational heuristics distilled from large language models to outperform embedding-based approaches while matching LLM-level performance in both simulation and real-world environments.

Imen Mahdi, Matteo Cassinelli, Fabien Despinoy, Tim Welschehold, Abhinav Valada | 2026-03-09 | 🤖 cs.AI