Implicit Geometry Representations for Vision-and-Language Navigation from Web Videos

This paper introduces a large-scale framework for Vision-and-Language Navigation that leverages web-based room tour videos and implicit geometry representations to overcome simulator limitations, enabling robust zero-shot navigation agents with state-of-the-art performance across multiple benchmarks.

Mingfei Han, Haihong Hao, Liang Ma, Kamila Zhumakhanova, Ekaterina Radionova, Jingyi Zhang, Xiaojun Chang, Xiaodan Liang, Ivan Laptev · Wed, 11 Ma · cs

MO-Playground: Massively Parallelized Multi-Objective Reinforcement Learning for Robotics

This paper introduces MORLAX, a GPU-native multi-objective reinforcement learning algorithm, and MO-Playground, a suite of GPU-accelerated environments, which together enable massively parallelized training that achieves 25–270x speedups and superior Pareto fronts for complex robotics tasks compared to legacy CPU-based approaches.

Neil Janwani, Ellen Novoseller, Vernon J. Lawhern, Maegan Tucker · Wed, 11 Ma · cs
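A Pareto front is the set of non-dominated trade-offs between objectives. As a minimal sketch (not MORLAX itself), the code below filters per-policy objective returns down to their Pareto front, assuming every objective is maximized, and computes a 2-D hypervolume against a reference point, a common way to compare fronts. The helper names `dominates`, `pareto_front`, and `hypervolume_2d` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b everywhere and strictly better once (maximization)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep the non-dominated points, preserving input order (O(n^2), fine for small sets)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def hypervolume_2d(front: List[Tuple[float, float]], ref: Tuple[float, float]) -> float:
    """Area dominated by a 2-objective front (maximization), bounded below by ref."""
    rx, ry = ref
    area, prev_y = 0.0, ry
    # Sweep from the largest first objective; each new, higher point adds a strip.
    for x, y in sorted(front, key=lambda p: p[0], reverse=True):
        if y > prev_y:
            area += (x - rx) * (y - prev_y)
            prev_y = y
    return area
```

For example, `pareto_front([(1.0, 5.0), (2.0, 4.0), (1.5, 3.0), (3.0, 1.0), (2.0, 2.0)])` keeps `[(1.0, 5.0), (2.0, 4.0), (3.0, 1.0)]`, whose hypervolume relative to `(0.0, 0.0)` is `10.0`. A larger hypervolume for the same reference point indicates a front that dominates more of the objective space.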

WESPR: Wind-adaptive Energy-Efficient Safe Perception & Planning for Robust Flight with Quadrotors

WESPR is a lightweight, real-time framework that integrates geometric perception and local weather data to predict wind fields generated by environmental obstacles, enabling quadrotors to proactively plan safe, energy-efficient paths and adapt control strategies for robust flight in turbulent conditions.

Khuzema Habib, Pranav Deshakulkarni Manjunath, Kasra Torshizi, Troi Williams, Pratap Tokekar · Wed, 11 Ma · cs

Robust Spatiotemporal Motion Planning for Multi-Agent Autonomous Racing via Topological Gap Identification and Accelerated MPC

This paper presents a robust spatiotemporal motion planning framework for multi-agent autonomous racing that combines topological gap identification via stochastic Gaussian processes with a PTC-accelerated Linear Time-Varying MPC to achieve high-speed overtaking with strict kinematic feasibility and significantly reduced computational latency.

Mingyi Zhang, Cheng Hu, Yiqin Wang, Haotong Qin, Hongye Su, Lei Xie · Wed, 11 Ma · cs

STONE Dataset: A Scalable Multi-Modal Surround-View 3D Traversability Dataset for Off-Road Robot Navigation

This paper introduces STONE, a large-scale, multi-modal off-road dataset featuring synchronized LiDAR, camera, and radar data with automated, annotation-free 3D traversability labels, alongside a new benchmark for voxel-level traversability prediction.

Konyul Park, Daehun Kim, Jiyong Oh, Seunghoon Yu, Junseo Park, Jaehyun Park, Hongjae Shin, Hyungchan Cho, Jungho Kim, Jun Won Choi · Wed, 11 Ma · cs

SPAN-Nav: Generalized Spatial Awareness for Versatile Vision-Language Navigation

SPAN-Nav is an end-to-end foundation model that achieves state-of-the-art, robust generalization in versatile vision-language navigation by leveraging a massive 4.2-million-annotation dataset to learn universal 3D spatial priors, which are efficiently encoded into a single token to guide action reasoning across diverse indoor and outdoor environments.

Jiahang Liu, Tianyu Xu, Jiawei Chen, Lu Yue, Jiazhao Zhang, Zhiyong Wang, Minghan Li, Qisheng Zhao, Anqi Li, Qi Su, Zhizheng Zhang, He Wang · Wed, 11 Ma · cs

Provably Safe Trajectory Generation for Manipulators Under Motion and Environmental Uncertainties

This paper presents a provably safe motion planning framework for robot manipulators in uncertain, non-convex environments by integrating a deep stochastic Koopman operator for state prediction with a hierarchical sum-of-squares verification filter within a Model Predictive Path Integral controller to generate certified, collision-free trajectories.

Fei Meng, Zijiang Yang, Xinyu Mao, Haobo Liang, Max Q.-H. Meng · Wed, 11 Ma · cs
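Model Predictive Path Integral (MPPI) control samples perturbed control sequences, rolls each one out, and averages the perturbations with weights that decay exponentially in rollout cost. The sketch below shows only that vanilla MPPI update on a toy 1-D double integrator, which is an assumption standing in for manipulator dynamics; it omits the paper's deep stochastic Koopman predictor and sum-of-squares verification filter, and all names, costs, and parameter values are illustrative.

```python
import math
import random
from typing import List, Optional, Tuple

State = Tuple[float, float]  # (position, velocity)


def step(x: State, u: float, dt: float) -> State:
    """Toy 1-D double integrator standing in for the real dynamics (assumption)."""
    p, v = x
    return (p + v * dt, v + u * dt)


def rollout_cost(x0: State, controls: List[float], goal: float, dt: float) -> float:
    """Quadratic goal-tracking cost with small velocity and effort penalties."""
    x, cost = x0, 0.0
    for u in controls:
        x = step(x, u, dt)
        cost += (x[0] - goal) ** 2 + 0.1 * x[1] ** 2 + 0.01 * u ** 2
    return cost


def mppi_update(x0: State, nominal: List[float], goal: float, dt: float = 0.1,
                samples: int = 128, sigma: float = 1.0, lam: float = 1.0,
                rng: Optional[random.Random] = None) -> List[float]:
    """One MPPI iteration: sample perturbations, weight by exp(-cost/lam), average."""
    rng = rng if rng is not None else random.Random(0)
    horizon = len(nominal)
    noises, costs = [], []
    for _ in range(samples):
        eps = [rng.gauss(0.0, sigma) for _ in range(horizon)]
        costs.append(rollout_cost(x0, [u + e for u, e in zip(nominal, eps)], goal, dt))
        noises.append(eps)
    c_min = min(costs)  # shift costs so the best sample has weight 1 (avoids underflow)
    weights = [math.exp(-(c - c_min) / lam) for c in costs]
    total = sum(weights)
    return [u + sum(w * eps[t] for w, eps in zip(weights, noises)) / total
            for t, u in enumerate(nominal)]


def run(steps: int = 60, horizon: int = 20, goal: float = 1.0, dt: float = 0.1) -> State:
    """Receding-horizon loop: optimize, apply the first control, shift the plan."""
    rng = random.Random(42)
    x: State = (0.0, 0.0)
    nominal = [0.0] * horizon
    for _ in range(steps):
        nominal = mppi_update(x, nominal, goal, dt, rng=rng)
        x = step(x, nominal[0], dt)
        nominal = nominal[1:] + [0.0]  # warm start for the next iteration
    return x
```

Subtracting the minimum cost before exponentiating leaves the normalized weights unchanged but keeps them numerically well behaved. The temperature `lam` trades off between averaging many samples (large values) and following only the best few (small values); a safety filter, as described above, would sit between `mppi_update` and the call that applies `nominal[0]`.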

3D UAV Trajectory Estimation and Classification from Internet Videos via Language Model

This paper presents a novel, annotation-free framework that leverages language models and vision-language reasoning to autonomously extract 3D UAV trajectories and classifications from Internet-scale videos, demonstrating that zero-shot transfer performance on anti-UAV tasks improves consistently with increased data volume without requiring target-domain training.

Haoxiang Lei, Daotong Wang, Shenghai Yuan, Jianbo Su · Wed, 11 Ma · cs

Cutting the Cord: System Architecture for Low-Cost, GPU-Accelerated Bimanual Mobile Manipulation

This paper presents a low-cost, untethered bimanual mobile manipulator built on the open-source XLeRobot platform with integrated NVIDIA Jetson Orin compute, featuring an optimized mechanical design and a specialized power topology to enable autonomous navigation and vision-based manipulation for under $1300.

Artemis Shaw, Chen Liu, Justin Costa, Rane Gray, Alina Skowronek, Kevin Diaz, Nam Bui, Nikolaus Correll · Wed, 11 Ma · cs

ImpedanceDiffusion: Diffusion-Based Global Path Planning for UAV Swarm Navigation with Generative Impedance Control

The paper presents ImpedanceDiffusion, a hierarchical framework that combines image-conditioned diffusion models for global path planning with reactive APF tracking and VLM-enhanced variable impedance control to enable safe, high-speed, and adaptive UAV swarm navigation in cluttered indoor environments without explicit map construction.

Faryal Batool, Yasheerah Yaqoot, Muhammad Ahsan Mustafa, Roohan Ahmed Khan, Aleksey Fedoseev, Dzmitry Tsetserukou · Wed, 11 Ma · cs
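A classic artificial potential field (APF) controller commands a velocity equal to the negative gradient of an attractive potential around the target plus repulsive potentials around nearby obstacles. The sketch below is that textbook APF in 2-D with speed saturation; it is not the authors' tracker, and in their setting the attractor would presumably be the next waypoint of the diffusion-generated path (an assumption). The gains, the influence radius `d0`, and the function names are illustrative.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float]


def apf_velocity(pos: Vec, goal: Vec, obstacles: List[Vec], k_att: float = 1.0,
                 k_rep: float = 1.0, d0: float = 1.0, v_max: float = 1.0) -> Vec:
    """Velocity command = -grad(U_att + U_rep), saturated at v_max."""
    # Attractive term: -grad of 0.5 * k_att * ||pos - goal||^2.
    vx = k_att * (goal[0] - pos[0])
    vy = k_att * (goal[1] - pos[1])
    for ox, oy in obstacles:
        dx, dy = pos[0] - ox, pos[1] - oy
        d = math.hypot(dx, dy)
        if 1e-9 < d < d0:
            # Repulsive term: -grad of 0.5 * k_rep * (1/d - 1/d0)^2, active only inside d0.
            mag = k_rep * (1.0 / d - 1.0 / d0) / d ** 2
            vx += mag * dx / d
            vy += mag * dy / d
    speed = math.hypot(vx, vy)
    if speed > v_max:  # keep the direction, cap the magnitude
        vx, vy = vx * v_max / speed, vy * v_max / speed
    return vx, vy


def simulate(start: Vec, goal: Vec, obstacles: List[Vec],
             dt: float = 0.1, steps: int = 200) -> List[Vec]:
    """Integrate the APF velocity command with forward Euler and return the path."""
    pos, path = start, [start]
    for _ in range(steps):
        vx, vy = apf_velocity(pos, goal, obstacles)
        pos = (pos[0] + vx * dt, pos[1] + vy * dt)
        path.append(pos)
    return path
```

A known limitation of pure APF is local minima where attraction and repulsion cancel, which is one motivation for pairing a reactive tracker with a global planner. With an obstacle at `(2.5, 0.8)`, slightly off the straight line from `(0.0, 0.0)` to `(5.0, 0.0)`, the simulated agent slides underneath it and settles at the goal.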

SurgCalib: Gaussian Splatting-Based Hand-Eye Calibration for Robot-Assisted Minimally Invasive Surgery

This paper presents SurgCalib, a markerless, Gaussian Splatting-based framework that achieves accurate hand-eye calibration for the da Vinci surgical robot by refining kinematic estimates through a differentiable rendering pipeline, thereby overcoming cable-driven inaccuracies and avoiding the sterility issues associated with traditional fiducial markers.

Zijian Wu, Shuojue Yang, Yu Chung Lee, Eitan Prisman, Yueming Jin, Septimiu E. Salcudean · Wed, 11 Ma · cs

FAME: Force-Adaptive RL for Expanding the Manipulation Envelope of a Full-Scale Humanoid

The paper introduces FAME, a force-adaptive reinforcement learning framework that enables a full-scale humanoid to robustly maintain balance during bimanual manipulation by conditioning its policy on a learned latent context of joint configurations and estimated interaction forces, thereby significantly expanding its manipulation envelope without requiring wrist force sensors.

Niraj Pudasaini, Yutong Zhang, Jensen Lavering, Alessandro Roncone, Nikolaus Correll · Wed, 11 Ma · cs

From Demonstrations to Safe Deployment: Path-Consistent Safety Filtering for Diffusion Policies

This paper introduces Path-Consistent Safety Filtering (PACS), a novel approach that provides formal safety guarantees for diffusion policies in dynamic environments while preserving task success rates, by applying set-based reachability analysis to brake trajectories in a manner consistent with the policy's training distribution.

Ralf Römer, Julian Balletshofer, Jakob Thumm, Marco Pavone, Angela P. Schoellig, Matthias Althoff · Wed, 11 Ma · eess
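Set-based reachability checks that every state a system could reach while executing a fallback maneuver stays clear of obstacles. As a deliberately coarse sketch using interval arithmetic (not the set representation used in the paper), the code below bounds the coordinate along the planned path of a robot that brakes with uncertain deceleration, then checks those bounds against a forbidden interval of that coordinate. The 1-D discrete-time model, the assumptions `v_lo >= 0` and `a_min > 0`, the function names, and all numbers are illustrative; a dynamic environment would need time-varying obstacle sets.

```python
from typing import List, Tuple

Interval = Tuple[float, float]


def braking_reach(s: Interval, v: Interval, decel: Interval, dt: float,
                  max_steps: int = 10_000) -> Tuple[List[Interval], bool]:
    """Bound path coordinate s while braking with deceleration in [a_min, a_max].

    Returns one interval per time step and whether the speed bound reached zero
    within max_steps (if not, the reach set is incomplete and must not be trusted).
    """
    s_lo, s_hi = s
    v_lo, v_hi = v
    a_min, a_max = decel
    reach = [(s_lo, s_hi)]
    for _ in range(max_steps):
        if v_hi <= 0.0:
            break
        # Position advances with the current speed bounds (speeds are non-negative).
        s_lo, s_hi = s_lo + v_lo * dt, s_hi + v_hi * dt
        # Weakest braking keeps v_hi high; strongest braking drives v_lo down.
        v_lo = max(0.0, v_lo - a_max * dt)
        v_hi = max(0.0, v_hi - a_min * dt)
        reach.append((s_lo, s_hi))
    return reach, v_hi <= 0.0


def is_safe(reach: List[Interval], obstacle: Interval) -> bool:
    """True if no reachable interval overlaps the forbidden interval [o_lo, o_hi]."""
    o_lo, o_hi = obstacle
    return all(hi < o_lo or lo > o_hi for lo, hi in reach)
```

For example, starting at `s` in `[0.0, 0.1]` with speed in `[1.8, 2.0]` and deceleration in `[4.0, 5.0]` at `dt = 0.05`, the upper position bound stops at about `0.65`, so an obstacle occupying `[1.0, 1.5]` is judged safe while one occupying `[0.5, 1.0]` is not.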