Let's Reward Step-by-Step: Step-Aware Contrastive Alignment for Vision-Language Navigation in Continuous Environments

This paper introduces Step-Aware Contrastive Alignment (SACA), a novel framework for Vision-Language Navigation in Continuous Environments that uses a perception-grounded auditor to extract dense, step-level supervision from imperfect trajectories. This addresses two limitations of existing training schemes: compounding errors in supervised fine-tuning and sparse rewards in reinforcement fine-tuning. The method achieves state-of-the-art performance.

Haoyuan Li, Rui Liu, Hehe Fan, Yi Yang · Wed, 11 Ma · cs

Robotic Scene Cloning: Advancing Zero-Shot Robotic Scene Adaptation in Manipulation via Visual Prompt Editing

This paper introduces Robotic Scene Cloning (RSC), a novel method for zero-shot robotic manipulation that edits existing operation trajectories through visual prompting and condition injection. The resulting samples are accurate and scene-consistent, and they significantly improve policy generalization in real-world environments.

Binyuan Huang, Yuqing Wen, Yucheng Zhao, Yaosi Hu, Tiancai Wang, Chang Wen Chen, Haoqiang Fan, Zhenzhong Chen · Wed, 11 Ma · cs

OTPL-VIO: Robust Visual-Inertial Odometry with Optimal Transport Line Association and Adaptive Uncertainty

This paper presents OTPL-VIO, a robust stereo visual-inertial odometry system that enhances performance in low-texture and illumination-challenging environments by employing a training-free deep descriptor with entropy-regularized optimal transport for line association and introducing adaptive uncertainty weighting to stabilize estimation.

Zikun Chen, Wentao Zhao, Yihe Niu, Tianchen Deng, Jingchuan Wang · Wed, 11 Ma · cs

ReTac-ACT: A State-Gated Vision-Tactile Fusion Transformer for Precision Assembly

ReTac-ACT is a state-gated vision-tactile fusion transformer that achieves high-precision assembly in occluded, contact-rich environments by dynamically prioritizing tactile feedback through bidirectional cross-attention and proprioception-conditioned gating, outperforming vision-only baselines on the NIST Assembly Task Board M1 benchmark.

Minchi Ruan, LiangQing Zhou, Hongtong Li, Zongtao Wang, ZhaoMing Lu, Jianwei Zhang, Bin Fang · Wed, 11 Ma · cs

Trajectory Optimization for Self-Wrap-Aware Cable-Towed Planar Object Manipulation under Implicit Tension Constraints

This paper formulates cable-towed planar object manipulation as a routing-aware, tensioning-implicit trajectory optimization problem that leverages self-wrapping to redirect torque dynamically. It also proposes a relaxation hierarchy in which the Implicit-Mode Relaxation (IMR) exploits self-wrap for turning maneuvers without the conservatism of explicit routing decisions.

Yu Li, Amin Fakhari, Hamid Sadeghian · Wed, 11 Ma · cs

Beyond Short-Horizon: VQ-Memory for Robust Long-Horizon Manipulation in Non-Markovian Simulation Benchmarks

This paper introduces RuleSafe, a new long-horizon articulated manipulation benchmark featuring non-Markovian safe-unlocking tasks, and proposes VQ-Memory, a vector-quantized temporal representation that significantly enhances the planning, generalization, and efficiency of Vision-Language-Action models in complex robotic simulations.

Wang Honghui, Jing Zhi, Ao Jicong, Song Shiji, Li Xuelong, Huang Gao, Bai Chenjia · Wed, 11 Ma · cs
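A "vector-quantized" representation replaces each continuous feature with the index of its nearest entry in a codebook, which turns a history of observations into a short sequence of discrete tokens. The sketch below shows only that generic lookup step.

The following parts are invented for illustration and do not describe how VQ-Memory actually stores or uses its history:

- The 2-D features and the codebook values.
- The hypothetical `run_length` compaction.

```python
def quantize(features, codebook):
    """Map each feature vector to the index of its nearest codebook entry (squared L2)."""
    def sq_dist(x, c):
        return sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return [min(range(len(codebook)), key=lambda k: sq_dist(f, codebook[k]))
            for f in features]

def run_length(codes):
    """Hypothetical compaction: merge consecutive identical codes into (code, count)."""
    runs = []
    for c in codes:
        if runs and runs[-1][0] == c:
            runs[-1] = (c, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((c, 1))              # start a new run
    return runs

# assumed toy codebook and per-step features
codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
features = [[0.1, 0.0], [0.2, 0.1], [0.9, 1.2], [4.8, 5.1], [5.0, 5.0]]
codes = quantize(features, codebook)
print(codes)              # [0, 0, 1, 2, 2]
print(run_length(codes))  # [(0, 2), (1, 1), (2, 2)]
```

In learned VQ models the codebook is trained jointly with the encoder. Here it is fixed so that the lookup is easy to follow.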

SEA-Nav: Efficient Policy Learning for Safe and Agile Quadruped Navigation in Cluttered Environments

The paper introduces SEA-Nav, a reinforcement learning framework that combines differentiable control barrier functions, adaptive collision replay, and kinematic constraints. Together these enable safe, agile, and efficient quadruped navigation in densely cluttered environments while requiring only minutes of training.

Shiyi Chen, Mingye Yang, Haiyan Mao, Jiaqi Zhang, Haiyi Liu, Shuheng He, Debing Zhang, Zihao Qiu, Chun Zhang · Wed, 11 Ma · cs
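A control barrier function (CBF) encodes safety as h(x) ≥ 0 and requires dh/dt ≥ -α·h(x), so the robot may approach an obstacle only ever more slowly. The following is a minimal, standard CBF safety filter that illustrates that condition under these assumptions:

- The robot is modeled as a single integrator, dx/dt = u.
- There is one circular obstacle.
- The quadratic program min ||u - u_ref||² has a closed-form solution.

This is not SEA-Nav's method, which uses differentiable CBFs inside reinforcement learning. The function names and parameter values are hypothetical.

```python
def barrier(x, obstacle, radius):
    """h(x) = ||x - o||^2 - r^2; the safe set is h(x) >= 0."""
    return sum((xi - oi) ** 2 for xi, oi in zip(x, obstacle)) - radius ** 2

def cbf_filter(x, u_ref, obstacle, radius, alpha=1.0):
    """Closest control to u_ref satisfying 2 (x - o) . u >= -alpha * h(x).

    Assumes a single-integrator model and that x is not at the obstacle centre
    (there the constraint gradient vanishes).
    """
    a = [2.0 * (xi - oi) for xi, oi in zip(x, obstacle)]  # gradient of h
    b = -alpha * barrier(x, obstacle, radius)
    slack = sum(ai * ui for ai, ui in zip(a, u_ref)) - b
    if slack >= 0.0:
        return list(u_ref)  # reference control already satisfies the CBF condition
    norm_sq = sum(ai * ai for ai in a)
    # project u_ref onto the half-space boundary a . u = b
    return [ui - slack / norm_sq * ai for ui, ai in zip(u_ref, a)]

# robot at the origin driving straight at an obstacle of radius 1 centred at (2, 0)
print(cbf_filter([0.0, 0.0], [1.0, 0.0], [2.0, 0.0], 1.0))  # [0.75, 0.0]
```

With forward-Euler steps of size dt, h(x + dt·u) ≥ (1 - α·dt)·h(x), so the discretized rollout also stays safe whenever α·dt < 1.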

Stein Variational Ergodic Surface Coverage with SE(3) Constraints

This paper introduces a preconditioned SE(3) Stein Variational Gradient Descent framework that reformulates point-cloud surface coverage as a manifold-aware sampling problem, enabling robots to generate high-quality, SE(3)-constrained trajectories that outperform existing optimization-based and sampling-as-optimization methods in both simulation and real-world experiments.

Jiayun Li, Yufeng Jin, Sangli Teng, Dejian Gong, Georgia Chalvatzaki · Wed, 11 Ma · cs
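Stein Variational Gradient Descent (SVGD) moves a set of particles with two competing terms:

- A kernel-weighted score term pulls particles toward high target density.
- A kernel-gradient term pushes particles apart so they do not collapse onto the mode.

The following is a minimal sketch of plain Euclidean SVGD in one dimension, targeting a Gaussian with a fixed-bandwidth RBF kernel. The target, bandwidth, step size, and particle count are assumptions chosen for illustration. The paper's preconditioning and SE(3) manifold handling are not shown.

```python
import math

def svgd(particles, grad_log_p, bandwidth=1.0, step=0.1, n_iters=500):
    """Plain 1-D SVGD with RBF kernel k(x, y) = exp(-(x - y)^2 / h)."""
    x = list(particles)
    n = len(x)
    h = bandwidth
    for _ in range(n_iters):
        grads = [grad_log_p(xj) for xj in x]
        phi = []
        for xi in x:
            total = 0.0
            for xj, gj in zip(x, grads):
                k = math.exp(-((xj - xi) ** 2) / h)
                # driving term k * grad log p(x_j): attraction toward high density
                # repulsive term d k / d x_j = 2 (x_i - x_j) / h * k: keeps particles spread
                total += k * gj + (2.0 * (xi - xj) / h) * k
            phi.append(total / n)
        x = [xi + step * p for xi, p in zip(x, phi)]
    return x

mu, sigma = 2.0, 1.0
def grad_log_p(v):
    return -(v - mu) / sigma ** 2  # score of the assumed target N(mu, sigma^2)

init = [-3.0 + 6.0 * i / 29 for i in range(30)]  # 30 evenly spaced particles
samples = svgd(init, grad_log_p)
mean = sum(samples) / len(samples)
std = math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))
print(round(mean, 3), round(std, 3))
```

In practice the bandwidth is often set with a median heuristic rather than fixed. The fixed value here simply keeps the example deterministic.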

NLiPsCalib: An Efficient Calibration Framework for High-Fidelity 3D Reconstruction of Curved Visuotactile Sensors

The paper presents NLiPsCalib, an efficient, physics-consistent calibration framework for curved visuotactile sensors. It uses Near-Light Photometric Stereo and controllable light sources to enable high-fidelity 3D reconstruction from simple contacts with everyday objects, avoiding the cost and complexity of existing methods.

Xuhao Qin, Feiyu Zhao, Yatao Leng, Runze Hu, Chenxi Xiao · Wed, 11 Ma · cs

See, Plan, Rewind: Progress-Aware Vision-Language-Action Models for Robust Robotic Manipulation

The paper introduces See, Plan, Rewind (SPR), a progress-aware vision-language-action framework that enhances robotic manipulation robustness by dynamically grounding instructions into spatial subgoals and enabling closed-loop error recovery through state rewinding, achieving state-of-the-art performance on challenging benchmarks without additional training.

Tingjun Dai, Mingfei Han, Tingwen Du, Zhiheng Liu, Zhihui Li, Salman Khan, Jun Yu, Xiaojun Chang · Wed, 11 Ma · cs