ActivePose: Active 6D Object Pose Estimation and Tracking for Robotic Manipulation

ActivePose proposes an active 6D object pose estimation and tracking framework that integrates a Vision-Language Model with "robotic imagination" to dynamically resolve viewpoint-induced ambiguities through Next-Best-View selection, and employs a diffusion policy for robust camera trajectory control, significantly outperforming classical baselines in both simulation and real-world robotic manipulation tasks.

Sheng Liu, Zhe Li, Weiheng Wang, Han Sun, Heng Zhang, Hongpeng Chen, Yusen Qin, Arash Ajoudani, Yizhao Wang · 2026-03-10 · cs

WHU-STree: A Multi-modal Benchmark Dataset for Street Tree Inventory

This paper introduces WHU-STree, a comprehensive, multi-modal benchmark dataset featuring synchronized point clouds and high-resolution images of over 21,000 street trees across two cities, designed to overcome limitations in existing datasets by enabling diverse inventory tasks and advancing research in multi-modal fusion and cross-domain generalization for urban tree management.

Ruifei Ding, Zhe Chen, Wen Fan + 5 more · 2026-03-10 · cs

GeoAware-VLA: Implicit Geometry Aware Vision-Language-Action Model

GeoAware-VLA improves the viewpoint generalization of Vision-Language-Action models by integrating features from a frozen, pretrained geometric vision model via a lightweight projection layer, achieving significant zero-shot gains on unseen camera poses across both simulation benchmarks and real-world robotic platforms, without requiring explicit 3D training data.

Ali Abouzeid, Malak Mansour, Qinbo Sun, Zezhou Sun, Dezhen Song · 2026-03-10 · cs

OIPP: Object-Adaptive Impact Point Predictor for Catching Diverse In-Flight Objects

This paper introduces the Object-Adaptive Impact Point Predictor (OIPP) and a new real-world dataset of 8,000 diverse trajectories, enabling basket-equipped quadruped robots to accurately predict the landing positions of various in-flight objects, even during early flight stages and for unseen objects, thereby significantly improving catching success rates.

Ngoc Huy Nguyen, Kazuki Shibata, Takamitsu Matsubara · 2026-03-10 · cs

Efficient Construction of Implicit Surface Models From a Single Image for Motion Generation

This paper introduces Fast Image-to-Neural Surface (FINS), a lightweight framework that efficiently reconstructs high-fidelity implicit surfaces and SDF fields from a single image within seconds by leveraging multi-resolution hash grids and pre-trained foundation models, outperforming existing methods in speed and accuracy for robotics applications.

Wei-Teng Chu, Tianyi Zhang, Matthew Johnson-Roberson, Weiming Zhi · 2026-03-10 · cs

Quantized Visual Geometry Grounded Transformer

This paper introduces QuantVGGT, the first quantization framework for billion-scale Visual Geometry Grounded Transformers (VGGTs), which overcomes their unique calibration and distribution challenges through Dual-Smoothed Fine-Grained Quantization and Noise-Filtered Diverse Sampling, achieving significant memory savings and speedups while maintaining high reconstruction accuracy.

Weilun Feng, Haotong Qin, Mingqiang Wu, Chuanguang Yang, Yuqi Li, Xiangqi Li, Zhulin An, Libo Huang, Yulun Zhang, Michele Magno, Yongjun Xu · 2026-03-10 · cs

Autonomous UAV-Quadruped Docking in Complex Terrains via Active Posture Alignment and Constraint-Aware Control

This paper presents an autonomous docking framework for UAVs and quadruped robots in GPS-denied, complex terrains, utilizing a deep reinforcement learning-based posture stabilization system for the ground robot and a three-phase, constraint-aware control strategy for the UAV to achieve successful landings on steep slopes and uneven surfaces.

Haozhe Xu, Cheng Cheng, Hongrui Sang, Zhipeng Wang, Qiyong He, Xiuxian Li, Bin He · 2026-03-10 · cs