Quantized Visual Geometry Grounded Transformer

This paper introduces QuantVGGT, the first quantization framework for billion-scale Visual Geometry Grounded Transformers (VGGTs), which overcomes unique calibration and distribution challenges through Dual-Smoothed Fine-Grained Quantization and Noise-Filtered Diverse Sampling to achieve significant memory reduction and inference speedup while maintaining high reconstruction accuracy.

Weilun Feng, Haotong Qin, Mingqiang Wu, Chuanguang Yang, Yuqi Li, Xiangqi Li, Zhulin An, Libo Huang, Yulun Zhang, Michele Magno, Yongjun Xu | 2026-03-10 | cs

Autonomous UAV-Quadruped Docking in Complex Terrains via Active Posture Alignment and Constraint-Aware Control

This paper presents an autonomous docking framework for UAVs and quadruped robots in GPS-denied, complex terrains, utilizing a deep reinforcement learning-based posture stabilization system for the ground robot and a three-phase, constraint-aware control strategy for the UAV to achieve successful landings on steep slopes and uneven surfaces.

Haozhe Xu, Cheng Cheng, Hongrui Sang, Zhipeng Wang, Qiyong He, Xiuxian Li, Bin He | 2026-03-10 | cs

SAC-Loco: Safe and Adjustable Compliant Quadrupedal Locomotion

This paper proposes SAC-Loco, a safety-aware framework that integrates a teacher-student reinforcement learning approach for adjustable force compliance with a safety-oriented recovery policy and a real-time safety critic, enabling quadruped robots to achieve robust and stable locomotion under diverse external force disturbances without requiring explicit force sensing.

Aoqian Zhang, Zixuan Zhuang, Chunzheng Wang, Shuzhi Sam Ge, Fan Shi, Cheng Xiang | 2026-03-10 | cs

Efficient Domain-Adaptive Multi-Task Dense Prediction with Vision Foundation Models

This paper introduces FAMDA, a simple yet effective unsupervised domain adaptation framework that leverages Vision Foundation Models as teachers within a self-training paradigm to generate high-quality pseudo-labels, enabling the training of highly efficient student networks that achieve state-of-the-art performance in multi-task dense prediction for resource-constrained robotics applications.

Beomseok Kang, Niluthpol Chowdhury Mithun, Mikhail Sizintsev, Han-Pang Chiu, Supun Samarasekera | 2026-03-10 | cs

QuantSparse: Comprehensively Compressing Video Diffusion Transformer with Model Quantization and Attention Sparsification

QuantSparse is a unified framework that effectively combines model quantization and attention sparsification for video diffusion transformers by introducing Multi-Scale Salient Attention Distillation and Second-Order Sparse Attention Reparameterization to mitigate information loss, thereby achieving significant storage reduction and inference acceleration while substantially outperforming existing baselines in generation quality.

Weilun Feng, Chuanguang Yang, Haotong Qin, Mingqiang Wu, Yuqi Li, Xiangqi Li, Zhulin An, Libo Huang, Yulun Zhang, Michele Magno, Yongjun Xu | 2026-03-10 | cs

PHASE-Net: Physics-Grounded Harmonic Attention System for Efficient Remote Photoplethysmography Measurement

This paper introduces PHASE-Net, a lightweight and theoretically grounded remote photoplethysmography model that leverages hemodynamic principles to derive a causal Temporal Convolutional Network, enhanced by novel spatial mixing and filtering modules to achieve state-of-the-art accuracy and efficiency in non-contact physiological monitoring under challenging conditions.

Bo Zhao, Dan Guo, Junzhe Cao, Yong Xu, Bochao Zou, Tao Tan, Yue Sun, Zitong Yu | 2026-03-10 | cs

LMOD+: A Comprehensive Multimodal Dataset and Benchmark for Developing and Evaluating Multimodal Large Language Models in Ophthalmology

This paper introduces LMOD+, a large-scale multimodal ophthalmology benchmark dataset and evaluation framework featuring 32,633 annotated instances across 12 conditions and 5 imaging modalities, designed to advance and systematically assess the capabilities of multimodal large language models in vision-threatening disease diagnosis, staging, and bias detection.

Zhenyue Qin, Yang Liu, Yu Yin, Jinyu Ding, Haoran Zhang, Anran Li, Dylan Campbell, Xuansheng Wu, Ke Zou, Tiarnan D. L. Keenan, Emily Y. Chew, Zhiyong Lu, Yih Chung Tham, Ninghao Liu, Xiuzhen Zhang, Qingyu Chen | 2026-03-10 | cs

XPPG-PCA: Reference-free automatic speech severity evaluation with principal components

This paper introduces XPPG-PCA, a novel unsupervised and reference-free method for objectively evaluating speech pathology severity that overcomes the limitations of existing automated approaches by demonstrating robust, generalizable performance comparable to or exceeding established reference-based methods across multiple datasets.

Bence Mark Halpern, Thomas B. Tienkamp, Teja Rebernik + 5 more | 2026-03-10 | cs

Beyond Collision Cones: Dynamic Obstacle Avoidance for Nonholonomic Robots via Dynamic Parabolic Control Barrier Functions

This paper introduces a Dynamic Parabolic Control Barrier Function (DPCBF) that adaptively shapes safety constraints based on distance and relative velocity to overcome the infeasibility and conservativeness of traditional collision-cone methods, enabling nonholonomic robots to successfully navigate dense environments with up to 100 dynamic obstacles.

Hun Kuk Park, Taekyung Kim, Dimitra Panagou | 2026-03-10 | cs

Streaming Drag-Oriented Interactive Video Manipulation: Drag Anything, Anytime!

The paper introduces REVEL, a new task for streaming, fine-grained interactive video manipulation on any object at any time, and proposes DragStream, a training-free method that resolves latent distribution drift and context interference in autoregressive video diffusion models through adaptive distribution self-rectification and spatial-frequency selective optimization.

Junbao Zhou, Yuan Zhou, Kesen Zhao, Qingshan Xu, Beier Zhu, Richang Hong, Hanwang Zhang | 2026-03-10 | cs

PAD-TRO: Projection-Augmented Diffusion for Direct Trajectory Optimization

This paper introduces PAD-TRO, a novel direct trajectory optimization framework that integrates a gradient-free projection mechanism into the reverse diffusion process to generate dynamically feasible state sequences, achieving zero dynamic feasibility errors and a significantly higher success rate in complex quadrotor navigation compared to existing single-shooting approaches.

Jushan Chen, Santiago Paternain | 2026-03-10 | cs