Intrinsic Geometry-Appearance Consistency Optimization for Sparse-View Gaussian Splatting

MVD-HuGaS is a novel framework that achieves state-of-the-art free-view 3D human rendering from a single image by leveraging a fine-tuned multi-view diffusion model to generate consistent multi-view images, an alignment module for joint Gaussian and pose optimization, and a depth-based facial distortion mitigation module to ensure high-fidelity reconstruction.

Kaiqiang Xiong, Rui Peng, Jiahao Wu + 5 more · 2026-03-04 · cs

Articulation in Motion: Prior-free Part Mobility Analysis for Articulated Objects by Dynamic-Static Disentanglement

This paper presents Articulation in Motion (AiM), a prior-free framework that leverages a dual-Gaussian scene representation and sequential RANSAC to automatically segment articulated objects into rigid parts, estimate their kinematics, and reconstruct interactive 3D replicas from a single static scan and an interaction video without requiring prior knowledge of the number of parts.

Hao Ai, Wenjie Chang, Jianbo Jiao + 2 more · 2026-03-04 · cs
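The AiM summary above relies on sequential RANSAC to peel off one rigid part at a time from the motion between two observations. A minimal toy sketch of that general idea, with invented 2D data and thresholds (this is not AiM's actual pipeline or its dual-Gaussian representation): repeatedly RANSAC-fit a single rigid transform between the static scan and the moved scan, label the inliers as one part, remove them, and repeat.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy scene: two rigid "parts" seen in a static scan (P)
# and after interaction (Q). Part A rotates 30 degrees, part B translates.
def rot(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

part_a = rng.uniform(-1.0, 1.0, (60, 2))
part_b = rng.uniform(2.0, 4.0, (60, 2))
P = np.vstack([part_a, part_b])
Q = np.vstack([part_a @ rot(np.pi / 6).T, part_b + np.array([1.5, 0.0])])

def fit_rigid(p, q):
    # Kabsch/Procrustes: least-squares rotation R and translation t
    # such that q ~ p @ R.T + t.
    pc, qc = p.mean(0), q.mean(0)
    H = (p - pc).T @ (q - qc)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, d]) @ U.T
    return R, qc - pc @ R.T

def sequential_ransac(P, Q, thresh=0.05, iters=200, min_part=10):
    remaining = np.arange(len(P))
    labels = -np.ones(len(P), dtype=int)
    part = 0
    while len(remaining) >= min_part:
        best = None
        for _ in range(iters):
            idx = rng.choice(remaining, 3, replace=False)
            R, t = fit_rigid(P[idx], Q[idx])
            err = np.linalg.norm(P[remaining] @ R.T + t - Q[remaining], axis=1)
            inliers = remaining[err < thresh]
            if best is None or len(inliers) > len(best):
                best = inliers
        if len(best) < min_part:
            break
        labels[best] = part          # peel off one rigid part...
        remaining = np.setdiff1d(remaining, best)
        part += 1                    # ...and repeat on what is left
    return labels

labels = sequential_ransac(P, Q)
```

Because each part moves under its own rigid transform, a single RANSAC model can only collect one part's points as inliers, so the loop discovers the parts one by one without knowing their number in advance.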

HDINO: A Concise and Efficient Open-Vocabulary Detector

HDINO is a concise and efficient open-vocabulary detector that eliminates reliance on manually curated datasets and resource-intensive feature extraction by employing a two-stage training strategy with a One-to-Many Semantic Alignment Mechanism and Difficulty Weighted Classification Loss to achieve state-of-the-art performance on COCO with significantly fewer training images than existing methods.

Hao Zhang, Yiqun Wang, Qinran Lin + 2 more · 2026-03-04 · cs

TC-Padé: Trajectory-Consistent Padé Approximation for Diffusion Acceleration

TC-Padé is a novel feature prediction framework that leverages Trajectory-Consistent Padé approximation with adaptive coefficient modulation and step-aware strategies to significantly accelerate diffusion models in low-step regimes while maintaining high generation quality and overcoming the trajectory drift limitations of existing polynomial-based methods.

Benlei Cui, Shaoxuan He, Bukun Huang + 8 more · 2026-03-04 · cs
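The motivation for Padé over polynomial prediction can be seen in a generic one-liner (this illustrates the mathematical principle only, not TC-Padé's feature predictor): a [1/1] Padé approximant of exp(x) spends the same two coefficients as the degree-1 Taylor polynomial, yet stays accurate much further from the expansion point, which is exactly what matters when few diffusion steps are available.

```python
import numpy as np

# Degree-1 Taylor polynomial of exp(x) around 0: two coefficients.
def taylor1(x):
    return 1.0 + x

# [1/1] Pade approximant of exp(x) around 0: also two free coefficients,
# but a rational form, so it can bend with the function's curvature.
def pade11(x):
    return (1.0 + x / 2.0) / (1.0 - x / 2.0)

x = 1.0
err_taylor = abs(np.exp(x) - taylor1(x))  # |e - 2|
err_pade = abs(np.exp(x) - pade11(x))     # |e - 3|
```

At x = 1 the Padé error is well under half the Taylor error for the same coefficient budget; rational extrapolation of cached features in low-step regimes exploits the same effect.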

TagaVLM: Topology-Aware Global Action Reasoning for Vision-Language Navigation

TagaVLM is an end-to-end framework that enhances Vision-Language Navigation by explicitly injecting topological structures into the VLM backbone via Spatial Topology Aware Residual Attention and Interleaved Navigation Prompts, achieving state-of-the-art performance on the R2R benchmark and demonstrating that targeted architectural improvements on smaller models can outperform brute-force scaling for embodied spatial reasoning.

Jiaxing Liu, Zexi Zhang, Xiaoyan Li + 3 more · 2026-03-04 · cs