Thinking with Drafts: Speculative Temporal Reasoning for Efficient Long Video Understanding

SpecTemp is a reinforcement learning-based framework that makes long video understanding more efficient by decoupling temporal perception from reasoning in a cooperative dual-model design. A lightweight draft MLLM proposes salient frames, which a powerful target MLLM then verifies, significantly accelerating inference while maintaining competitive accuracy.

Pengfei Hu, Meng Cao, Yingyao Wang + 6 more · 2026-03-02 · 💻 cs

ColaVLA: Leveraging Cognitive Latent Reasoning for Hierarchical Parallel Trajectory Planning in Autonomous Driving

ColaVLA is a unified vision-language-action framework that addresses the latency and modality mismatch of existing VLM-based planners. It transfers cognitive reasoning into a compact latent space and uses a hierarchical parallel decoder, achieving state-of-the-art, efficient, and safe trajectory planning on the nuScenes benchmark.

Qihang Peng, Xuesong Chen, Chenye Yang + 2 more · 2026-03-02 · 💻 cs

CPiRi: Channel Permutation-Invariant Relational Interaction for Multivariate Time Series Forecasting

CPiRi is a framework for multivariate time series forecasting that combines a spatio-temporal decoupling architecture with permutation-invariant regularization to overcome the limitations of existing channel-dependent and channel-independent models. It achieves state-of-the-art performance, robustness to channel reordering, and strong inductive generalization to unseen channels.

Jiyuan Xu, Wenyu Zhang, Xin Jing + 3 more · 2026-03-02 · 💻 cs

One2Scene: Geometric Consistent Explorable 3D Scene Generation from a Single Image

One2Scene is a framework that generates geometrically consistent, explorable 3D scenes from a single image. It decomposes the task into three stages: panorama generation, 3D scaffold construction via multi-view stereo matching on sparse anchor views, and novel view synthesis. This avoids the severe distortions and artifacts that existing methods commonly produce under large camera motions.

Pengfei Wang, Liyi Chen, Zhiyuan Ma + 3 more · 2026-03-02 · 💻 cs