Advancing Complex Video Object Segmentation via Progressive Concept Construction

This paper introduces Segment Concept (SeC), a video object segmentation framework that uses Large Vision-Language Models to progressively construct high-level, object-centric representations. On a new Semantic Complex Scenarios benchmark (SeCVOS), SeC achieves state-of-the-art performance, significantly outperforming existing methods such as SAM 2.

Zhixiong Zhang, Shuangrui Ding, Xiaoyi Dong + 7 more · 2026-03-03 · 🤖 cs.AI

ImagiDrive: A Unified Imagination-and-Planning Framework for Autonomous Driving

This paper presents ImagiDrive, a unified end-to-end autonomous driving framework that integrates a Vision-Language Model-based driving agent with a Driving World Model-based scene imaginer. The two components iteratively refine planning decisions in a closed imagination-and-planning loop, and the framework demonstrates superior robustness and performance on the nuScenes and NAVSIM datasets.

Jingyu Li, Bozhou Zhang, Xin Jin + 3 more · 2026-03-03 · 💻 cs

CineTrans: Learning to Generate Videos with Cinematic Transitions via Masked Diffusion Models

This paper introduces CineTrans, a framework for generating coherent, cinematic multi-shot videos with stable, film-style transitions. It combines a newly constructed Cine250K dataset with a training-free, mask-based control mechanism derived from attention map analysis, and significantly outperforms existing baselines in transition control and temporal consistency.

Xiaoxue Wu, Bingjie Gao, Yu Qiao + 2 more · 2026-03-03 · 💻 cs

MOON: Generative MLLM-based Multimodal Representation Learning for E-commerce Product Understanding

This paper introduces MOON, the first generative Multimodal Large Language Model designed for e-commerce product understanding. It uses guided Mixture-of-Experts, semantic region detection, and specialized negative sampling to address existing alignment and noise challenges, and it establishes a new large-scale benchmark for evaluation.

Daoze Zhang, Chenghan Fu, Zhanheng Nie + 7 more · 2026-03-03 · 🤖 cs.AI