DOCFORGE-BENCH: A Comprehensive 0-shot Benchmark for Document Forgery Detection and Analysis

DOCFORGE-BENCH introduces the first unified zero-shot benchmark for document forgery detection, revealing that current methods suffer from severe calibration failures due to the extreme rarity of tampered pixels in documents, which renders standard fixed thresholds ineffective and highlights threshold adaptation as the critical missing step for practical deployment.

Zengqi Zhao, Weidi Xia, En Wei, Yan Zhang, Jane Mo, Tiannan Zhang, Yuanqin Dai, Zexi Chen, Yiran Tao, Simiao Ren
Wed, 11 Ma (cs)
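
The calibration failure described above invites a concrete illustration. The sketch below is not DOCFORGE-BENCH's evaluation protocol; it is a minimal assumed setup (function name, quantile value, and toy data are all illustrative) showing why a per-image quantile threshold copes with extreme tamper-pixel rarity where a fixed cutoff fails.

```python
import numpy as np

def adaptive_threshold(score_map: np.ndarray, quantile: float = 0.99) -> np.ndarray:
    """Binarize a per-pixel forgery score map with a per-image quantile
    cutoff instead of a fixed global threshold such as 0.5."""
    # With very rare tampered pixels, the score distribution is dominated
    # by pristine pixels, so a fixed cutoff mis-fires; anchoring it to the
    # image's own score quantile adapts to the imbalance.
    thr = np.quantile(score_map, quantile)
    return score_map >= thr

# Toy data: 1% of a 100x100 score map carries elevated scores.
rng = np.random.default_rng(0)
scores = rng.uniform(0.0, 0.4, size=(100, 100))
scores[:10, :10] = rng.uniform(0.6, 1.0, size=(10, 10))
mask = adaptive_threshold(scores, quantile=0.99)
```

Here the adaptive cutoff recovers exactly the 1% tampered region, whereas a fixed threshold of 0.5 would behave arbitrarily as the clean/tampered score ranges shift between documents.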

Pathwise Test-Time Correction for Autoregressive Long Video Generation

This paper introduces Test-Time Correction (TTC), a training-free method that stabilizes long-sequence video generation in distilled autoregressive models by using the initial frame as a reference anchor to calibrate intermediate states, thereby overcoming error accumulation and extending generation lengths without compromising quality.

Xunzhi Xiang, Zixuan Duan, Guiyu Zhang, Haiyu Zhang, Zhe Gao, Junta Wu, Shaofeng Zhang, Tengfei Wang, Qi Fan, Chunchao Guo
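
TTC's exact correction update is specified in the paper; as a hedged stand-in, the sketch below uses simple moment matching toward the first frame to show the general shape of reference-anchored calibration: pull a drifting intermediate state back toward the anchor's statistics without retraining.

```python
import numpy as np

def anchor_correct(frame_latent, anchor_latent, strength=0.5):
    """Blend a drifting frame latent toward the first-frame anchor's
    mean/std (moment matching); `strength` in [0, 1] sets how hard
    the intermediate state is pulled back."""
    mu_f, sd_f = frame_latent.mean(), frame_latent.std()
    mu_a, sd_a = anchor_latent.mean(), anchor_latent.std()
    # Renormalize to the anchor's statistics, then blend with the original
    # so per-frame content survives while accumulated drift is damped.
    matched = (frame_latent - mu_f) / (sd_f + 1e-8) * sd_a + mu_a
    return (1.0 - strength) * frame_latent + strength * matched

# A latent whose statistics have drifted far from the anchor's.
rng = np.random.default_rng(1)
anchor = rng.normal(0.0, 1.0, 256)
drifted = rng.normal(2.0, 3.0, 256)
fixed = anchor_correct(drifted, anchor, strength=1.0)
```

Applied once per generated chunk inside the rollout loop, a correction of this shape keeps error from compounding across long sequences; the blending weight trades content preservation against drift suppression.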

Low-rank Orthogonal Subspace Intervention for Generalizable Face Forgery Detection

To overcome the generalization failure of vanilla CLIP in face forgery detection caused by "low-rank spurious bias," this paper proposes SeLop, a causal representation learning method that identifies and removes spurious correlations via orthogonal low-rank subspace intervention, thereby achieving state-of-the-art performance with high robustness using only 0.39M trainable parameters.

Chi Wang, Xinjue Hu, Boyu Wang, Ziwen He, Zhangjie Fu
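
Orthogonal low-rank subspace intervention has a generic core that can be sketched independently of SeLop's estimation procedure: remove the component of each feature that lies in an estimated low-rank spurious subspace. The basis below is random and orthonormalized with QR purely for illustration.

```python
import numpy as np

def remove_subspace(features: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Project row-vector features onto the orthogonal complement of the
    subspace spanned by the (orthonormal) columns of `basis`."""
    # z' = z - (z U) U^T removes any component lying in span(U).
    return features - features @ basis @ basis.T

# Illustrative rank-2 "spurious" subspace in a 64-d feature space.
rng = np.random.default_rng(0)
basis = np.linalg.qr(rng.normal(size=(64, 2)))[0]
z = rng.normal(size=(8, 64))
z_clean = remove_subspace(z, basis)
residual = np.abs(z_clean @ basis).max()
```

After the projection, the features have no remaining component along the removed directions; the low rank of the basis is what keeps the intervention's parameter count small.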

Taming Preference Mode Collapse via Directional Decoupling Alignment in Diffusion Reinforcement Learning

This paper introduces Directional Decoupling Alignment (D^2-Align), a novel framework that mitigates Preference Mode Collapse in diffusion reinforcement learning by applying directional corrections to reward signals, thereby preserving generative diversity while achieving superior human preference alignment.

Chubin Chen, Sujie Hu, Jiashu Zhu, Meiqi Wu, Jintao Chen, Yanxun Li, Nisha Huang, Chengyu Fang, Jiahong Wu, Xiangxiang Chu, Xiu Li

AVGGT: Rethinking Global Attention for Accelerating VGGT

This paper introduces AVGGT, a training-free acceleration framework that leverages an analysis of global attention's distinct roles in VGGT and π^3 to implement a two-step optimization strategy, achieving up to 10× inference speedup on long sequences while maintaining or improving accuracy in dense multi-view 3D reconstruction tasks.

Xianbing Sun, Zhikai Zhu, Zhengyu Lou, Bo Yang, Jinyang Tang, Liqing Zhang, He Wang, Jianfu Zhang

Bootstrap Dynamic-Aware 3D Visual Representation for Scalable Robot Learning

The paper introduces AFRO, a self-supervised framework that learns dynamics-aware 3D visual representations by modeling state-action-state transitions via a generative diffusion process, thereby significantly improving robotic manipulation performance across diverse simulated and real-world tasks without requiring explicit action or reconstruction supervision.

Qiwei Liang, Boyang Cai, Minghao Lai, Sitong Zhuang, Tao Lin, Yan Qin, Yixuan Ye, Jiaming Liang, Renjing Xu

V-Attack: Targeting Disentangled Value Features for Controllable Adversarial Attacks on LVLMs

This paper introduces V-Attack, a novel adversarial attack method for Large Vision-Language Models that achieves precise local semantic manipulation by targeting disentangled value features within transformer attention blocks, thereby overcoming the controllability limitations of existing approaches that rely on entangled patch-token representations.

Sen Nie, Jie Zhang, Jianxin Yan, Shiguang Shan, Xilin Chen
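
The idea of attacking value features specifically, rather than entangled patch tokens, can be reduced to a toy: optimize a perturbation so that a token's value projection moves toward a target value feature. Everything below (matrix shapes, learning rate, random data) is an assumed illustration, not V-Attack's objective or optimization procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16
W_v = rng.normal(size=(d, d)) / np.sqrt(d)  # stand-in value projection
x = rng.normal(size=d)                      # clean token embedding
v_target = rng.normal(size=d)               # value feature of a target concept

delta = np.zeros(d)                         # adversarial perturbation
init_err = np.linalg.norm(x @ W_v - v_target)
lr = 0.05
for _ in range(300):
    v = (x + delta) @ W_v
    delta -= lr * 2.0 * W_v @ (v - v_target)  # grad of ||v - v_target||^2
final_err = np.linalg.norm((x + delta) @ W_v - v_target)
```

Because the loss is defined only through the value projection, the perturbation steers what the attention block passes forward for this token, which is the disentanglement intuition behind targeting value features.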

Proper Body Landmark Subset Enables More Accurate and 5X Faster Recognition of Isolated Signs in LIBRAS

This paper demonstrates that selecting an optimal subset of body landmarks combined with spline-based imputation enables isolated Brazilian Sign Language (LIBRAS) recognition that is both 5 times faster and as accurate as state-of-the-art methods, overcoming the speed-accuracy trade-off of previous OpenPose-based approaches.

Daniele L. V. dos Santos, Thiago B. Pereira, Carlos Eduardo G. R. Alves, Richard J. M. G. Tello, Francisco de A. Boldt, Thiago M. Paixão
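
Spline-based imputation of dropped landmark detections is straightforward to sketch; the function below (an illustration, not the paper's implementation) fits a cubic spline per coordinate track over the frames where the landmark was detected and fills the gaps, assuming scipy is available.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def impute_track(track: np.ndarray) -> np.ndarray:
    """Fill NaN gaps in one landmark-coordinate track by fitting a cubic
    spline to the frames where the landmark was detected."""
    t = np.arange(len(track))
    seen = ~np.isnan(track)
    spline = CubicSpline(t[seen], track[seen])
    filled = track.copy()
    filled[~seen] = spline(t[~seen])
    return filled

# A smooth wrist-x trajectory with two dropped detections.
t = np.arange(10, dtype=float)
track = np.sin(t / 3.0)
track[[3, 7]] = np.nan
filled = impute_track(track)
```

Running this per coordinate of each retained landmark yields dense tracks, which is what lets a reduced landmark subset stay competitive in accuracy while cutting feature-extraction cost.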

Mapping Historic Urban Footprints in France: Balancing Quality, Scalability and AI Techniques

This study presents a scalable dual-pass deep learning pipeline that successfully extracts the first open-access, nationwide urban footprint dataset for metropolitan France from historical maps (1925–1950), achieving 73% accuracy by effectively mitigating artifacts like text and contour lines to enable quantitative analysis of pre-1970s urban sprawl.

Walid Rabehi, Marion Le Texier, Rémi Lemoy

LLaVAShield: Safeguarding Multimodal Multi-Turn Dialogues in Vision-Language Models

This paper introduces LLaVAShield, a safety auditing framework for multimodal multi-turn dialogues in Vision-Language Models, supported by the MMDS dataset and MMRT red-teaming framework, which collectively address the limitations of existing moderation tools by effectively detecting concealed malicious intent, contextual risk accumulation, and cross-modal joint risks.

Guolei Huang, Qinzhi Peng, Gan Xu, Yao Huang, Yuxuan Lu, Yongjun Shen

Automated Coral Spawn Monitoring for Reef Restoration: The Coral Spawn and Larvae Imaging Camera System (CSLICS)

This paper introduces the Coral Spawn and Larvae Imaging Camera System (CSLICS), an automated, low-cost computer vision solution that significantly reduces labor-intensive manual counting while accurately monitoring coral spawn and larvae to enhance reef restoration efforts.

Dorian Tsai, Christopher A. Brunner, Riki Lamont, F. Mikaela Nordborg, Andrea Severati, Java Terry, Karen Jackel, Matthew Dunbabin, Tobias Fischer, Scarlett Raine
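
Automated spawn counting of the kind CSLICS performs can be illustrated with a classical-CV stand-in (CSLICS's actual detection pipeline is not reproduced here): threshold a grayscale frame, label connected components, and discard specks below a minimum area before counting.

```python
import numpy as np
from scipy import ndimage

def count_blobs(frame: np.ndarray, thresh: float = 0.5, min_area: int = 4) -> int:
    """Count bright blob-like objects: threshold the frame, label connected
    components, and discard specks smaller than `min_area` pixels."""
    labels, n = ndimage.label(frame > thresh)
    sizes = ndimage.sum(np.ones_like(frame), labels, index=np.arange(1, n + 1))
    return int((sizes >= min_area).sum())

# Synthetic frame: two spawn-sized blobs plus a one-pixel speck of noise.
frame = np.zeros((32, 32))
frame[5:9, 5:9] = 1.0
frame[20:23, 20:23] = 1.0
frame[0, 31] = 1.0
count = count_blobs(frame)
```

Even this crude pipeline shows where the labor savings come from: per-frame counts are produced continuously and without a human in the loop, with the area filter suppressing sensor noise.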