Bootstrap Dynamic-Aware 3D Visual Representation for Scalable Robot Learning

The paper introduces AFRO, a self-supervised framework that learns dynamics-aware 3D visual representations by modeling state-action-state transitions via a generative diffusion process, thereby significantly improving robotic manipulation performance across diverse simulated and real-world tasks without requiring explicit action or reconstruction supervision.

Qiwei Liang, Boyang Cai, Minghao Lai, Sitong Zhuang, Tao Lin, Yan Qin, Yixuan Ye, Jiaming Liang, Renjing Xu | Wed, 11 Ma | cs

V-Attack: Targeting Disentangled Value Features for Controllable Adversarial Attacks on LVLMs

This paper introduces V-Attack, a novel adversarial attack method for Large Vision-Language Models that achieves precise local semantic manipulation by targeting disentangled value features within transformer attention blocks, thereby overcoming the controllability limitations of existing approaches that rely on entangled patch-token representations.

Sen Nie, Jie Zhang, Jianxin Yan, Shiguang Shan, Xilin Chen | Wed, 11 Ma | cs

Proper Body Landmark Subset Enables More Accurate and 5X Faster Recognition of Isolated Signs in LIBRAS

This paper demonstrates that selecting an optimal subset of body landmarks combined with spline-based imputation enables isolated Brazilian Sign Language (LIBRAS) recognition that is both 5 times faster and as accurate as state-of-the-art methods, overcoming the speed-accuracy trade-off of previous OpenPose-based approaches.

Daniele L. V. dos Santos, Thiago B. Pereira, Carlos Eduardo G. R. Alves, Richard J. M. G. Tello, Francisco de A. Boldt, Thiago M. Paixão | Wed, 11 Ma | cs

Revisiting Replanning from Scratch: Real-Time Incremental Planning with Fast Almost-Surely Asymptotically Optimal Planners

This paper challenges the conventional assumption that reactive replanning requires updating existing plans by demonstrating that using fast almost-surely asymptotically optimal (ASAO) algorithms to solve a series of independent planning problems offers a more efficient and effective approach for navigating changing environments.

Mitchell E. C. Sabbadini, Andrew H. Liu, Joseph Ruan, Tyler S. Wilson, Zachary Kingston, Jonathan D. Gammell | Wed, 11 Ma | cs

LARA-Gen: Enabling Continuous Emotion Control for Music Generation Models via Latent Affective Representation Alignment

LARA-Gen introduces a framework for continuous, fine-grained emotion control in music generation by aligning latent affective representations with an external emotion predictor and utilizing a valence-arousal control module, thereby overcoming the limitations of text-based prompting and significantly improving both emotional adherence and music quality.

Jiahao Mei, Xuenan Xu, Zeyu Xie, Zihao Zheng, Ye Tao, Yue Ding, Mengyue Wu | Wed, 11 Ma | cs

Mapping Historic Urban Footprints in France: Balancing Quality, Scalability and AI Techniques

This study presents a scalable dual-pass deep learning pipeline that successfully extracts the first open-access, nationwide urban footprint dataset for metropolitan France from historical maps (1925–1950), achieving 73% accuracy by effectively mitigating artifacts like text and contour lines to enable quantitative analysis of pre-1970s urban sprawl.

Walid Rabehi, Marion Le Texier, Rémi Lemoy | Wed, 11 Ma | cs

LLaVAShield: Safeguarding Multimodal Multi-Turn Dialogues in Vision-Language Models

This paper introduces LLaVAShield, a safety auditing framework for multimodal multi-turn dialogues in Vision-Language Models, supported by the MMDS dataset and MMRT red-teaming framework, which collectively address the limitations of existing moderation tools by effectively detecting concealed malicious intent, contextual risk accumulation, and cross-modal joint risks.

Guolei Huang, Qinzhi Peng, Gan Xu, Yao Huang, Yuxuan Lu, Yongjun Shen | Wed, 11 Ma | cs

Automated Coral Spawn Monitoring for Reef Restoration: The Coral Spawn and Larvae Imaging Camera System (CSLICS)

This paper introduces the Coral Spawn and Larvae Imaging Camera System (CSLICS), an automated, low-cost computer vision solution that significantly reduces labor-intensive manual counting while accurately monitoring coral spawn and larvae to enhance reef restoration efforts.

Dorian Tsai, Christopher A. Brunner, Riki Lamont, F. Mikaela Nordborg, Andrea Severati, Java Terry, Karen Jackel, Matthew Dunbabin, Tobias Fischer, Scarlett Raine | Wed, 11 Ma | cs