Integration of deep generative Anomaly Detection algorithm in high-speed industrial line

This paper presents a semi-supervised deep generative anomaly detection framework, utilizing a residual autoencoder with a dense bottleneck, that achieves high-accuracy, real-time defect detection and localization on high-speed pharmaceutical Blow-Fill-Seal production lines while operating within strict 500 ms timing constraints.

Niccolò Ferrari, Nicola Zanarini, Michele Fraccaroli, Alice Bizzarri, Evelina Lamma · 2026-03-10 · cs.LG

Models as Lego Builders: Assembling Malice from Benign Blocks via Semantic Blueprints

This paper introduces StructAttack, a black-box jailbreak framework that exploits the semantic slot-filling vulnerability of Large Vision-Language Models by embedding benign-looking visual structures to covertly assemble and generate harmful content.

Chenxi Li, Xianggan Liu, Dake Shen, Yaosong Du, Zhibo Yao, Hao Jiang, Linyi Jiang, Chengwei Cao, Jingzhe Zhang, RanYi Peng, Peiling Bai, Xiande Huang · 2026-03-10 · cs.LG

Fast Attention-Based Simplification of LiDAR Point Clouds for Object Detection and Classification

This paper proposes an efficient, end-to-end learned point cloud simplification method that combines feature embedding with attention-based sampling to achieve a superior balance between computational speed and accuracy for LiDAR-based object detection and classification compared to traditional sampling techniques.

Z. Rozsa, Á. Madaras, Q. Wei, X. Lu, M. Golarits, H. Yuan, T. Sziranyi, R. Hamzaoui · 2026-03-10 · cs

Compression as Adaptation: Implicit Visual Representation with Diffusion Foundation Models

This paper proposes a novel visual representation framework that encodes signals as functions parametrized by low-rank adaptations on frozen diffusion models, enabling compact storage via single-vector hashing and bridging visual compression with generation through inference-time scaling and control.

Jiajun He, Zongyu Guo, Zhaoyang Jia, Xiaoyi Zhang, Jiahao Li, Xiao Li, Bin Li, José Miguel Hernández-Lobato, Yan Lu · 2026-03-10 · cs.LG

Overthinking Causes Hallucination: Tracing Confounder Propagation in Vision Language Models

This paper identifies "overthinking" (the propagation of incorrect intermediate hypotheses across decoder layers) as a primary cause of hallucinations in Vision Language Models and introduces the Overthinking Score, a layer-probing metric that significantly outperforms existing final-output-based detectors.

Abin Shoby, Ta Duc Huy, Tuan Dung Nguyen, Minh Khoi Ho, Qi Chen, Anton van den Hengel, Phi Le Nguyen, Johan W. Verjans, Vu Minh Hieu Phan · 2026-03-10 · cs

Duala: Dual-Level Alignment of Subjects and Stimuli for Cross-Subject fMRI Decoding

The paper proposes Duala, a dual-level alignment framework that enhances cross-subject fMRI decoding by ensuring semantic consistency at the stimulus level and capturing individual neural variations at the subject level, thereby achieving state-of-the-art performance in image-to-brain retrieval and reconstruction with minimal adaptation data.

Shumeng Li, Jintao Guo, Jian Zhang, Yulin Zhou, Luyang Cao, Yinghuan Shi · 2026-03-10 · cs

Evaluating Synthetic Data for Baggage Trolley Detection in Airport Logistics

This paper proposes a high-fidelity synthetic data generation pipeline using NVIDIA Omniverse to address data scarcity and privacy constraints in airport logistics, demonstrating that mixed training with synthetic data and only 40% of real annotations achieves performance comparable to full real-data baselines while reducing annotation effort by 25–35%.

Abdeldjalil Taibi, Mohmoud Badlis, Amina Bensalem, Belkacem Zouilekh, Mohammed Brahimi · 2026-03-10 · cs.LG

AtomicVLA: Unlocking the Potential of Atomic Skill Learning in Robots

The paper proposes AtomicVLA, a unified planning-and-execution framework that utilizes a Skill-Guided Mixture-of-Experts architecture to dynamically compose atomic skill abstractions, thereby significantly improving scalability and performance in long-horizon robotic manipulation and continual learning tasks compared to existing monolithic VLA models.

Likui Zhang, Tao Tang, Zhihao Zhan, Xiuwei Chen, Zisheng Chen, Jianhua Han, Jiangtong Zhu, Pei Xu, Hang Xu, Hefeng Wu, Liang Lin, Xiaodan Liang · 2026-03-10 · cs

GLASS: Graph and Vision-Language Assisted Semantic Shape Correspondence

GLASS is a novel unsupervised framework that establishes dense 3D shape correspondence across challenging non-isometric and inter-class scenarios by integrating geometric spectral analysis with semantic priors from vision-language foundation models, achieving state-of-the-art performance through view-consistent feature extraction, language-injected vertex descriptors, and a graph-assisted contrastive loss.

Qinfeng Xiao, Guofeng Mei, Qilong Liu, Chenyuan Yi, Fabio Poiesi, Jian Zhang, Bo Yang, Yick Kit-lun · 2026-03-10 · cs

Scaling Test-Time Robustness of Vision-Language Models via Self-Critical Inference Framework

This paper proposes a Self-Critical Inference (SCI) framework that enhances the robustness of Large Vision-Language Models against language bias and sensitivity through multi-round counterfactual reasoning with textual and visual perturbations, alongside a new Dynamic Robustness Benchmark (DRBench) for model-specific evaluation.

Kaihua Tang, Jiaxin Qi, Jinli Ou, Yuhua Zheng, Jianqiang Huang · 2026-03-10 · cs

Holi-Spatial: Evolving Video Streams into Holistic 3D Spatial Intelligence

This paper introduces Holi-Spatial, the first fully automated, large-scale, spatially-aware multimodal dataset constructed from raw video streams without human intervention, which provides 4 million high-quality 3D semantic annotations and spatial QA pairs to significantly enhance the training and performance of Vision-Language Models on spatial reasoning tasks.

Yuanyuan Gao, Hao Li, Yifei Liu, Xinhao Ji, Yuning Gong, Yuanjun Liao, Fangfu Liu, Manyuan Zhang, Yuchen Yang, Dan Xu, Xue Yang, Huaxi Huang, Hongjie Zhang, Ziwei Liu, Xiao Sun, Dingwen Zhang, Zhihang Zhong · 2026-03-10 · cs

UniUncer: Unified Dynamic Static Uncertainty for End to End Driving

UniUncer is a lightweight, unified framework for end-to-end autonomous driving that jointly estimates and leverages uncertainty for both static map elements and dynamic agents through probabilistic regression, uncertainty-aware query fusion, and adaptive gating, thereby significantly improving trajectory accuracy and planning robustness with minimal computational overhead.

Yu Gao, Jijun Wang, Zongzheng Zhang, Anqing Jiang, Yiru Wang, Yuwen Heng, Shuo Wang, Hao Sun, Zhangfeng Hu, Hao Zhao · 2026-03-10 · cs