cs.CV papers | Gist.Science

Active Inference for Micro-Gesture Recognition: EFE-Guided Temporal Sampling and Adaptive Learning

This paper proposes an active inference-based framework for micro-gesture recognition that utilizes Expected Free Energy-guided temporal sampling and uncertainty-aware adaptive learning to overcome challenges like low amplitude, noise, and inter-subject variability, demonstrating significant performance improvements on the SMG dataset.

Weijia Feng, Jingyu Yang, Ruojia Zhang, Fengtao Sun, Qian Gao, Chenyang Wang, Tongtong Su, Jia Guo, Xiaobai Li, Minglai Shao | 2026-03-10 | 💻 cs

PureCC: Pure Learning for Text-to-Image Concept Customization

PureCC is a novel concept customization framework that employs a decoupled learning objective and a dual-branch training pipeline to achieve high-fidelity text-to-image personalization while effectively preserving the original model's behavior and capabilities.

Zhichao Liao, Xiaole Xian, Qingyu Li, Wenyu Qin, Meng Wang, Weicheng Xie, Siyang Song, Pingfa Feng, Long Zeng, Liang Pan | 2026-03-10 | 💻 cs

Brain-WM: Brain Glioblastoma World Model

Brain-WM is a pioneering brain glioblastoma world model that utilizes a novel Y-shaped Mixture-of-Transformers architecture to unify next-step treatment prediction and future MRI generation, effectively capturing the co-evolutionary dynamics between tumor progression and treatment response to optimize clinical outcomes.

Chenhui Wang, Boyun Zheng, Liuxin Bao, Zhihao Peng, Peter Y. M. Woo, Hongming Shan, Yixuan Yuan | 2026-03-10 | 💻 cs

SiamGM: Siamese Geometry-Aware and Motion-Guided Network for Real-Time Satellite Video Object Tracking

The paper proposes SiamGM, a real-time Siamese network for satellite video object tracking that integrates a geometry-aware Inter-Frame Graph Attention module and a motion-guided optimization strategy to address challenges like small targets and occlusions while running at 130 FPS without added computational overhead.

Zixiao Wen, Zhen Yang, Jiawei Li, Xiantai Xiang, Guangyao Zhou, Yuxin Hu, Yuhan Liu | 2026-03-10 | 💻 cs

GRD-Net: Generative-Reconstructive-Discriminative Anomaly Detection with Region of Interest Attention Module

The paper proposes GRD-Net, a novel architecture combining a generative adversarial network with a region-of-interest attention module to improve industrial surface anomaly detection and localization by learning from normal products and synthetic defects while focusing on relevant areas, thereby reducing reliance on biased post-processing algorithms.

Niccolò Ferrari, Michele Fraccaroli, Evelina Lamma | 2026-03-10 | 🤖 cs.LG

Efficient RGB-D Scene Understanding via Multi-task Adaptive Learning and Cross-dimensional Feature Guidance

This paper proposes an efficient multi-task RGB-D scene understanding model that integrates an enhanced fusion encoder, specialized feature interaction layers, and a dynamic adaptive loss function to simultaneously perform semantic, instance, and panoptic segmentation, orientation estimation, and scene classification with improved accuracy and speed across multiple datasets.

Guodong Sun, Junjie Liu, Gaoyang Zhang, Bo Wu, Yang Zhang | 2026-03-10 | 💻 cs

A Systematic Comparison of Training Objectives for Out-of-Distribution Detection in Image Classification

This paper systematically evaluates four training objectives—Cross-Entropy, Prototype, Triplet, and Average Precision Losses—for out-of-distribution detection in image classification, revealing that while they achieve comparable in-distribution accuracy, Cross-Entropy Loss delivers the most consistent performance across both near- and far-OOD scenarios under standardized protocols.

Furkan Genç, Onat Özdemir, Emre Akbas | 2026-03-10 | 🤖 cs.LG

Integration of deep generative Anomaly Detection algorithm in high-speed industrial line

This paper presents a semi-supervised deep generative anomaly detection framework, utilizing a residual autoencoder with a dense bottleneck, that achieves high-accuracy, real-time defect detection and localization on high-speed pharmaceutical Blow-Fill-Seal production lines while operating within strict 500 ms timing constraints.

Niccolò Ferrari, Nicola Zanarini, Michele Fraccaroli, Alice Bizzarri, Evelina Lamma | 2026-03-10 | 🤖 cs.LG

3DGS-HPC: Distractor-free 3D Gaussian Splatting with Hybrid Patch-wise Classification

This paper proposes 3DGS-HPC, a novel framework that improves 3D Gaussian Splatting in real-world environments by replacing fragile semantic cues with a robust patch-wise classification strategy and a hybrid metric to effectively identify and suppress transient distractors like moving objects and shadows.

Jiahao Chen, Yipeng Qin, Ganlong Zhao, Xin Li, Wenping Wang, Guanbin Li | 2026-03-10 | 💻 cs

Models as Lego Builders: Assembling Malice from Benign Blocks via Semantic Blueprints

This paper introduces StructAttack, a black-box jailbreak framework that exploits the semantic slot-filling vulnerability of Large Vision-Language Models by embedding benign-looking visual structures to covertly assemble and generate harmful content.

Chenxi Li, Xianggan Liu, Dake Shen, Yaosong Du, Zhibo Yao, Hao Jiang, Linyi Jiang, Chengwei Cao, Jingzhe Zhang, RanYi Peng, Peiling Bai, Xiande Huang | 2026-03-10 | 🤖 cs.LG

Fast Attention-Based Simplification of LiDAR Point Clouds for Object Detection and Classification

This paper proposes an efficient, end-to-end learned point cloud simplification method that combines feature embedding with attention-based sampling to achieve a superior balance between computational speed and accuracy for LiDAR-based object detection and classification compared to traditional sampling techniques.

Z. Rozsa, Á. Madaras, Q. Wei, X. Lu, M. Golarits, H. Yuan, T. Sziranyi, R. Hamzaoui | 2026-03-10 | 💻 cs

EmbedTalk: Triplane-Free Talking Head Synthesis using Embedding-Driven Gaussian Deformation

EmbedTalk introduces a triplane-free talking head synthesis method that leverages learned embeddings to drive 3D Gaussian deformations, achieving superior rendering quality, lip synchronization, and motion consistency while enabling real-time performance (over 60 FPS) on mobile GPUs through significantly more compact models.

Arpita Saggar, Jonathan C. Darling, Duygu Sarikaya, David C. Hogg | 2026-03-10 | 💻 cs

Looking Into the Water by Unsupervised Learning of the Surface Shape

This paper proposes an unsupervised deep learning method using two neural-field networks with periodic activation functions to model water surface height and reconstruct undistorted underwater images from aerial views, outperforming existing approaches on both simulated and real data.

Ori Lifschitz, Tali Treibitz, Dan Rosenbaum | 2026-03-10 | 💻 cs

Compression as Adaptation: Implicit Visual Representation with Diffusion Foundation Models

This paper proposes a novel visual representation framework that encodes signals as functions parametrized by low-rank adaptations on frozen diffusion models, enabling compact storage via single-vector hashing and bridging visual compression with generation through inference-time scaling and control.

Jiajun He, Zongyu Guo, Zhaoyang Jia, Xiaoyi Zhang, Jiahao Li, Xiao Li, Bin Li, José Miguel Hernández-Lobato, Yan Lu | 2026-03-10 | 🤖 cs.LG

Overthinking Causes Hallucination: Tracing Confounder Propagation in Vision Language Models

This paper identifies "overthinking"—the propagation of incorrect intermediate hypotheses across decoder layers—as a primary cause of hallucinations in Vision Language Models and introduces the Overthinking Score, a layer-probing metric that significantly outperforms existing final-output-based detectors.

Abin Shoby, Ta Duc Huy, Tuan Dung Nguyen, Minh Khoi Ho, Qi Chen, Anton van den Hengel, Phi Le Nguyen, Johan W. Verjans, Vu Minh Hieu Phan | 2026-03-10 | 💻 cs

Duala: Dual-Level Alignment of Subjects and Stimuli for Cross-Subject fMRI Decoding

The paper proposes Duala, a dual-level alignment framework that enhances cross-subject fMRI decoding by ensuring semantic consistency at the stimulus level and capturing individual neural variations at the subject level, thereby achieving state-of-the-art performance in image-to-brain retrieval and reconstruction with minimal adaptation data.

Shumeng Li, Jintao Guo, Jian Zhang, Yulin Zhou, Luyang Cao, Yinghuan Shi | 2026-03-10 | 💻 cs

Real-Time Glottis Detection Framework via Spatial-decoupled Feature Learning for Nasal Transnasal Intubation

This paper proposes Mobile GlottisNet, a lightweight deep learning framework that uses spatial-decoupled feature learning and adaptive mechanisms to achieve real-time glottis detection for nasotracheal intubation on resource-constrained edge devices.

Jinyu Liu, Gaoyang Zhang, Yang Zhou, Ruoyi Hao, Yang Zhang, Hongliang Ren | 2026-03-10 | 💻 cs

Evaluating Synthetic Data for Baggage Trolley Detection in Airport Logistics

This paper proposes a high-fidelity synthetic data generation pipeline using NVIDIA Omniverse to address data scarcity and privacy constraints in airport logistics, demonstrating that mixed training with synthetic data and only 40% of real annotations achieves performance comparable to full real-data baselines while reducing annotation effort by 25–35%.

Abdeldjalil Taibi, Mohmoud Badlis, Amina Bensalem, Belkacem Zouilekh, Mohammed Brahimi | 2026-03-10 | 🤖 cs.LG

AtomicVLA: Unlocking the Potential of Atomic Skill Learning in Robots

The paper proposes AtomicVLA, a unified planning-and-execution framework that utilizes a Skill-Guided Mixture-of-Experts architecture to dynamically compose atomic skill abstractions, thereby significantly improving scalability and performance in long-horizon robotic manipulation and continual learning tasks compared to existing monolithic VLA models.

Likui Zhang, Tao Tang, Zhihao Zhan, Xiuwei Chen, Zisheng Chen, Jianhua Han, Jiangtong Zhu, Pei Xu, Hang Xu, Hefeng Wu, Liang Lin, Xiaodan Liang | 2026-03-10 | 💻 cs

GLASS: Graph and Vision-Language Assisted Semantic Shape Correspondence

GLASS is a novel unsupervised framework that establishes dense 3D shape correspondence across challenging non-isometric and inter-class scenarios by integrating geometric spectral analysis with semantic priors from vision-language foundation models, achieving state-of-the-art performance through view-consistent feature extraction, language-injected vertex descriptors, and a graph-assisted contrastive loss.

Qinfeng Xiao, Guofeng Mei, Qilong Liu, Chenyuan Yi, Fabio Poiesi, Jian Zhang, Bo Yang, Yick Kit-lun | 2026-03-10 | 💻 cs
