Word-Anchored Temporal Forgery Localization

This paper introduces Word-Anchored Temporal Forgery Localization (WAFL), a novel paradigm that recasts forgery localization from continuous boundary regression as discrete word-level classification aligned with linguistic boundaries. WAFL combines a forensic feature realignment module for efficient feature mapping with an artifact-centric asymmetric loss that counters class imbalance, achieving superior localization performance at significantly reduced computational cost.

Tianyi Wang, Xi Shao, Harry Cheng, Yinglong Wang, Mohan Kankanhalli · 2026-03-09 · cs
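The paper's exact loss is not reproduced in the summary above. As a generic, purely illustrative sketch of an artifact-centric asymmetric loss for class imbalance, one can up-weight the rare positive (forged-word) class in a binary cross-entropy; the `pos_weight` hyperparameter here is a hypothetical placeholder, not a value from WAFL:

```python
import math

def asymmetric_bce(p, y, pos_weight=4.0):
    """Toy asymmetric binary cross-entropy for a single word.
    p: predicted probability the word is forged; y: label (1 = forged).
    pos_weight up-weights the rare forged class (hypothetical value)."""
    eps = 1e-7
    p = min(max(p, eps), 1 - eps)
    if y == 1:
        return -pos_weight * math.log(p)   # forged word: heavier penalty
    return -math.log(1 - p)                # genuine word: standard penalty

# A confident miss on the rare forged class costs more than an
# equally confident false alarm on the abundant genuine class:
loss_missed_artifact = asymmetric_bce(0.1, 1)
loss_false_alarm = asymmetric_bce(0.9, 0)
```

The asymmetry shifts the classifier's operating point toward recall on the minority class, which is the usual motivation for such weighting when artifacts occupy only a small fraction of words.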

Low-latency Event-based Object Detection with Spatially-Sparse Linear Attention

This paper introduces Spatially-Sparse Linear Attention (SSLA) and its application in the SSLA-Det model, an end-to-end asynchronous framework for event-based object detection that achieves state-of-the-art accuracy while significantly reducing per-event computation by leveraging state-level sparsity and parallel training.

Haiqing Hao, Zhipeng Sui, Rong Zou, Zijia Dai, Nikola Zubic, Davide Scaramuzza, Wenhui Wang · 2026-03-09 · cs
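SSLA-Det's exact formulation is not given in the summary, but the general linear-attention recurrence it builds on can be sketched: each incoming event updates a fixed-size state, so per-event cost stays constant regardless of how many events came before. The feature map, dimensions, and update below are common linear-attention choices used for illustration, not SSLA's specific design:

```python
import math

def elu_plus_one(x):
    # Positive feature map phi(x) = elu(x) + 1, a common linear-attention choice
    return x + 1.0 if x > 0 else math.exp(x)

def process_event(state, norm, k, v, q, dim=4):
    """One asynchronous linear-attention update (illustrative).
    state: dim x dim accumulator S += phi(k) v^T
    norm:  length-dim accumulator z += phi(k)
    Returns output phi(q)^T S / (phi(q)^T z); cost is O(dim^2) per event,
    independent of sequence length."""
    phi_k = [elu_plus_one(x) for x in k]
    phi_q = [elu_plus_one(x) for x in q]
    for i in range(dim):
        norm[i] += phi_k[i]
        for j in range(dim):
            state[i][j] += phi_k[i] * v[j]
    num = [sum(phi_q[i] * state[i][j] for i in range(dim)) for j in range(dim)]
    den = sum(phi_q[i] * norm[i] for i in range(dim))
    return [n / den for n in num]

# First event on an empty state: the output is that event's value vector.
S = [[0.0] * 4 for _ in range(4)]
z = [0.0] * 4
out = process_event(S, z, [1.0, 0, 0, 0], [1.0, 2.0, 3.0, 4.0], [1.0, 0, 0, 0])
```

Because the state is a fixed-size matrix rather than a growing key/value cache, this recurrence is what makes low-latency per-event processing possible; the paper's sparsity contribution would additionally skip updates for near-inactive state entries.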

TaPD: Temporal-adaptive Progressive Distillation for Observation-Adaptive Trajectory Forecasting in Autonomous Driving

TaPD is a unified, plug-and-play framework that employs temporal-adaptive progressive distillation and a temporal backfilling module to enable robust trajectory forecasting under variable and extremely short observation histories by transferring knowledge from long-horizon teachers and reconstructing missing past context.

Mingyu Fan, Yi Liu, Hao Zhou, Deheng Qian, Mohammad Haziq Khan, Matthias Raetsch · 2026-03-09 · cs.AI

GazeMoE: Perception of Gaze Target with Mixture-of-Experts

GazeMoE is a novel end-to-end framework that leverages Mixture-of-Experts modules to adaptively fuse multi-modal cues from a frozen vision foundation model, achieving state-of-the-art performance in human gaze target estimation by addressing class imbalance and enhancing robustness through specialized loss functions and data augmentation.

Zhuangzhuang Dai, Zhongxi Lu, Vincent G. Zakka, Luis J. Manso, Jose M Alcaraz Calero, Chen Li · 2026-03-09 · cs.AI
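The adaptive multi-modal fusion in a Mixture-of-Experts layer can be sketched generically: a learned gate scores each expert (here, one per modality) and the outputs are combined by softmax weights. The modalities, shapes, and fixed gate logits below are illustrative assumptions; GazeMoE's actual routing and expert design may differ:

```python
import math

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [x / s for x in e]

def moe_fuse(expert_outputs, gate_logits):
    """Gate-weighted fusion of per-modality expert outputs (illustrative).
    expert_outputs: list of equal-length feature vectors, one per expert.
    gate_logits: one learned score per expert (fixed here for the demo)."""
    w = softmax(gate_logits)
    dim = len(expert_outputs[0])
    return [sum(w[e] * expert_outputs[e][j] for e in range(len(w)))
            for j in range(dim)]

# Equal gate logits reduce to a plain average of the two experts.
fused = moe_fuse([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

In practice the gate logits are themselves predicted from the input, so the mixture shifts toward whichever modality (e.g. head crop vs. scene context) is most informative for a given sample.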

Spectral and Trajectory Regularization for Diffusion Transformer Super-Resolution

The paper proposes StrSR, a novel one-step adversarial distillation framework that employs asymmetric discriminative distillation and frequency distribution matching to overcome trajectory mismatches and periodic artifacts, thereby achieving state-of-the-art performance in real-world image super-resolution using Diffusion Transformers.

Jingkai Wang, Yixin Tang, Jue Gong, Jiatong Li, Shu Li, Libo Liu, Jianliang Lan, Yutong Liu, Yulun Zhang · 2026-03-09 · cs

Can We Trust Unreliable Voxels? Exploring 3D Semantic Occupancy Prediction under Label Noise

This paper introduces OccNL, the first benchmark for 3D semantic occupancy prediction under label noise, and proposes DPR-Occ, a novel framework that leverages dual-source partial label reasoning to achieve robust performance and prevent catastrophic collapse in noisy 3D voxel spaces where existing 2D noise-robust strategies fail.

Wenxin Li, Kunyu Peng, Di Wen, Junwei Zheng, Jiale Wei, Mengfei Duan, Yuheng Zhang, Rui Fan, Kailun Yang · 2026-03-09 · cs

Attribute Distribution Modeling and Semantic-Visual Alignment for Generative Zero-shot Learning

This paper proposes ADiVA, a generative zero-shot learning framework that addresses the class-instance and semantic-visual domain gaps by jointly modeling attribute distributions to capture instance-specific variability and employing visual-guided alignment to refine semantic representations, thereby significantly outperforming state-of-the-art methods on benchmark datasets.

Haojie Pu, Zhuoming Li, Yongbiao Gao, Yuheng Jia · 2026-03-09 · cs

3D CBCT Artefact Removal Using Perpendicular Score-Based Diffusion Models

This paper proposes a novel 3D dental implant inpainting method using perpendicular score-based diffusion models that operate in the projection domain to capture inter-projection correlations, thereby generating high-quality, artifact-reduced CBCT images with improved consistency compared to existing 2D-based approaches.

Susanne Schaub, Florentin Bieder, Matheus L. Oliveira, Yulan Wang, Dorothea Dagassan-Berndt, Michael M. Bornstein, Philippe C. Cattin · 2026-03-09 · cs.LG

DEX-AR: A Dynamic Explainability Method for Autoregressive Vision-Language Models

The paper introduces DEX-AR, a novel dynamic explainability method that generates per-token and sequence-level heatmaps for autoregressive Vision-Language Models by computing layer-wise gradients and employing dynamic filtering to distinguish visually-grounded from linguistic tokens, thereby improving interpretability and performance across multiple benchmarks.

Walid Bousselham, Angie Boggust, Hendrik Strobelt, Hilde Kuehne · 2026-03-09 · cs.AI