cs.CV papers | Gist.Science

Attribute Distribution Modeling and Semantic-Visual Alignment for Generative Zero-shot Learning

This paper proposes ADiVA, a generative zero-shot learning framework that addresses the class-instance and semantic-visual domain gaps by jointly modeling attribute distributions to capture instance-specific variability and employing visual-guided alignment to refine semantic representations, thereby significantly outperforming state-of-the-art methods on benchmark datasets.

Haojie Pu, Zhuoming Li, Yongbiao Gao, Yuheng Jia | 2026-03-09 | cs

FlowMotion: Training-Free Flow Guidance for Video Motion Transfer

FlowMotion is a novel training-free framework that achieves efficient and flexible video motion transfer by directly leveraging early latent predictions from flow-based text-to-video models to extract motion representations and align patterns, while employing velocity regularization to ensure smooth motion evolution.

Zhen Wang, Youcan Xu, Jun Xiao, Long Chen | 2026-03-09 | cs

3D CBCT Artefact Removal Using Perpendicular Score-Based Diffusion Models

This paper proposes a novel 3D dental implant inpainting method using perpendicular score-based diffusion models that operate in the projection domain to capture inter-projection correlations, thereby generating high-quality, artifact-reduced CBCT images with improved consistency compared to existing 2D-based approaches.

Susanne Schaub, Florentin Bieder, Matheus L. Oliveira, Yulan Wang, Dorothea Dagassan-Berndt, Michael M. Bornstein, Philippe C. Cattin | 2026-03-09 | cs.LG

DEX-AR: A Dynamic Explainability Method for Autoregressive Vision-Language Models

The paper introduces DEX-AR, a novel dynamic explainability method that generates per-token and sequence-level heatmaps for autoregressive Vision-Language Models by computing layer-wise gradients and employing dynamic filtering to distinguish visually-grounded from linguistic tokens, thereby improving interpretability and performance across multiple benchmarks.

Walid Bousselham, Angie Boggust, Hendrik Strobelt, Hilde Kuehne | 2026-03-09 | cs.AI

Latent Transfer Attack: Adversarial Examples via Generative Latent Spaces

The paper proposes LTA, a transfer-based adversarial attack that optimizes perturbations within the latent space of a pretrained Stable Diffusion VAE to generate spatially coherent, low-frequency examples that achieve superior transferability and robustness against common preprocessing compared to traditional pixel-space methods.

Eitan Shaar, Ariel Shaulov, Yalcin Tur, Gal Chechik, Ravid Shwartz-Ziv | 2026-03-09 | cs

WMoE-CLIP: Wavelet-Enhanced Mixture-of-Experts Prompt Learning for Zero-Shot Anomaly Detection

This paper proposes WMoE-CLIP, a zero-shot anomaly detection method that overcomes the limitations of fixed prompts and spatial-only features by integrating variational autoencoder-based semantic modeling, wavelet decomposition for multi-frequency feature refinement, and a semantic-aware mixture-of-experts module, achieving state-of-the-art performance across 14 industrial and medical datasets.

Peng Chen, Chao Huang | 2026-03-09 | cs

P-SLCR: Unsupervised Point Cloud Semantic Segmentation via Prototypes Structure Learning and Consistent Reasoning

This paper introduces P-SLCR, a novel unsupervised point cloud semantic segmentation framework that leverages Consistent Structure Learning and Semantic Relation Consistent Reasoning to achieve state-of-the-art performance on the S3DIS, SemanticKITTI, and ScanNet datasets, even surpassing the fully supervised PointNet on the S3DIS Area-5 benchmark.

Lixin Zhan, Jie Jiang, Tianjian Zhou, Yukun Du, Yan Zheng, Xuehu Duan | 2026-03-09 | cs

The Art That Poses Back: Assessing AI Pastiches after Contemporary Artworks

This study evaluates ChatGPT's ability to generate AI pastiches of contemporary artworks by combining human feedback from twelve international artists with computational analysis. It finds that the AI-generated results achieve superficial visual similarity but lack conceptual depth, dimensionality, and emotional resonance, and it advocates a multi-metric "style transfer dashboard" for more comprehensive evaluation.

Anca Dinu, Andreiana Mihail, Andra-Maria Florescu, Claudiu Creanga | 2026-03-09 | cs.CL

WorldCache: Accelerating World Models for Free via Heterogeneous Token Caching

WorldCache is a novel caching framework that accelerates diffusion-based world models by up to 3.7× without retraining, overcoming challenges of token heterogeneity and non-uniform temporal dynamics through curvature-guided prediction and chaotic-prioritized adaptive skipping to maintain high rollout quality.

Weilun Feng, Guoxin Fan, Haotong Qin, Chuanguang Yang, Mingqiang Wu, Yuqi Li, Xiangqi Li, Zhulin An, Libo Huang, Dingrui Wang, Longlong Liao, Michele Magno, Yongjun Xu | 2026-03-09 | cs

K-MaT: Knowledge-Anchored Manifold Transport for Cross-Modal Prompt Learning in Medical Imaging

K-MaT is a novel prompt-learning framework that enables the zero-shot transfer of large-scale biomedical vision-language models from high-end to low-end imaging modalities by anchoring prompts to clinical text and aligning their decision manifolds via Fused Gromov-Wasserstein optimal transport, thereby achieving state-of-the-art performance while mitigating catastrophic forgetting.

Jiajun Zeng, Shadi Albarqouni | 2026-03-09 | cs.AI

Dynamic Chunking Diffusion Transformer

The paper introduces the Dynamic Chunking Diffusion Transformer (DC-DiT), a model that adaptively compresses image tokens based on spatial detail and diffusion timesteps to achieve superior generation quality and efficiency over standard DiTs while requiring minimal additional training.

Akash Haridas, Utkarsh Saxena, Parsa Ashrafi Fashi, Mehdi Rezagholizadeh, Vikram Appia, Emad Barsoum | 2026-03-09 | cs.AI

LATO: 3D Mesh Flow Matching with Structured TOpology Preserving LAtents

This paper introduces LATO, a novel framework that enables scalable, flow matching-based synthesis of explicit 3D meshes with complex geometry and well-formed topology by representing them as vertex displacement fields within a structured, topology-preserving sparse voxel latent space, thereby eliminating the need for isosurface extraction or heuristic meshing.

Tianhao Zhao, Youjia Zhang, Hang Long, Jinshen Zhang, Wenbing Li, Yang Yang, Gongbo Zhang, Jozef Hladký, Matthias Nießner, Wei Yang | 2026-03-09 | cs

Computer vision-based estimation of invertebrate biomass

This paper presents computer vision-based methods that use a dual-camera system (BIODISCOVER) to capture the sinking speed and area of invertebrate specimens and estimate their dry mass with 10–20% median error, offering a scalable, non-destructive alternative to manual weighing for biodiversity monitoring.

Mikko Impiö, Philipp M. Rehsen, Jarrett Blair, Cecilie Mielec, Arne J. Beermann, Florian Leese, Toke T. Høye, Jenni Raitoharju | 2026-03-09 | cs

OralGPT-Plus: Learning to Use Visual Tools via Reinforcement Learning for Panoramic X-ray Analysis

This paper introduces OralGPT-Plus, an agentic vision-language model that leverages a new dataset (DentalProbe), a reinspection-driven reinforcement learning framework, and a holistic benchmark (MMOral-X) to enable iterative, symmetry-aware, and clinically reliable diagnostic reasoning for panoramic dental radiographs.

Yuxuan Fan, Jing Hao, Hong Chen, Jiahao Bao, Yihua Shao, Yuci Liang, Kuo Feng Hung, Hao Tang | 2026-03-09 | cs

Rewis3d: Reconstruction Improves Weakly-Supervised Semantic Segmentation

Rewis3d is a novel framework that significantly improves weakly-supervised semantic segmentation on 2D images by leveraging feed-forward 3D reconstruction as an auxiliary supervisory signal to propagate sparse annotations across scenes via a dual student-teacher architecture, achieving state-of-the-art performance without additional labels or inference overhead.

Jonas Ernst, Wolfgang Boettcher, Lukas Hoyer, Jan Eric Lenssen, Bernt Schiele | 2026-03-09 | cs

MoEMambaMIL: Structure-Aware Selective State Space Modeling for Whole-Slide Image Analysis

The paper proposes MoEMambaMIL, a structure-aware selective state space model that integrates region-nested selective scanning with mixture-of-experts mechanisms to effectively capture hierarchical spatial dependencies in whole-slide images, achieving state-of-the-art performance across nine downstream tasks.

Dongqing Xie, Yonghuang Wu | 2026-03-09 | cs

CHMv2: Improvements in Global Canopy Height Mapping using DINOv3

The paper introduces CHMv2, a new global meter-resolution canopy height map that significantly improves accuracy and structural detail over existing products by leveraging a DINOv3-based depth estimation model trained on expanded, diverse airborne laser scanning data and validated against millions of satellite observations.

John Brandt, Seungeun Yi, Jamie Tolan, Xinyuan Li, Peter Potapov, Jessica Ertel, Justine Spore, Huy V. Vo, Michaël Ramamonjisoa, Patrick Labatut, Piotr Bojanowski, Camille Couprie | 2026-03-09 | cs

Prompt Group-Aware Training for Robust Text-Guided Nuclei Segmentation

This paper introduces a prompt group-aware training framework that enhances the robustness and generalization of text-guided nuclei segmentation by enforcing consistency among semantically related prompts through quality-guided regularization and logit-level constraints, achieving significant performance gains without altering model architecture or inference.

Yonghuang Wu, Zhenyang Liang, Wenwen Zeng, Xuan Xie, Jinhua Yu | 2026-03-09 | cs.AI

REACT++: Efficient Cross-Attention for Real-Time Scene Graph Generation

REACT++ is a new state-of-the-art model for real-time Scene Graph Generation that leverages efficient feature extraction and subject-to-object cross-attention to achieve the highest inference speed and improved relation prediction accuracy while maintaining object detection performance; it runs 20% faster than its predecessor with a 10% accuracy gain.

Maëlic Neau, Zoe Falomir | 2026-03-09 | cs

Solving Jigsaw Puzzles in the Wild: Human-Guided Reconstruction of Cultural Heritage Fragments

This paper proposes a human-in-the-loop framework that combines an automatic relaxation-labeling solver with interactive guidance strategies to effectively and efficiently reassemble large-scale, fragmented cultural heritage artifacts in real-world conditions where traditional methods fail.

Omidreza Safaei, Sinem Aslan, Sebastiano Vascon, Luca Palmieri, Marina Khoroshiltseva, Marcello Pelillo | 2026-03-09 | cs
