cs.CV papers | Gist.Science

Fine-grained Motion Retrieval via Joint-Angle Motion Images and Token-Patch Late Interaction

This paper proposes an interpretable text-motion retrieval framework that represents 3D human motion as joint-angle pseudo-images processed by Vision Transformers and aligns them with text via a token-wise late interaction mechanism, thereby overcoming the limitations of global-embedding methods by capturing fine-grained correspondences and improving retrieval accuracy.

Yao Zhang, Zhuchenyang Liu, Yanlan He, Thomas Ploetz, Yu Xiao | Wed, 11 Ma | 💻 cs
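The token-patch late interaction in the motion-retrieval entry can be illustrated with a ColBERT-style MaxSim score. This is a minimal sketch that assumes text tokens and joint-angle image patches are already embedded in a shared space. The function names and toy vectors are hypothetical, and this is not the authors' implementation.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so that dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against zero vectors
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def late_interaction_score(text_tokens: List[Vector], motion_patches: List[Vector]) -> float:
    """MaxSim late interaction: match each text token to its most similar
    motion patch, then sum the per-token maxima."""
    patches = [l2_normalize(p) for p in motion_patches]
    return sum(max(dot(l2_normalize(t), p) for p in patches) for t in text_tokens)


def best_patch_per_token(text_tokens: List[Vector], motion_patches: List[Vector]) -> List[int]:
    """Index of the patch each token aligns to; this token-level alignment
    is what makes late interaction inspectable."""
    patches = [l2_normalize(p) for p in motion_patches]
    result = []
    for t in text_tokens:
        tn = l2_normalize(t)
        sims = [dot(tn, p) for p in patches]
        result.append(max(range(len(sims)), key=sims.__getitem__))
    return result


# Hypothetical 2-D embeddings: two text tokens, two motion-image patches.
tokens = [[1.0, 0.0], [0.0, 1.0]]
patches = [[1.0, 0.0], [0.0, 2.0]]
print(late_interaction_score(tokens, patches))  # 2.0
print(best_patch_per_token(tokens, patches))    # [0, 1]
```

Unlike a single global embedding per side, the score keeps one match per token, so a retrieval result can be explained by listing which patch each word attended to.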

Adaptive Clinical-Aware Latent Diffusion for Multimodal Brain Image Generation and Missing Modality Imputation

The paper introduces ACADiff, an adaptive clinical-aware latent diffusion framework that synthesizes missing multimodal brain imaging data (sMRI, FDG-PET, and AV45-PET) by integrating imaging observations with GPT-4o-encoded clinical metadata, achieving superior generation quality and robust diagnostic performance even when up to 80% of modalities are missing.

Rong Zhou, Houliang Zhou, Yao Su, Brian Y. Chen, Yu Zhang, Lifang He, Alzheimer's Disease Neuroimaging Initiative | Wed, 11 Ma | 🤖 cs.AI

Unsupervised Domain Adaptation with Target-Only Margin Disparity Discrepancy

This paper proposes a novel unsupervised domain adaptation framework based on a reformulated Margin Disparity Discrepancy to bridge the modality gap between annotated CT and unannotated interventional CBCT scans, achieving state-of-the-art performance in liver segmentation for both unsupervised and few-shot settings.

Gauthier Miralles, Loïc Le Folgoc, Vincent Jugnon, Pietro Gori | Wed, 11 Ma | 💻 cs

No Image, No Problem: End-to-End Multi-Task Cardiac Analysis from Undersampled k-Space

The paper proposes k-MTR, a novel framework that bypasses the traditional image reconstruction step by directly learning multi-task cardiac diagnostic features from undersampled k-space data through a shared semantic manifold, thereby eliminating reconstruction artifacts and achieving competitive performance across regression, classification, and segmentation tasks.

Yundi Zhang, Sevgi Gokce Kafali, Niklas Bubeck, Daniel Rueckert, Jiazhen Pan | Wed, 11 Ma | 🤖 cs.AI
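As background for the k-MTR entry, which learns directly from undersampled k-space, here is one common way to simulate Cartesian undersampling: a per-line mask that keeps a fully sampled low-frequency center and every R-th line elsewhere. The mask pattern, acceleration factor, and center fraction are illustrative assumptions, not the paper's acquisition protocol, and the helper names are hypothetical.

```python
from typing import List


def cartesian_undersampling_mask(num_lines: int, acceleration: int, center_fraction: float) -> List[bool]:
    """Per-line sampling mask for Cartesian k-space (True = line acquired).

    Keeps a fully sampled low-frequency band around the center of k-space and,
    outside it, every `acceleration`-th phase-encoding line."""
    if acceleration < 1:
        raise ValueError("acceleration must be >= 1")
    num_center = int(round(num_lines * center_fraction))
    center_start = (num_lines - num_center) // 2
    return [
        (center_start <= i < center_start + num_center) or (i % acceleration == 0)
        for i in range(num_lines)
    ]


def zero_fill(kspace_lines: List[List[complex]], mask: List[bool]) -> List[List[complex]]:
    """Replace unacquired lines with zeros (the input a zero-filled reconstruction starts from)."""
    return [line if keep else [0j] * len(line) for line, keep in zip(kspace_lines, mask)]


mask = cartesian_undersampling_mask(num_lines=16, acceleration=4, center_fraction=0.25)
print([i for i, keep in enumerate(mask) if keep])  # [0, 4, 6, 7, 8, 9, 12]
```

A reconstruction-based pipeline would inverse-transform the zero-filled data into an image with aliasing artifacts; the entry's point is that a model can instead consume the masked k-space itself.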

Leveraging whole slide difficulty in Multiple Instance Learning to improve prostate cancer grading

This paper introduces the concept of Whole Slide Difficulty (WSD), derived from diagnostic disagreements between expert and non-expert pathologists, and demonstrates that leveraging this metric through multi-task learning or weighted loss functions significantly improves the accuracy of prostate cancer Gleason grading in Multiple Instance Learning models, particularly for higher-grade cases.

Marie Arrivat, Rémy Peyret, Elsa Angelini, Pietro Gori | Wed, 11 Ma | 💻 cs
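The weighted-loss variant of the Whole Slide Difficulty idea can be sketched as follows. The difficulty formula is a hypothetical stand-in: it uses the normalized disagreement between non-expert grades and the expert grade, and it up-weights harder slides in a cross-entropy loss. The summary does not specify the paper's exact difficulty definition or weighting. It also mentions a multi-task alternative, which is not shown here.

```python
import math
from typing import Sequence


def slide_difficulty(expert_grade: int, nonexpert_grades: Sequence[int], num_grades: int) -> float:
    """Hypothetical difficulty in [0, 1]: mean absolute gap between the
    non-expert grades and the expert (reference) grade, divided by the
    largest possible gap (num_grades - 1)."""
    max_gap = num_grades - 1
    gaps = [abs(g - expert_grade) for g in nonexpert_grades]
    return sum(gaps) / (len(gaps) * max_gap)


def cross_entropy(probs: Sequence[float], label: int) -> float:
    return -math.log(max(probs[label], 1e-12))  # clamp to avoid log(0)


def difficulty_weighted_loss(batch_probs, labels, difficulties, alpha: float = 1.0) -> float:
    """Mean cross-entropy in which each slide is weighted by 1 + alpha * difficulty."""
    weighted = [
        (1.0 + alpha * d) * cross_entropy(p, y)
        for p, y, d in zip(batch_probs, labels, difficulties)
    ]
    return sum(weighted) / len(weighted)


# The expert says grade 2; two non-experts say 2 and 4, on a 5-level scale.
d = slide_difficulty(expert_grade=2, nonexpert_grades=[2, 4], num_grades=5)
print(d)  # 0.25
```

With `alpha = 0` this reduces to ordinary cross-entropy, so `alpha` controls how strongly disputed slides dominate training.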

From Semantics to Pixels: Coarse-to-Fine Masked Autoencoders for Hierarchical Visual Understanding

The paper proposes C2FMAE, a coarse-to-fine masked autoencoder that resolves the tension between global semantics and local details in self-supervised learning by employing a cascaded decoder and progressive masking curriculum on a newly constructed multi-granular dataset to achieve hierarchical visual understanding and superior performance across various vision tasks.

Wenzhao Xiang, Yue Wu, Hongyang Yu, Feng Gao, Fan Yang, Xilin Chen | Wed, 11 Ma | 🤖 cs.LG
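A "progressive masking curriculum" can be realized in several ways. One minimal interpretation, assumed here, is a masking ratio that grows linearly during training, so the encoder first reconstructs from many visible patches and later from few. The start and end ratios and the function names are hypothetical. C2FMAE's actual curriculum may differ, for example in *which* regions are masked rather than *how many*.

```python
import random
from typing import List


def mask_ratio_at(epoch: int, total_epochs: int, start_ratio: float = 0.25, end_ratio: float = 0.75) -> float:
    """Linearly ramp the masking ratio from start_ratio to end_ratio over training."""
    if total_epochs <= 1:
        return end_ratio
    t = min(max(epoch / (total_epochs - 1), 0.0), 1.0)
    return start_ratio + t * (end_ratio - start_ratio)


def random_patch_mask(num_patches: int, ratio: float, rng: random.Random) -> List[bool]:
    """True marks a masked patch; exactly round(num_patches * ratio) patches are masked."""
    num_masked = int(round(num_patches * ratio))
    masked = set(rng.sample(range(num_patches), num_masked))
    return [i in masked for i in range(num_patches)]


rng = random.Random(0)
for epoch in (0, 5, 10):
    ratio = mask_ratio_at(epoch, total_epochs=11)
    print(epoch, ratio, sum(random_patch_mask(196, ratio, rng)))
# masks 49, 98, then 147 of 196 patches
```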

BEACON: Language-Conditioned Navigation Affordance Prediction under Occlusion

This paper introduces BEACON, a language-conditioned navigation system that overcomes the limitations of existing 2D image-space methods by predicting an occlusion-aware Bird's-Eye View affordance heatmap from surround-view RGB-D observations, thereby significantly improving the accuracy of inferring traversable targets in occluded regions.

Xinyu Gao, Gang Chen, Javier Alonso-Mora | Wed, 11 Ma | 🤖 cs.AI
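BEACON predicts a Bird's-Eye View heatmap from RGB-D input. Any BEV representation relies on a basic geometric step that lifts depth pixels into a ground-plane grid. This sketch shows only that step, for a single pinhole camera with assumed intrinsics. It models neither occlusion, surround-view fusion, nor language conditioning, and the names are hypothetical.

```python
from typing import Dict, Iterable, Tuple

Intrinsics = Tuple[float, float, float, float]  # fx, fy, cx, cy


def backproject(u: float, v: float, depth: float, k: Intrinsics) -> Tuple[float, float, float]:
    """Pinhole back-projection into camera coordinates (x right, y down, z forward)."""
    fx, fy, cx, cy = k
    return (u - cx) * depth / fx, (v - cy) * depth / fy, depth


def bev_point_counts(
    pixels: Iterable[Tuple[float, float, float]],
    k: Intrinsics,
    resolution: float,
    x_range: Tuple[float, float],
    z_max: float,
) -> Dict[Tuple[int, int], int]:
    """Count depth points per (row, col) BEV cell: rows index forward distance z,
    columns index lateral offset x. Height (y) is dropped."""
    x_min, x_max = x_range
    counts: Dict[Tuple[int, int], int] = {}
    for u, v, d in pixels:
        if d <= 0:  # invalid or missing depth
            continue
        x, _, z = backproject(u, v, d, k)
        if not (x_min <= x < x_max and 0.0 <= z < z_max):
            continue
        cell = (int(z // resolution), int((x - x_min) // resolution))
        counts[cell] = counts.get(cell, 0) + 1
    return counts


k = (100.0, 100.0, 50.0, 50.0)
pixels = [(50, 50, 2.0), (150, 50, 1.0), (50, 50, 0.0)]
print(bev_point_counts(pixels, k, resolution=0.5, x_range=(-2.0, 2.0), z_max=4.0))
# {(4, 4): 1, (2, 6): 1}
```

Cells that receive no points are either free or hidden behind something. Distinguishing the two is the occlusion problem the entry targets, which this sketch leaves open.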

ReCoSplat: Autoregressive Feed-Forward Gaussian Splatting Using Render-and-Compare

ReCoSplat is an autoregressive feed-forward Gaussian Splatting model that overcomes the training-inference pose mismatch dilemma through a novel Render-and-Compare module and achieves state-of-the-art online novel view synthesis with efficient long-sequence handling via hybrid KV cache compression.

Freeman Cheng, Botao Ye, Xueting Li, Junqi You, Fangneng Zhan, Ming-Hsuan Yang | Wed, 11 Ma | 💻 cs

From Data Statistics to Feature Geometry: How Correlations Shape Superposition

This paper challenges the standard view of superposition in neural networks by demonstrating that, unlike in idealized uncorrelated settings where interference is merely noise, realistic feature correlations allow models to arrange features so that interference becomes constructive, thereby naturally forming the semantic clusters and cyclical structures observed in real language models.

Lucas Prieto, Edward Stevinson, Melih Barsbey, Tolga Birdal, Pedro A. M. Mediano | Wed, 11 Ma | 🤖 cs.AI
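The interference term discussed in the superposition entry can be made concrete in a linear toy model, which is an assumption here in the style of common toy models of superposition. With hidden state $h = \sum_j x_j w_j$ and unit-norm feature directions $w_i$, the readout for feature $i$ is $w_i \cdot h = x_i + \sum_{j \neq i} (w_i \cdot w_j)\, x_j$. The second term is the interference. Its value depends on which features are active together, so feature correlations determine whether it behaves like random noise or like a systematic signal.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(p * q for p, q in zip(a, b))


def readout(W: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    """Linear toy model: embed h = sum_j x[j] * W[j], then read feature i as W[i]·h."""
    d = len(W[0])
    h = [sum(x[j] * W[j][k] for j in range(len(W))) for k in range(d)]
    return [dot(w, h) for w in W]


def interference(W: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    """Off-diagonal part of the readout: sum over j != i of (W[i]·W[j]) * x[j]."""
    n = len(W)
    return [sum(dot(W[i], W[j]) * x[j] for j in range(n) if j != i) for i in range(n)]


# Two features packed into 2-D at 60 degrees apart (W[0]·W[1] = 0.5).
W = [[1.0, 0.0], [0.5, math.sqrt(3) / 2]]
print(interference(W, [1.0, 1.0]))  # both active: [0.5, 0.5]
print(interference(W, [1.0, 0.0]))  # only feature 0 active: [0.0, 0.5]
```

When two features usually fire together, the overlap term is predictable rather than random. This is the kind of statistic the entry argues models exploit when arranging correlated features near each other.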

Differentiable Microscopy Designs an All Optical Phase Retrieval Microscope

This paper introduces "differentiable microscopy" ($\partial\mu$), a data-driven, top-down design framework that automatically optimizes optical systems for phase retrieval, demonstrating superior performance over existing methods and experimentally validating its effectiveness on biological samples.

Kithmini Herath, Hasindu Kariyawasam, Ramith Hettiarachchi, Udith Haputhanthri, Dineth Jayakody, Raja N. Ahmad, Azeem Ahmad, Balpreet S. Ahluwalia, Chamira U. S. Edussooriya, Dushan N. Wadduwage | Tue, 10 Ma | 🔬 physics.optics

Class Overwhelms: Mutual Conditional Blended-Target Domain Adaptation

This paper proposes a mutual conditional blended-target domain adaptation framework that aligns categorical distributions and rectifies classifier bias through uncertainty-guided discrimination and low-level feature augmentation, achieving state-of-the-art performance even without explicit domain labels and under label distribution shifts.

Pengcheng Xu, Boyu Wang, Charles Ling | Tue, 10 Ma | 💻 cs
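The "uncertainty-guided" part of the blended-target entry needs some per-sample uncertainty measure. A common generic choice, assumed here, is the normalized entropy of the classifier's predicted distribution, used to split target samples into confident and uncertain groups. The summary does not say which measure or threshold the paper uses. The threshold and function names below are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def normalized_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy divided by log(K), so the result lies in [0, 1]."""
    k = len(probs)
    if k < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(k)


def split_by_uncertainty(batch: Sequence[Sequence[float]], threshold: float) -> Tuple[List[int], List[int]]:
    """Return indices of confident (entropy < threshold) and uncertain samples."""
    confident, uncertain = [], []
    for idx, probs in enumerate(batch):
        (confident if normalized_entropy(probs) < threshold else uncertain).append(idx)
    return confident, uncertain


print(split_by_uncertainty([[0.9, 0.1], [0.5, 0.5]], threshold=0.8))  # ([0], [1])
```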

Multi-Scale Distillation for RGB-D Anomaly Detection on the PD-REAL Dataset

This paper introduces PD-REAL, a novel large-scale RGB-D dataset for unsupervised anomaly detection based on Play-Doh models, and proposes a multi-scale teacher-student framework with hierarchical distillation that leverages 3D information to achieve superior detection accuracy compared to existing methods.

Jianjian Qin, Chao Zhang, Chunzhi Gu, Zi Wang, Jun Yu, Yijin Wei, Hui Xiao, Xin Yua | Tue, 10 Ma | 💻 cs
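The multi-scale teacher-student idea in the PD-REAL entry follows a standard anomaly-detection pattern. A student is trained to mimic a teacher on normal data only, so their feature disagreement flags anomalies at test time. The sketch below scores each scale by squared feature distance, upsamples with nearest-neighbour, and averages. The distance, the fusion rule, and the names are assumptions, and the paper's use of 3D information is not modeled.

```python
from typing import List, Sequence, Tuple

Grid = Sequence[Sequence[Sequence[float]]]  # H x W grid of feature vectors
Map = List[List[float]]


def scale_anomaly_map(teacher: Grid, student: Grid) -> Map:
    """Per-location squared L2 distance between teacher and student features."""
    return [
        [sum((a - b) ** 2 for a, b in zip(t_vec, s_vec)) for t_vec, s_vec in zip(t_row, s_row)]
        for t_row, s_row in zip(teacher, student)
    ]


def upsample_nearest(m: Map, out_h: int, out_w: int) -> Map:
    h, w = len(m), len(m[0])
    return [[m[i * h // out_h][j * w // out_w] for j in range(out_w)] for i in range(out_h)]


def multiscale_anomaly_map(pairs: Sequence[Tuple[Grid, Grid]], out_h: int, out_w: int) -> Map:
    """Average the per-scale maps after resizing them to a common resolution."""
    maps = [upsample_nearest(scale_anomaly_map(t, s), out_h, out_w) for t, s in pairs]
    return [[sum(m[i][j] for m in maps) / len(maps) for j in range(out_w)] for i in range(out_h)]


# Scale 1: 2x2 grid, student deviates at (1, 1). Scale 2: 1x1 grid, small global gap.
fine = ([[[0.0], [0.0]], [[0.0], [0.0]]], [[[0.0], [0.0]], [[0.0], [2.0]]])
coarse = ([[[1.0]]], [[[0.0]]])
amap = multiscale_anomaly_map([fine, coarse], 2, 2)
print(amap)                        # [[0.5, 0.5], [0.5, 2.5]]
print(max(max(r) for r in amap))  # image-level score: 2.5
```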

Deepfake Generation and Detection: A Benchmark and Survey

This paper presents a comprehensive survey and benchmark of deepfake generation and detection, unifying task definitions, reviewing state-of-the-art methods across four key generation fields and forgery detection, and analyzing current challenges and future research directions.

Gan Pei, Jiangning Zhang, Menghan Hu, Zhenyu Zhang, Chengjie Wang, Yunsheng Wu, Guangtao Zhai, Jian Yang, Dacheng Tao | Tue, 10 Ma | 💻 cs

Goldilocks Test Sets for Face Verification

This paper proposes three high-quality, controlled test sets (Hadrian, Eclipse, and ND-Twins) designed to challenge face verification models on natural variations in facial attributes and similar-looking identities, while introducing "Goldilocks" rules to ensure balanced difficulty and demographic fairness without artificially degrading image quality.

Haiyu Wu, Sicong Tian, Aman Bhatta, Jacob Gutierrez, Grace Bezold, Genesis Argueta, Karl Ricanek Jr., Michael C. King, Kevin W. Bowyer | Tue, 10 Ma | 💻 cs

Exploring Diffusion Models' Corruption Stage in Few-Shot Fine-tuning and Mitigating with Bayesian Neural Networks

This paper identifies a "corruption stage" in few-shot fine-tuned diffusion models caused by a narrowed learning distribution and proposes a Bayesian Neural Network approach with variational inference to broaden this distribution, thereby mitigating corruption and improving image fidelity, quality, and diversity without additional inference costs.

Xiaoyu Wu, Jiaru Zhang, Yang Hua, Bohan Lyu, Hao Wang, Tao Song, Haibing Guan | Tue, 10 Ma | 🤖 cs.LG
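The variational-inference ingredient mentioned in the diffusion fine-tuning entry is commonly implemented with mean-field Gaussian weights and the reparameterization trick. This generic sketch shows sampling and the closed-form KL term to a zero-mean Gaussian prior. The softplus parameterization, the prior scale, and the names are assumptions, not details taken from the paper.

```python
import math
import random
from typing import List, Sequence


def softplus(x: float) -> float:
    """Map an unconstrained parameter rho to a positive standard deviation."""
    return math.log1p(math.exp(x))


def sample_weights(mu: Sequence[float], rho: Sequence[float], rng: random.Random) -> List[float]:
    """Reparameterization trick: w = mu + softplus(rho) * eps, eps ~ N(0, 1)."""
    return [m + softplus(r) * rng.gauss(0.0, 1.0) for m, r in zip(mu, rho)]


def kl_to_gaussian_prior(mu: Sequence[float], rho: Sequence[float], prior_sigma: float = 1.0) -> float:
    """Sum over weights of KL(N(mu, sigma^2) || N(0, prior_sigma^2))."""
    total = 0.0
    for m, r in zip(mu, rho):
        s = softplus(r)
        total += math.log(prior_sigma / s) + (s * s + m * m) / (2 * prior_sigma ** 2) - 0.5
    return total


mu, rho = [0.2, -0.1], [-3.0, -3.0]
w = sample_weights(mu, rho, random.Random(0))  # one stochastic draw of the weights
loss_reg = kl_to_gaussian_prior(mu, rho)       # added to the data loss during training
```

A common way to keep inference cost unchanged is to use the mean weights `mu` at test time. Whether the paper does exactly this is not stated in the summary.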

RDM: Recurrent Diffusion Model for Human Motion Generation

This paper proposes RDM, a recurrent diffusion model that leverages Normalizing Flows to condition generation on preceding noisy frames, enabling efficient, long-duration human motion synthesis with reduced computational costs while maintaining high alignment with text prompts.

Mirgahney Mohamed, Harry Jake Cunningham, Marc P. Deisenroth, Lourdes Agapito | Tue, 10 Ma | 💻 cs

Improving Visual Object Tracking through Visual Prompting

The paper proposes PiVOT, a visual prompting mechanism that leverages a pretrained CLIP foundation model to automatically generate and refine online visual prompts, thereby enhancing generic object tracking by effectively suppressing distractors through contrastive guidance.

Shih-Fang Chen, Jun-Cheng Chen, I-Hong Jhuo, Yen-Yu Lin | Tue, 10 Ma | 💻 cs

ExpGest: Expressive Speaker Generation Using Diffusion Model and Hybrid Audio-Text Guidance

ExpGest is a novel diffusion-based framework that generates expressive, controllable full-body gestures by leveraging synchronized audio and text guidance, along with a specialized noise emotion classifier, to overcome the limitations of existing methods that often produce stiff, upper-body-only movements.

Yongkang Cheng, Mingjiang Liang, Shaoli Huang, Gaoge Han, Jifeng Ning, Wei Liu | Tue, 10 Ma | 💻 cs

Autoassociative Learning of Structural Representations for Modeling and Classification in Medical Imaging

This paper introduces a neurosymbolic system that reconstructs medical images using visual primitives to generate high-level structural explanations, achieving superior classification accuracy and transparency compared to conventional deep learning models in diagnosing histological abnormalities.

Zuzanna Buchnajzer, Kacper Dobek, Stanisław Hapke, Daniel Jankowski, Krzysztof Krawiec | Tue, 10 Ma | 🤖 cs.LG

Input-Adaptive Generative Dynamics in Diffusion Models

This paper proposes an input-adaptive framework for diffusion models that dynamically adjusts the generative trajectory and sampling steps for each sample based on its complexity, thereby maintaining generation quality while reducing the average number of required steps.

Yucheng Xing, Xiaodong Liu, Xin Wang | Tue, 10 Ma | 🤖 cs.LG
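The input-adaptive diffusion entry claims a lower *average* step count. A toy calculation shows where that saving comes from. It assumes a hypothetical per-sample complexity score in [0, 1] that is mapped linearly to a step budget, with evenly spaced (DDIM-style) timesteps for that budget. The mapping, bounds, and names are illustrative only and are not the paper's method.

```python
from typing import List


def adaptive_num_steps(complexity: float, min_steps: int = 10, max_steps: int = 50) -> int:
    """Map a complexity score in [0, 1] to a number of sampling steps."""
    c = min(max(complexity, 0.0), 1.0)
    return int(round(min_steps + c * (max_steps - min_steps)))


def timestep_schedule(num_steps: int, num_train_timesteps: int = 1000) -> List[int]:
    """Evenly spaced timesteps from high noise to low noise."""
    stride = num_train_timesteps // num_steps
    return list(range(num_train_timesteps - 1, -1, -stride))[:num_steps]


complexities = [0.0, 0.5, 1.0]  # hypothetical scores for three inputs
steps = [adaptive_num_steps(c) for c in complexities]
print(steps)                    # [10, 30, 50]
print(sum(steps) / len(steps))  # 30.0, versus 50 for a fixed max-step sampler
```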
