cs.CV papers | Gist.Science

Act Like a Pathologist: Tissue-Aware Whole Slide Image Reasoning

This paper introduces HistoSelect, a question-guided, coarse-to-fine retrieval framework that mimics how pathologists scan a slide to efficiently identify relevant tissue regions and informative patches in gigapixel whole slide images, thereby significantly reducing computational costs while improving accuracy and interpretability in pathology visual question answering.

Wentao Huang, Weimin Lyu, Peiliang Lou + 8 more | 2026-03-03 | cs

Direct low-field MRI super-resolution using undersampled k-space

This paper proposes a novel k-space dual channel U-Net framework that directly reconstructs high-quality, high-field-like MRI images from undersampled low-field k-space data, outperforming traditional spatial-domain methods and achieving quality comparable to full k-space acquisitions.

Daniel Tweneboah Anyimadu, Mohammed M. Abdelsamea, Ahmed Karam Eldaly | 2026-03-03 | cs

Specializing Foundation Models via Mixture of Low-Rank Experts for Comprehensive Head CT Analysis

This paper introduces the Mixture of Low-Rank Experts (MoLRE) framework, a parameter-efficient fine-tuning method that significantly enhances the performance of diverse foundation models on comprehensive multi-label head CT diagnosis by employing specialized low-rank adapters and unsupervised soft routing without requiring explicit pathology supervision.

Youngjin Yoo, Han Liu, Bogdan Georgescu + 14 more | 2026-03-03 | cs

CoLC: Communication-Efficient Collaborative Perception with LiDAR Completion

The paper proposes CoLC, a communication-efficient collaborative perception framework that leverages LiDAR completion techniques (specifically Foreground-Aware Point Sampling, Completion-Enhanced Early Fusion, and Dense-Guided Dual Alignment) to restore scene completeness from sparse transmissions and achieve superior perception-communication trade-offs while remaining robust to model heterogeneity.

Yushan Han, Hui Zhang, Qiming Xia + 2 more | 2026-03-03 | cs

SCOUT: Fast Spectral CT Imaging in Ultra LOw-data Regimes via PseUdo-label GeneraTion

SCOUT is a fast, self-supervised spectral CT reconstruction method that leverages spatial nonlocal similarity and projection domain conjugate properties to generate pseudo-3D data, enabling high-fidelity imaging with detail recovery and artifact mitigation under ultra-low data regimes without requiring external datasets or pre-training.

Guoquan Wei, Liu Shi, Shaoyu Wang + 3 more | 2026-03-03 | cs

STMI: Segmentation-Guided Token Modulation with Cross-Modal Hypergraph Interaction for Multi-Modal Object Re-Identification

This paper proposes STMI, a novel multi-modal object Re-Identification framework that integrates segmentation-guided feature modulation, semantic token reallocation, and cross-modal hypergraph interaction to enhance foreground representation, preserve discriminative cues, and capture high-order semantic relationships while mitigating background noise.

Xingguo Xu, Zhanyu Liu, Weixiang Zhou + 5 more | 2026-03-03 | cs

TokenSplat: Token-aligned 3D Gaussian Splatting for Feed-forward Pose-free Reconstruction

TokenSplat is a feed-forward framework that achieves joint 3D Gaussian reconstruction and camera pose estimation from unposed multi-view images by introducing a token-aligned prediction module and an asymmetric dual-flow decoder to enable robust, iteration-free 3D scene modeling.

Yihui Li, Chengxin Lv, Zichen Tang + 2 more | 2026-03-03 | cs

Towards Universal Khmer Text Recognition

This paper proposes a Universal Khmer Text Recognition (UKTR) framework featuring a novel modality-aware adaptive feature selection (MAFS) technique to overcome data scarcity and modality-specific limitations, achieving state-of-the-art performance while introducing the first comprehensive benchmark for the task.

Marry Kong, Rina Buoy, Sovisal Chenda + 3 more | 2026-03-03 | cs

Towards Khmer Scene Document Layout Detection

This paper addresses the scarcity of annotated data for Khmer scene document layout analysis by introducing a comprehensive framework that includes a new benchmark dataset, an open-source augmentation tool for synthetic data generation, and YOLO-based models with oriented bounding boxes to handle the script's structural complexities and geometric distortions.

Marry Kong, Rina Buoy, Sovisal Chenda + 3 more | 2026-03-03 | cs

IU: Imperceptible Universal Backdoor Attack

This paper introduces IU, a novel imperceptible universal backdoor attack that leverages graph convolutional networks to generate stealthy, class-specific perturbations, achieving high attack success rates with minimal poisoning while evading existing defenses.

Hsin Lin, Yan-Lun Chen, Ren-Hung Hwang + 1 more | 2026-03-03 | cs.LG

A Reconstruction System for Industrial Pipeline Inner Walls Using Panoramic Image Stitching with Endoscopic Imaging

This paper presents an industrial pipeline inner wall reconstruction system that utilizes panoramic image stitching and polar coordinate transformation on endoscopic video to generate comprehensive planar panoramic images, thereby significantly improving the efficiency and accuracy of defect detection compared to traditional frame-by-frame review methods.

Rui Ma, Yifeng Wang, Ziteng Yang + 1 more | 2026-03-03 | cs

Diversity over Uniformity: Rethinking Representation in Generated Image Detection

This paper proposes an anti-feature-collapse learning framework that preserves diverse and complementary forgery cues in the representation space to overcome the generalization limitations of existing methods, significantly improving cross-model detection accuracy and robustness against unseen generative mechanisms.

Qinghui He, Haifeng Zhang, Qiao Qin + 3 more | 2026-03-03 | cs

UniHM: Unified Dexterous Hand Manipulation with Vision Language Model

UniHM introduces a unified framework for dexterous hand manipulation that leverages a shared tokenizer for diverse hand morphologies and a vision-language action model trained on human-object interactions to generate physically feasible, human-like manipulation sequences from open-vocabulary language instructions without requiring extensive real-world teleoperation data.

Zhenhao Zhang, Jiaxin Liu, Ye Shi + 1 more | 2026-03-03 | cs

Stroke outcome and evolution prediction from CT brain using a spatiotemporal diffusion autoencoder

This paper presents a self-supervised spatiotemporal diffusion autoencoder that leverages longitudinal CT images and time-from-onset data to achieve state-of-the-art prediction of stroke severity and functional outcomes with minimal labeled data.

Adam Marcus, Paul Bentley, Daniel Rueckert | 2026-03-03 | cs.AI

Analyzing and Improving Fast Sampling of Text-to-Image Diffusion Models

This paper bridges the gap in training-free sampling acceleration for text-to-image diffusion models by systematically analyzing their design space and proposing the Constant Total Rotation Schedule (TORS), a geometrically inspired strategy that achieves high-quality image generation in just 10 sampling steps across various models.

Zhenyu Zhou, Defang Chen, Siwei Lyu + 2 more | 2026-03-03 | cs

DUCX: Decomposing Unfairness in Tool-Using Chest X-ray Agents

This paper introduces DUCX, a systematic audit framework that decomposes demographic bias in tool-using chest X-ray agents into tool exposure, transition, and reasoning components, revealing that significant subgroup disparities exist in intermediate agentic behaviors that are often overlooked by end-to-end performance evaluations.

Zikang Xu, Ruinan Jin, Xiaoxiao Li | 2026-03-03 | cs

Neural Functional Alignment Space: Brain-Referenced Representation of Artificial Neural Networks

This paper introduces the Neural Functional Alignment Space (NFAS), a brain-referenced framework that characterizes diverse artificial neural networks by modeling their layer-wise dynamics via Dynamic Mode Decomposition and projecting them into a biologically anchored coordinate system to reveal structured modality-specific clustering and cross-modal convergence.

Ruiyu Yan, Hanqi Jiang, Yi Pan + 4 more | 2026-03-03 | cs

Efficient Conformal Volumetry for Template-Based Segmentation

This paper introduces ConVOLT, a conformal prediction framework that enhances uncertainty quantification for template-based medical image segmentation by calibrating volumetric intervals based on deformation field properties, thereby achieving target coverage with significantly tighter bounds than existing output-space methods.

Matt Y. Cheung, Ashok Veeraraghavan, Guha Balakrishnan | 2026-03-03 | q-bio

NERFIFY: A Multi-Agent Framework for Turning NeRF Papers into Code

NERFIFY is a multi-agent framework that automates the conversion of Neural Radiance Field research papers into runnable Nerfstudio plugins by leveraging domain-specific constraints, dependency-aware synthesis, and visual feedback to achieve expert-level code quality in minutes rather than weeks.

Seemandhar Jain, Keshav Gupta, Kunal Gupta + 1 more | 2026-03-03 | cs

COMBAT: Conditional World Models for Behavioral Agent Training

The paper introduces COMBAT, a real-time diffusion-based world model trained on Tekken 3 that leverages causal distillation and diffusion forcing to generate sophisticated, reactive opponent behaviors from single-player data without requiring explicit policy supervision.

Anmol Agarwal, Pranay Meshram, Sumer Singh + 5 more | 2026-03-03 | cs
