cs.CV papers | Gist.Science

Toward Real-world Infrared Image Super-Resolution: A Unified Autoregressive Framework and Benchmark Dataset

This paper introduces Real-IISR, a unified autoregressive framework equipped with thermal-structural guidance and adaptive quantization to address real-world infrared image super-resolution, accompanied by the FLIR-IISR benchmark dataset for rigorous evaluation.

Yang Zou, Jun Ma, Zhidong Jiao + 3 more | 2026-03-06 | 💻 cs

Evaluating GPT-5 as a Multimodal Clinical Reasoner: A Landscape Commentary

This landscape commentary evaluates the GPT-5 family against GPT-4o, revealing substantial improvements in expert-level textual reasoning and multimodal synthesis that approach state-of-the-art performance in tasks like mammography, while highlighting that generalist models still lag behind specialized systems in perception-critical domains such as neuroradiology.

Alexandru Florea, Shansong Wang, Mingzhe Hu + 5 more | 2026-03-06 | 💻 cs

Evaluating and Correcting Human Annotation Bias in Dynamic Micro-Expression Recognition

This paper introduces the Global Anti-Monotonic Differential Selection Strategy (GAMDSS), which mitigates human annotation bias in cross-cultural micro-expression recognition by dynamically re-selecting keyframes to construct robust spatio-temporal representations, thereby improving model performance and standardizing annotation practices without increasing the number of model parameters.

Feng Liu, Bingyu Nan, Xuezhong Qian + 1 more | 2026-03-06 | 💻 cs

DSA-SRGS: Super-Resolution Gaussian Splatting for Dynamic Sparse-View DSA Reconstruction

This paper proposes DSA-SRGS, the first super-resolution Gaussian splatting framework for dynamic sparse-view DSA reconstruction, which integrates a Multi-Fidelity Texture Learning Module with confidence-aware supervision and Radiative Sub-Pixel Densification to recover fine-grained vascular details while avoiding blurring and hallucination artifacts.

Shiyu Zhang, Zhicong Wu, Huangxuan Zhao + 7 more | 2026-03-06 | 💻 cs

MADCrowner: Margin Aware Dental Crown Design with Template Deformation and Refinement

The paper proposes MADCrowner, a margin-aware framework that combines a template deformation network (CrownDeformR) with a novel margin segmentation network (CrownSegger) to automatically generate high-precision, clinically feasible dental crowns by addressing limitations in spatial resolution and surface overextension found in existing learning-based methods.

Linda Wei, Chang Liu, Wenran Zhang + 9 more | 2026-03-06 | 💻 cs

Privacy-Aware Camera 2.0 Technical Report

This paper proposes a novel privacy-preserving perception framework that utilizes an AI Flow-based edge-cloud architecture to transform raw images into mathematically irreconstructible abstract feature vectors at the source, thereby enabling secure behavior recognition and semantic reconstruction via dynamic contours while completely eliminating visual data leakage in sensitive environments.

Huan Song, Shuyu Tian, Ting Long + 5 more | 2026-03-06 | 💻 cs

RMK RetinaNet: Rotated Multi-Kernel RetinaNet for Robust Oriented Object Detection in Remote Sensing Imagery

The paper proposes RMK RetinaNet, a rotated object detection framework for remote sensing imagery that addresses limitations in receptive field adaptation, multi-scale feature fusion, and angle regression discontinuity through a Multi-Scale Kernel Block, Multi-Directional Contextual Anchor Attention, a Bottom-up Path, and an Euler Angle Encoding Module, achieving state-of-the-art performance on benchmark datasets.

Huiran Sun | 2026-03-06 | 💻 cs

LAW & ORDER: Adaptive Spatial Weighting for Medical Diffusion and Segmentation

This paper introduces "LAW & ORDER," a dual-adapter framework that employs Learnable Adaptive Weighting to stabilize diffusion-based medical image synthesis and Optimal Region Detection to enhance efficient segmentation, collectively addressing spatial imbalance to significantly improve generative quality and segmentation accuracy while maintaining a lightweight model architecture.

Anugunj Naman, Ayushman Singh, Gaibo Zhang + 1 more | 2026-03-06 | 💻 cs

Comparative Evaluation of Traditional Methods and Deep Learning for Brain Glioma Imaging. Review Paper

This review paper evaluates traditional and deep learning methods for brain glioma segmentation and classification, concluding that convolutional neural network architectures outperform traditional techniques in post-MRI analysis to enhance treatment planning and patient outcomes.

Kiranmayee Janardhan, Vinay Martin DSa Prabhu, T. Christy Bobby | 2026-03-06 | 💻 cs

MASQuant: Modality-Aware Smoothing Quantization for Multimodal Large Language Models

This paper introduces MASQuant, a novel post-training quantization framework for Multimodal Large Language Models that overcomes smoothing misalignment and cross-modal computational invariance challenges through modality-aware smoothing and cross-modal compensation, achieving state-of-the-art performance across dual- and tri-modal architectures.

Lulu Hu, Wenhu Xiao, Xin Chen + 4 more | 2026-03-06 | 💻 cs

Guiding Diffusion-based Reconstruction with Contrastive Signals for Balanced Visual Representation

This paper proposes Diffusion Contrastive Reconstruction (DCR), a method that injects contrastive signals derived from reconstructed images into the diffusion process to resolve gradient conflicts and jointly optimize both discriminative and detail-perceptive abilities, thereby overcoming the limitations of CLIP's visual encoder for balanced visual representation.

Boyu Han, Qianqian Xu, Shilong Bao + 4 more | 2026-03-06 | 💻 cs

Meta-D: Metadata-Aware Architectures for Brain Tumor Analysis and Missing-Modality Segmentation

The paper presents Meta-D, a metadata-aware architecture that leverages categorical scanner information to dynamically modulate feature extraction for improved 2D brain tumor detection and to serve as a robust anchor for cross-attention mechanisms in 3D missing-modality segmentation, achieving significant performance gains and parameter reduction.

SangHyuk Kim, Daniel Haehn, Sumientra Rampersad | 2026-03-06 | 💻 cs

Revisiting Shape from Polarization in the Era of Vision Foundation Models

This paper demonstrates that by addressing domain gaps through a high-quality dataset of 3D-scanned objects, DINOv3 priors, and sensor-aware augmentation, a lightweight polarization-based model trained on a small dataset can significantly outperform both state-of-the-art Shape from Polarization methods and large-scale RGB-only Vision Foundation Models in single-shot surface normal estimation.

Chenhao Li, Taishi Ono, Takeshi Uemori + 1 more | 2026-03-06 | 💻 cs

Mitigating Instance Entanglement in Instance-Dependent Partial Label Learning

This paper proposes the Class-specific Augmentation based Disentanglement (CAD) framework to mitigate instance entanglement in instance-dependent partial label learning by employing intra-class feature alignment and inter-class weighted penalty mechanisms to clarify class boundaries and reduce confusion.

Rui Zhao, Bin Shi, Kai Sun + 1 more | 2026-03-06 | 🤖 cs.LG

Towards Highly Transferable Vision-Language Attack via Semantic-Augmented Dynamic Contrastive Interaction

This paper proposes Semantic-Augmented Dynamic Contrastive Attack (SADCA), a novel method that enhances the transferability of adversarial attacks on vision-language models by employing progressive dynamic contrastive interactions to disrupt cross-modal alignment and a semantic augmentation module to increase example diversity.

Yuanbo Li, Tianyang Xu, Cong Hu + 3 more | 2026-03-06 | 💻 cs

Multi-Paradigm Collaborative Adversarial Attack Against Multi-Modal Large Language Models

This paper proposes MPCAttack, a novel framework that enhances the transferability of adversarial attacks against Multi-Modal Large Language Models by leveraging a Multi-Paradigm Collaborative Optimisation strategy to jointly aggregate and balance visual and textual semantic representations for more effective global perturbation.

Yuanbo Li, Tianyang Xu, Cong Hu + 3 more | 2026-03-06 | 💻 cs

GloSplat: Joint Pose-Appearance Optimization for Faster and More Accurate 3D Reconstruction

GloSplat is a novel 3D reconstruction framework that achieves faster and more accurate results by performing joint pose-appearance optimization during 3D Gaussian Splatting training, uniquely preserving explicit SfM feature tracks as separate optimizable parameters to prevent pose drift and enable fine-grained refinement.

Tianyu Xiong, Rui Li, Linjie Li + 1 more | 2026-03-06 | 💻 cs

On Multi-Step Theorem Prediction via Non-Parametric Structural Priors

This paper introduces a training-free, non-parametric approach to multi-step theorem prediction that overcomes the scalability limitations of vanilla in-context learning by leveraging Theorem Precedence Graphs to encode temporal dependencies and impose topological constraints, achieving state-of-the-art accuracy on the FormalGeo7k benchmark without gradient-based optimization.

Junbo Zhao, Ting Zhang, Can Li + 3 more | 2026-03-06 | 🤖 cs.AI

Scalable Injury-Risk Screening in Baseball Pitching From Broadcast Video

This paper presents a scalable monocular video pipeline that recovers clinically relevant biomechanical metrics from broadcast baseball footage with high accuracy, enabling effective injury-risk screening for thousands of pitchers without the need for expensive stadium-based motion capture systems.

Jerrin Bright, Justin Mende, John Zelek | 2026-03-06 | 💻 cs

SURE: Semi-dense Uncertainty-REfined Feature Matching

The paper proposes SURE, a semi-dense feature matching framework that improves reliability in challenging scenarios by jointly predicting correspondences and their confidence through a novel evidential head that models both aleatoric and epistemic uncertainties.

Sicheng Li, Zaiwang Gu, Jie Zhang + 3 more | 2026-03-06 | 💻 cs
