A Framework for Cross-Domain Generalization in Coronary Artery Calcium Scoring Across Gated and Non-Gated Computed Tomography

The paper presents CARD-ViT, a self-supervised Vision Transformer framework trained exclusively on ECG-gated CT data that enables automated Coronary Artery Calcium scoring on non-gated scans. This supports scalable cardiovascular risk assessment from routine chest imaging without additional scans or annotations.

Mahmut S. Gokmen, Moneera N. Haque, Steve W. Leung + 6 more | 2026-02-26 | cs.AI

Directed Ordinal Diffusion Regularization for Progression-Aware Diabetic Retinopathy Grading

This paper proposes Directed Ordinal Diffusion Regularization (D-ODR), a method that enforces the unidirectional nature of diabetic retinopathy progression through a directed graph and multi-scale diffusion. This prevents biologically implausible reverse transitions and achieves better grading performance than existing state-of-the-art approaches.

Huangwei Chen, Junhao Jia, Ruocheng Li + 7 more | 2026-02-26 | cs

MindDriver: Introducing Progressive Multimodal Reasoning for Autonomous Driving

MindDriver is a progressive multimodal reasoning framework that bridges the gap between semantic understanding and physical trajectory planning for autonomous driving by introducing a human-like thinking process. It is supported by a feedback-guided data annotation pipeline and progressive reinforcement fine-tuning, and it achieves superior performance in both open-loop and closed-loop evaluations.

Lingjun Zhang, Yujian Yuan, Changjie Wu + 7 more | 2026-02-26 | cs

RGB-Event HyperGraph Prompt for Kilometer Marker Recognition based on Pre-trained Foundation Models

This paper addresses the challenges of Kilometer Marker Recognition for autonomous metro trains in complex environments. It proposes a robust multi-modal method that adapts a pre-trained RGB OCR foundation model to event camera data, and it introduces EvMetro5K, the first large-scale synchronized RGB-Event dataset, to validate the approach.

Xiaoyu Xian, Shiao Wang, Xiao Wang + 2 more | 2026-02-26 | cs.AI

Brain3D: Brain Report Automation via Inflated Vision Transformers in 3D

The paper introduces Brain3D, a specialized vision-language framework that converts 2D pretrained encoders into native 3D architectures to automate neuroradiology report generation from brain tumor MRIs. Through a three-stage alignment process, it achieves significantly higher clinical accuracy and perfect specificity on healthy scans compared to 2D baselines.

Mariano Barone, Francesco Di Serio, Giuseppe Riccio + 4 more | 2026-02-26 | cs