MedQ-Deg: A Multidimensional Benchmark for Evaluating MLLMs Across Medical Image Quality Degradations
This paper introduces MedQ-Deg, a comprehensive benchmark of 24,894 expert-calibrated question-answer pairs spanning 18 degradation types and 7 imaging modalities. The benchmark reveals that mainstream medical multimodal large language models (MLLMs) suffer systematic performance drops under image quality degradations while remaining overconfident, a pattern termed the "AI Dunning-Kruger Effect".
Jiyao Liu, Junzhi Ning, Chenglong Ma, Wanying Qu, Jianghan Shen, Siqi Luo, Jinjie Wei, Jin Ye, Pengze Li, Tianbin Li, Jiashi Lin, Hongming Shan, Xinzhe Luo, Xiaohong Liu, Lihao Liu, Junjun He, Ningsheng Xu
2026-03-10, cs