Agentic retrieval-augmented reasoning reshapes collective reliability under model variability in radiology question answering
This study shows that agentic retrieval-augmented reasoning pipelines significantly improve the collective reliability, consensus strength, and cross-model robustness of large language models in radiology question answering relative to zero-shot inference. It also finds that accuracy and agreement alone are insufficient for evaluating clinical safety under model variability.
Mina Farajiamiri, Jeta Sopa, Saba Afza, Lisa Adams, Felix Barajas Ordonez, Tri-Thien Nguyen, Mahshad Lotfinia, Sebastian Wind, Keno Bressem, Sven Nebelung, Daniel Truhn, Soroosh Tayebi Arasteh
2026-03-09 · cs.AI