Beyond Benchmarks: Dynamic, Automatic And Systematic Red-Teaming Agents For Trustworthy Medical Language Models
This paper introduces a Dynamic, Automatic, and Systematic (DAS) red-teaming framework that exposes a critical "Benchmarking Gap" in medical large language models, revealing that despite high static benchmark scores, most models exhibit profound brittleness, privacy leaks, bias, and hallucinations when subjected to continuous, adversarial stress-testing.
Jiazhen Pan, Bailiang Jian, Paul Hager, Yundi Zhang, Che Liu, Friedrike Jungmann, Hongwei Bran Li, Chenyu You, Junde Wu, Jiayuan Zhu, Fenglin Liu, Yuyuan Liu, Niklas Bubeck, Christian Wachinger, Chen (Cherise) Chen, Zhenyu Gong, Cheng Ouyang, Georgios Kaissis, Benedikt Wiestler, Daniel Rueckert
Tue, 10 Ma · cs.LG