PRBench: End-to-end Paper Reproduction in Physics Research
The paper introduces PRBench, a rigorous benchmark of 30 expert-curated physics tasks for evaluating how well AI agents reproduce research papers end to end. It finds that, despite their advanced reasoning abilities, current models struggle significantly with code correctness, data accuracy, and overall reproduction success.
Shi Qiu, Junyi Deng, Yiwei Deng, Haoran Dong, Jieyu Fu, Mao Li, Zeyu Li, Zhaolong Zhang, Huiwen Zheng, Leidong Bao, Anqi Lv, Zihan Mo, Yadi Niu, Yiyang Peng, Yu Tian, Yili Wang, Ziyu Wang, Zi-Yu Wang (…)
2026-03-31 · hep-lat