Measuring and Eliminating Refusals in Military Large Language Models
This paper introduces a novel gold-standard dataset developed by US military veterans to quantify excessive safety refusals in military Large Language Models, demonstrating that while specialized fine-tuning can significantly reduce these refusals, achieving zero refusals and maximum accuracy requires deeper, end-to-end specialization.