Discern Truth from Falsehood: Reducing Over-Refusal via Contrastive Refinement
This paper introduces DCR (Discernment via Contrastive Refinement), a novel alignment stage that effectively mitigates large language models' over-refusal tendency by enhancing their ability to distinguish between genuinely harmful and benign prompts, thereby improving helpfulness without compromising safety or general capabilities.