Inference-Time Backdoors via Hidden Instructions in LLM Chat Templates
This paper introduces a novel inference-time backdoor attack that exploits maliciously modified chat templates to compromise open-weight language models without altering weights or training data, demonstrating high success rates in degrading factual accuracy and inducing harmful outputs while evading current security scans.