Removing the Trigger, Not the Backdoor: Alternative Triggers and Latent Backdoors
This paper challenges the assumption that neutralizing known triggers eliminates backdoors. It demonstrates that perceptually distinct "alternative triggers" can reliably activate latent backdoor directions in feature space, and it therefore advocates defenses that target these underlying representation patterns rather than specific input triggers.
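
To make the idea of a latent backdoor direction concrete, here is a minimal toy sketch. It is an illustration under stated assumptions, not the paper's method: the direction is modeled as the normalized difference between the mean feature vector of inputs carrying a known trigger and that of clean inputs, and the features are synthetic. All names (`estimate_direction`, `project_out`, `toy_feature`, `true_dir`) are hypothetical. The sketch shows that features produced by a different trigger can still score high along the estimated direction, and that removing the direction in feature space suppresses that score regardless of which trigger produced it.

```python
import math
import random


def mean_vector(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def estimate_direction(clean_feats, triggered_feats):
    """Unit vector from the clean mean to the triggered mean.
    Assumption: a difference of means is a reasonable stand-in for a
    'latent backdoor direction'; the paper may define it differently."""
    mc = mean_vector(clean_feats)
    mt = mean_vector(triggered_feats)
    diff = [t - c for t, c in zip(mt, mc)]
    norm = math.sqrt(sum(d * d for d in diff))
    return [d / norm for d in diff]


def score(feat, direction):
    """Projection of a feature vector onto the direction."""
    return sum(f * d for f, d in zip(feat, direction))


def project_out(feat, direction):
    """Remove the component of feat that lies along the direction."""
    s = score(feat, direction)
    return [f - s * d for f, d in zip(feat, direction)]


def mean_score(feats, direction):
    return sum(score(f, direction) for f in feats) / len(feats)


# Synthetic toy data (hypothetical): the backdoor shifts features along true_dir.
rng = random.Random(0)
dim = 8
true_dir = [1.0] + [0.0] * (dim - 1)


def toy_feature(shift):
    return [rng.gauss(0.0, 0.1) + shift * t for t in true_dir]


clean = [toy_feature(0.0) for _ in range(200)]
known = [toy_feature(3.0) for _ in range(200)]
# Stands in for features of a perceptually different input pattern that
# nevertheless lands on the same latent direction.
alternative = [toy_feature(3.0) for _ in range(200)]

direction = estimate_direction(clean, known)
clean_score = mean_score(clean, direction)
alt_score = mean_score(alternative, direction)
alt_after = mean_score([project_out(f, direction) for f in alternative], direction)

print("clean:", clean_score)
print("alternative trigger:", alt_score)
print("alternative trigger after projection:", alt_after)
```

In this toy setting the direction is estimated only from the known trigger, yet the alternative-trigger features score far above clean ones along it. That is the intuition behind targeting the representation rather than the input pattern.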