Sysformer: Safeguarding Frozen Large Language Models with Adaptive System Prompts
The paper introduces Sysformer, an approach that safeguards frozen large language models by learning to adapt the system prompt in the embedding space. The adapted prompt improves the model's robustness against harmful inputs and jailbreaking attacks without requiring costly fine-tuning of the model itself.
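
The idea of adapting a system prompt in the embedding space while the language model stays frozen can be illustrated with a small sketch. The summary above does not specify the architecture, so everything here is an assumption made for illustration: the single-head cross-attention form, the residual connection, and the names `PromptAdapter` and `build_frozen_model_input` are hypothetical, and training (the loss and optimizer) is omitted entirely. The sketch only shows where a small trainable module could sit: it rewrites the system prompt embeddings as a function of the user prompt embeddings, and the result is what a frozen model would consume.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class PromptAdapter:
    """Hypothetical trainable module; the only component with learnable weights."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)

        def init():
            return [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(dim)]

        self.dim = dim
        self.Wq, self.Wk, self.Wv = init(), init(), init()

    def __call__(self, system_emb, user_emb):
        # Keys and values come from the user prompt, so the adapted
        # system prompt depends on what the user actually asked.
        keys = [matvec(self.Wk, u) for u in user_emb]
        values = [matvec(self.Wv, u) for u in user_emb]
        adapted = []
        for s in system_emb:
            q = matvec(self.Wq, s)
            scores = [
                sum(a * b for a, b in zip(q, k)) / math.sqrt(self.dim)
                for k in keys
            ]
            weights = softmax(scores)
            context = [
                sum(w * v[j] for w, v in zip(weights, values))
                for j in range(self.dim)
            ]
            # Residual connection: start from the original system prompt
            # embedding and add a user-dependent correction.
            adapted.append([si + ci for si, ci in zip(s, context)])
        return adapted


def build_frozen_model_input(adapter, system_emb, user_emb):
    """Concatenate adapted system embeddings with unchanged user embeddings.

    A frozen model would receive this sequence; its weights are never touched.
    """
    return adapter(system_emb, user_emb) + user_emb


# Toy usage with random stand-ins for real token embeddings.
rng = random.Random(1)
dim = 4
system_emb = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(3)]
user_emb = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(5)]
adapter = PromptAdapter(dim)
inputs = build_frozen_model_input(adapter, system_emb, user_emb)
```

In this toy setup the user embeddings pass through unchanged and only the system prompt positions are rewritten, which is one way to keep the intervention confined to the prompt rather than the model's weights.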