VII: Visual Instruction Injection for Jailbreaking Image-to-Video Generation Models
This paper introduces Visual Instruction Injection (VII), a training-free, transferable jailbreaking framework for Image-to-Video models. VII exploits these models' visual instruction-following capabilities by disguising malicious text prompts as benign visual cues in reference images, and it achieves high attack success rates across state-of-the-art commercial systems.