IAG: Input-aware Backdoor Attack on VLM-based Visual Grounding
This paper introduces IAG, the first input-aware backdoor attack on vision-language models (VLMs) for visual grounding. IAG uses a text-conditioned UNet to dynamically generate imperceptible, target-specific triggers, achieving high attack success rates across a range of models and datasets while remaining stealthy and robust against existing defenses.
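The core idea, a trigger conditioned on both the input image and the target text, can be caricatured in a few lines. The sketch below is not the paper's architecture: it stands in for the text-conditioned UNet with a toy random linear generator, and all names, shapes, and the epsilon budget are illustrative assumptions. It shows only the two properties the summary names: the trigger varies with the target text (input-aware, target-specific) and is clipped to a small L-inf ball (imperceptible).

```python
import numpy as np

rng = np.random.default_rng(0)

EPS = 8 / 255  # assumed L-inf perturbation budget for imperceptibility
D_TEXT = 16    # toy target-text embedding size
H = W = 8      # toy image size

# Stand-in for the text-conditioned UNet: a fixed random linear map from
# (flattened image, target-text embedding) to an image-shaped perturbation.
W_gen = rng.normal(scale=0.1, size=(H * W + D_TEXT, H * W))

def generate_trigger(image: np.ndarray, text_emb: np.ndarray) -> np.ndarray:
    """Produce an input- and target-specific perturbation bounded by EPS."""
    z = np.concatenate([image.ravel(), text_emb])
    delta = np.tanh(z @ W_gen).reshape(H, W) * EPS  # tanh keeps it in the budget
    return delta

def poison(image: np.ndarray, text_emb: np.ndarray) -> np.ndarray:
    """Add the trigger to the image and keep pixels in [0, 1]."""
    return np.clip(image + generate_trigger(image, text_emb), 0.0, 1.0)

img = rng.uniform(size=(H, W))
emb_a = rng.normal(size=D_TEXT)  # embedding of one hypothetical target phrase
emb_b = rng.normal(size=D_TEXT)  # embedding of a different target phrase

trig_a = generate_trigger(img, emb_a)
trig_b = generate_trigger(img, emb_b)
```

Because the generator consumes both the image and the target embedding, `trig_a` and `trig_b` differ for the same image, which is what distinguishes an input-aware attack from a fixed-patch backdoor.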