When Robots Obey the Patch: Universal Transferable Patch Attacks on Vision-Language-Action Models
This paper introduces UPA-RFAS, a unified framework for generating universal, transferable physical adversarial patches against Vision-Language-Action (VLA) models. The patches are designed to transfer across unknown architectures, finetuned variants, and sim-to-real shifts. The framework combines robust feature alignment, a two-phase min-max optimization, and VLA-specific attention and semantic losses.