Breaking and Fixing Defenses Against Control-Flow Hijacking in Multi-Agent Systems
This paper demonstrates that existing alignment-based defenses against control-flow hijacking in multi-agent systems can be evaded, owing to inherent conflicts between safety and functionality and to the defenses' limited visibility into context. It proposes ControlValve, a new defense that enforces control-flow integrity and least privilege through permitted control-flow graphs and contextual rules.
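
The summary above does not say how ControlValve represents its permitted control-flow graphs or contextual rules, so the following is only a minimal sketch of the general idea, not the paper's implementation. It assumes the graph is a mapping from each agent to the set of agents it may hand control to, and that a contextual rule is a predicate attached to an edge. The agent names, the `is_permitted` function, and the path-prefix rule are all hypothetical.

```python
from typing import Callable, Dict, Set, Tuple

# Hypothetical types for illustration only; none of these names come from the paper.
Context = Dict[str, str]
Rule = Callable[[Context], bool]
Graph = Dict[str, Set[str]]
Rules = Dict[Tuple[str, str], Rule]


def is_permitted(graph: Graph, rules: Rules, src: str, dst: str, context: Context) -> bool:
    """Return True only if src -> dst is an allowed edge and its rule (if any) holds."""
    # Control-flow integrity: the transition must be an edge of the permitted graph.
    if dst not in graph.get(src, set()):
        return False
    # Least privilege: an edge may carry a contextual rule that further restricts it.
    rule = rules.get((src, dst))
    return rule is None or rule(context)


# Assumed example system: an orchestrator that may call a file reader or a summarizer.
PERMITTED_GRAPH: Graph = {
    "orchestrator": {"file_reader", "summarizer"},
    "file_reader": {"summarizer"},
    "summarizer": set(),
}

# Assumed contextual rule: the file reader may only be invoked on workspace paths.
CONTEXT_RULES: Rules = {
    ("orchestrator", "file_reader"): lambda ctx: ctx.get("path", "").startswith("/workspace/"),
}
```

In this sketch, a hijacked transition to an agent outside the graph (for example `file_reader` to a `code_executor`) is rejected regardless of what the agents say to each other, and an allowed edge is still rejected when its contextual rule fails. The check depends only on the declared graph and rules, not on a model's judgment of whether a request looks malicious.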