AttriGuard: Defeating Indirect Prompt Injection in LLM Agents via Causal Attribution of Tool Invocations
The paper proposes AttriGuard, a novel runtime defense that mitigates Indirect Prompt Injection in LLM agents by employing parallel counterfactual tests to causally attribute tool invocations to user intent rather than untrusted external observations, thereby achieving near-perfect attack success rate reduction with minimal utility loss.