AgentRaft: Automated Detection of Data Over-Exposure in LLM Agents

Imagine you hire a super-smart, hyper-organized personal assistant (an LLM Agent) to help you with your daily life. You tell them, "Please check my bank statement and email just the payment date to my accountant."

In a perfect world, the assistant opens the file, looks at the date, writes an email with only that date, and hits send.

But in the real world, things go wrong. The assistant opens the file, sees the whole bank statement (including your credit card number, CVV, and home address), and thinks, "Oh, the accountant might need all this context!" So, they email the entire document to the accountant. You only wanted to share a date, but you just accidentally gave away your entire financial identity.

This paper calls this mistake Data Over-Exposure (DOE). It's like handing a stranger your entire house key because they asked for the key to the front door, not realizing you also gave them access to the safe, the wine cellar, and the bedroom.

The researchers built a tool called AgentRaft to find these mistakes before they happen. Here is how it works, using some simple analogies:

1. The Problem: The "Black Box" Chaos

Currently, these AI assistants are like a chaotic kitchen. You give them a recipe (a prompt), and they grab tools (like a file reader or an email sender) and start cooking. But because the AI decides which tools to use and how to combine them on the fly, it's hard to predict what data will end up in the final dish. Sometimes, they accidentally serve the whole raw ingredient list instead of just the finished meal.

2. The Solution: AgentRaft (The "Safety Inspector")

The authors created AgentRaft, a security system that acts like a rigorous safety inspector for these AI assistants. It doesn't just guess; it systematically hunts for leaks using three main steps:

Step A: Drawing the "Roadmap" (The Function Call Graph)

Imagine the AI's tools are cities, and the data flowing between them are roads.

What AgentRaft does: It draws a giant, detailed map (a Function Call Graph) of every possible road the AI could take. It connects the "Source" (where data is picked up, like a file reader) to the "Sink" (where data is sent out, like an email).
Why it helps: Instead of guessing where the AI might go, the map shows every possible route, including the hidden backroads where sensitive data might leak.

Step B: The "Test Driver" (Synthesizing Prompts)

Now that they have the map, they need to drive the car to see if the roads are safe.

What AgentRaft does: It writes specific, tricky instructions (prompts) designed to force the AI to take those specific, dangerous routes. It's like a test driver saying, "Okay, AI, I want you to read this file and email it," but specifically designed to trigger the exact sequence of events where a leak might happen.
The Trick: They give the AI a "test file" that contains fake sensitive data (like a fake credit card number). If the AI emails that fake number, they know there's a leak.

Step C: The "Panel of Judges" (Multi-LLM Voting)

Finally, they need to decide: "Did the AI actually leak something, or was it just doing its job?"

What AgentRaft does: It doesn't rely on just one AI to make the judgment. Instead, it uses a committee of three different AIs (like a jury). They look at what was sent and ask: "Was this data strictly necessary for the task? Did the user ask for it?"
The Rules: They use real-world laws (like GDPR or CCPA) as their rulebook. If the AI sent a credit card number when only a date was needed, the "jury" votes: Guilty! This prevents one AI from making a mistake or hallucinating.

3. The Results: It's a Bigger Problem Than We Thought

The researchers tested AgentRaft on 6,675 real-world tools used by AI agents. The results were shocking:

57% of the possible paths the AI could take resulted in over-exposure.
65% of the data fields being sent out were unnecessary and risky.

It turns out that most AI agents are currently "over-sharing" by default, not because they are malicious, but because they are poorly designed to understand the difference between "what I need" and "what I have."

4. Why AgentRaft is a Game Changer

Before this, finding these leaks was like trying to find a needle in a haystack by randomly poking the hay. It took forever and missed most needles.

AgentRaft is like a metal detector. It finds the needles quickly and accurately.
It is 88% cheaper and faster than previous methods because it doesn't waste time testing paths that don't exist.
It can find almost 100% of the risks with very few test attempts.

The Bottom Line

This paper introduces a way to automatically audit AI assistants to make sure they aren't accidentally spilling your secrets. It proves that without a "safety inspector," our AI helpers are likely to overshare our private data, but with tools like AgentRaft, we can build a future where AI is not just smart, but also trustworthy and privacy-safe.

Here is a detailed technical summary of the paper "AgentRaft: Automated Detection of Data Over-Exposure in LLM Agents."

1. Problem Definition: Data Over-Exposure (DOE)

The paper identifies a critical, systemic privacy risk in Large Language Model (LLM) agents termed Data Over-Exposure (DOE).

Definition: DOE occurs when an LLM agent inadvertently transmits sensitive data beyond the scope of user intent and functional necessity.
Root Causes:
1. Broad Data Paradigms: Tools often return excessive data (e.g., a file reader returning an entire document with PII when only a specific date is needed) to ensure flexibility.
2. Lack of Contextual Privacy Awareness: LLMs struggle to distinguish between data required for a task and data that should be filtered out, often failing to enforce strict data boundaries due to hallucinations or reasoning limitations.
The Challenge: Unlike traditional software with deterministic code paths, LLM agents use dynamic, non-deterministic tool orchestration. This makes static analysis insufficient, while manual dynamic testing is infeasible due to the vast, probabilistic space of possible user prompts and tool combinations.

2. Methodology: The AgentRaft Framework

AgentRaft is the first automated framework designed to detect DOE risks. It operates through three synergistic modules:

A. Cross-Tool Function Call Graph (FCG) Generation

To model the interaction landscape of heterogeneous tools, AgentRaft constructs a directed graph ( $G = (N, E)$ ).

Nodes: Represent entry points, individual tool functions, and sink functions (third-party data transmission).
Edges: Represent logical dependencies where the output of one tool serves as the input for another.
Construction Strategy: A hybrid approach combining Static Analysis (checking type compatibility between function signatures) and LLM-based Validation (using natural language understanding to verify semantic relevance). This filters out invalid paths and creates a structural blueprint of all reachable data-flow channels.

B. User Prompt Synthesis

This module converts the abstract paths in the FCG into executable, high-fidelity test cases.

Process: It traverses the FCG to identify source-to-sink paths. It then synthesizes specific user prompts that act as deterministic triggers to force the agent to execute these specific multi-step chains.
Data Partitioning: The framework artificially partitions user assets into:
- $D_{int}$ (User Intent Data): The specific data the user wants to share (e.g., "payment date").
- $D_{candidate}$ (Over-exposure Candidates): Sensitive data that should not be shared (e.g., "credit card number").
Goal: The prompt is strictly constrained to request only $D_{int}$ . If the agent transmits $D_{candidate}$ , it constitutes a violation.

C. Data Over-Exposure Detection

This module performs runtime monitoring and automated judgment.

Taint Tracking: The framework executes the synthesized prompts in a controlled environment, tracking data flow from the source, through intermediate tools, to the sink. It labels data elements that fall outside $D_{int}$ as "tainted."
Multi-LLM Voting Committee: To determine if transmitted data is functionally necessary ( $D_{nec}$ $D_{n ec}$ ) or over-exposed, AgentRaft employs a committee of multiple LLMs (e.g., GPT-4.1, Qwen3-Plus, DeepSeek-V3.2).
- These models are guided by global privacy regulations (GDPR, CCPA, PIPL).
- They vote on whether each data field is strictly necessary for the task.
- Verdict: If data is transmitted that is neither in $D_{int}$ nor $D_{nec}$ , it is flagged as DOE.

3. Key Contributions

First Systematic Investigation: The paper formally defines DOE in the context of LLM agents and characterizes it as a violation of the privacy boundary between user intent and agent execution.
AgentRaft Framework: Development of an automated, generic framework that combines program analysis (FCG) with semantic reasoning (prompt synthesis and multi-model auditing) to detect DOE.
Novel Techniques: Introduction of a Cross-Tool Function Call Graph to model dynamic agent interactions and a multi-LLM voting mechanism to reduce false positives in privacy judgment.

4. Evaluation Results

The framework was evaluated on a dataset of 6,675 real-world tools across four scenarios: Enterprise Collaboration, Software Development, Social Communication, and Data Management.

Prevalence of Risk: DOE is a systemic risk.
- 57.07% of potential tool interaction paths (call chains) exhibited unauthorized sensitive data exposure.
- 65.42% of all transmitted data fields in vulnerable chains were identified as over-exposed.
Detection Efficiency:
- AgentRaft achieved 98.94% coverage of known DOE risks within just 150 prompts.
- In contrast, non-guided random search methods failed to exceed 20% coverage even after 300 attempts.
Accuracy:
- The multi-LLM voting mechanism achieved an F1-score of 97.86% for identifying over-exposed fields, significantly outperforming single-model baselines (which scored ~84%).
- It reduced false positives by roughly 10x compared to single-model judges.
Cost Effectiveness:
- AgentRaft reduced per-chain verification costs by 88.6% compared to baselines, making large-scale auditing computationally feasible.

5. Significance and Impact

Security & Privacy: The work highlights that current LLM agent architectures inherently violate the "Data Minimization" principle, posing severe privacy risks to users.
Practical Utility: AgentRaft provides a tool for developers to perform systematic privacy vetting before release and for platforms to conduct automated compliance checks (e.g., against GDPR/PIPL).
Scalability: By demonstrating that automated, high-precision auditing is possible with low computational overhead, the paper paves the way for building auditable and trustworthy LLM agent ecosystems.
Future Directions: The authors suggest architectural refactoring (atomic tool functions) and privacy-centric alignment training as necessary mitigations alongside automated detection.