AgentRaft: Automated Detection of Data Over-Exposure in LLM Agents

This paper introduces AgentRaft, an automated framework that combines program analysis and semantic reasoning to detect and quantify the systemic risk of Data Over-Exposure in LLM agents, demonstrating high accuracy and efficiency across thousands of real-world tools.

Yixi Lin (Sun Yat-sen University, Zhuhai, Guangdong, China), Jiangrong Wu (Sun Yat-sen University, Zhuhai, Guangdong, China), Yuhong Nan (Sun Yat-sen University, Zhuhai, Guangdong, China), Xueqiang Wang (University of Central Florida, Orlando, Florida, USA), Xinyuan Zhang (Sun Yat-sen University, Zhuhai, Guangdong, China), Zibin Zheng (Sun Yat-sen University, Zhuhai, Guangdong, China)

Published Tue, 10 Ma
📖 5 min read🧠 Deep dive

Imagine you hire a super-smart, hyper-organized personal assistant (an LLM Agent) to help you with your daily life. You tell them, "Please check my bank statement and email just the payment date to my accountant."

In a perfect world, the assistant opens the file, looks at the date, writes an email with only that date, and hits send.

But in the real world, things go wrong. The assistant opens the file, sees the whole bank statement (including your credit card number, CVV, and home address), and thinks, "Oh, the accountant might need all this context!" So, they email the entire document to the accountant. You only wanted to share a date, but you just accidentally gave away your entire financial identity.

This paper calls this mistake Data Over-Exposure (DOE). It's like handing a stranger your entire house key because they asked for the key to the front door, not realizing you also gave them access to the safe, the wine cellar, and the bedroom.

The researchers built a tool called AgentRaft to find these mistakes before they happen. Here is how it works, using some simple analogies:

1. The Problem: The "Black Box" Chaos

Currently, these AI assistants are like a chaotic kitchen. You give them a recipe (a prompt), and they grab tools (like a file reader or an email sender) and start cooking. But because the AI decides which tools to use and how to combine them on the fly, it's hard to predict what data will end up in the final dish. Sometimes, they accidentally serve the whole raw ingredient list instead of just the finished meal.

2. The Solution: AgentRaft (The "Safety Inspector")

The authors created AgentRaft, a security system that acts like a rigorous safety inspector for these AI assistants. It doesn't just guess; it systematically hunts for leaks using three main steps:

Step A: Drawing the "Roadmap" (The Function Call Graph)

Imagine the AI's tools are cities, and the data flowing between them are roads.

  • What AgentRaft does: It draws a giant, detailed map (a Function Call Graph) of every possible road the AI could take. It connects the "Source" (where data is picked up, like a file reader) to the "Sink" (where data is sent out, like an email).
  • Why it helps: Instead of guessing where the AI might go, the map shows every possible route, including the hidden backroads where sensitive data might leak.

Step B: The "Test Driver" (Synthesizing Prompts)

Now that they have the map, they need to drive the car to see if the roads are safe.

  • What AgentRaft does: It writes specific, tricky instructions (prompts) designed to force the AI to take those specific, dangerous routes. It's like a test driver saying, "Okay, AI, I want you to read this file and email it," but specifically designed to trigger the exact sequence of events where a leak might happen.
  • The Trick: They give the AI a "test file" that contains fake sensitive data (like a fake credit card number). If the AI emails that fake number, they know there's a leak.

Step C: The "Panel of Judges" (Multi-LLM Voting)

Finally, they need to decide: "Did the AI actually leak something, or was it just doing its job?"

  • What AgentRaft does: It doesn't rely on just one AI to make the judgment. Instead, it uses a committee of three different AIs (like a jury). They look at what was sent and ask: "Was this data strictly necessary for the task? Did the user ask for it?"
  • The Rules: They use real-world laws (like GDPR or CCPA) as their rulebook. If the AI sent a credit card number when only a date was needed, the "jury" votes: Guilty! This prevents one AI from making a mistake or hallucinating.

3. The Results: It's a Bigger Problem Than We Thought

The researchers tested AgentRaft on 6,675 real-world tools used by AI agents. The results were shocking:

  • 57% of the possible paths the AI could take resulted in over-exposure.
  • 65% of the data fields being sent out were unnecessary and risky.

It turns out that most AI agents are currently "over-sharing" by default, not because they are malicious, but because they are poorly designed to understand the difference between "what I need" and "what I have."

4. Why AgentRaft is a Game Changer

Before this, finding these leaks was like trying to find a needle in a haystack by randomly poking the hay. It took forever and missed most needles.

  • AgentRaft is like a metal detector. It finds the needles quickly and accurately.
  • It is 88% cheaper and faster than previous methods because it doesn't waste time testing paths that don't exist.
  • It can find almost 100% of the risks with very few test attempts.

The Bottom Line

This paper introduces a way to automatically audit AI assistants to make sure they aren't accidentally spilling your secrets. It proves that without a "safety inspector," our AI helpers are likely to overshare our private data, but with tools like AgentRaft, we can build a future where AI is not just smart, but also trustworthy and privacy-safe.