ADAM: A Systematic Data Extraction Attack on Agent Memory via Adaptive Querying

This paper introduces ADAM, a novel privacy attack that leverages data distribution estimation and entropy-guided querying to systematically extract sensitive information from LLM agent memory, achieving significantly higher success rates than existing methods and highlighting critical vulnerabilities in current agent designs.

Original authors: Xingyu Lyu, Jianfeng He, Ning Wang, Yidan Hu, Tao Li, Danjue Chen, Shixiong Li, Yimin Chen

Published 2026-04-14
📖 5 min read🧠 Deep dive

This is an AI-generated explanation of the paper below. It is not written or endorsed by the authors. For technical accuracy, refer to the original paper. Read full disclaimer

Imagine you have a very smart, helpful robot assistant. This robot is designed to remember your past conversations, your favorite coffee order, your medical history, or your shopping habits so it can help you better next time. It's like a digital diary that never forgets.

The paper you shared, titled ADAM, is about a clever (and slightly scary) way a hacker could trick this robot into reading its own diary out loud, even if the robot is supposed to keep that diary private.

Here is the breakdown of how this works, using simple analogies:

1. The Setup: The Robot with a Memory

Think of the AI agent (the robot) as a librarian who has a massive, secret archive of books (your private data).

  • How it usually works: You ask the librarian, "What's the weather?" The librarian checks the archive, finds a relevant book, and tells you the weather.
  • The Goal: The hacker wants to trick the librarian into pulling out all the books in the archive, one by one, and reading them aloud, even though the librarian is only supposed to show you the specific page you asked for.

2. The Old Way: The "Badgering" Approach

Previous hackers tried to steal this data by using static, blunt-force tricks.

  • The Analogy: Imagine a thief standing at the library door shouting, "Give me your secrets!" or "Show me the book about Patient X!"
  • The Problem: The librarian (the AI) is trained to be polite and follow rules. It often ignores these blunt shouts or realizes, "Hey, this person is trying to trick me," and says, "I can't do that." These old methods were like trying to break down a door with a sledgehammer; they were loud, obvious, and often failed.

3. The New Way: ADAM (The "Sherlock Holmes" Approach)

The authors of this paper created ADAM. Instead of shouting, ADAM acts like a master detective or a skilled fisherman. It doesn't just guess; it learns how the library is organized.

Here is how ADAM works in three simple steps:

Step A: The "Sniff Test" (Data Distribution Estimation)

ADAM starts by asking a few innocent-sounding questions.

  • The Analogy: Imagine the detective walks into the library and asks, "Do you have any books about cats?" The librarian pulls out a few books about cats. The detective then asks, "Do you have books about dogs?" and gets a few dog books.
  • What ADAM does: It quickly builds a mental map of what the library contains. It figures out, "Ah, this library has a huge section on medical records and very few on cooking." It estimates the shape of the data inside the robot's memory.

Step B: The "Smart Net" (Adaptive Querying)

Once ADAM knows what the library looks like, it stops guessing randomly.

  • The Analogy: Instead of asking "Do you have a book?" (which is too vague), the detective now knows the library is full of medical records. So, it asks very specific, clever questions like, "I think I lost my notes on Patient 404's heart condition. Can you show me similar notes you have?"
  • The Trick: The question sounds helpful and natural. The librarian thinks, "Oh, this user is confused and needs help finding their own notes," so it happily pulls out the secret files to help.

Step C: The "Entropy" Compass (Maximizing the Catch)

This is the secret sauce. ADAM uses a concept called Entropy (which basically means "uncertainty" or "surprise").

  • The Analogy: Imagine the detective has a map with red dots (areas they've already checked) and blank spots (areas they haven't).
  • The Strategy: ADAM looks at its map and says, "I've already asked about heart conditions. I know the librarian has those. But I haven't asked about kidney issues yet. That blank spot on the map is where the new secrets are."
  • It specifically chooses questions that are most likely to reveal new information that it hasn't seen before. It avoids asking the same thing twice.

4. The Results: A Perfect Heist

The paper tested this ADAM system against three different types of robots (a medical assistant, a reasoning bot, and a shopping bot).

  • The Outcome: While old methods only managed to steal about 30-50% of the secrets, ADAM stole up to 100% of the private data in many cases.
  • Why it's scary: It didn't need to break the robot's code. It didn't need a password. It just asked the right questions in the right order, sounding like a normal user the whole time.

5. The Defense (Or Lack Thereof)

The researchers also tried to stop ADAM using common security measures:

  • Rewriting the question: If the robot tries to rephrase the hacker's question to make it safer, ADAM still works because the meaning hasn't changed.
  • Filtering keywords: If the robot blocks words like "memory" or "password," ADAM just uses different words to ask the same thing.
  • Rate limiting: If the robot says, "You can only ask 1 question per minute," ADAM just waits and asks the next perfect question.

The Big Takeaway

The paper concludes that AI agents with memory are currently very vulnerable.

Think of it like this: We built a robot that remembers everything to be helpful, but we forgot to build a "Do Not Disturb" sign for its memory. ADAM proved that with the right strategy, a hacker can walk right up to that robot, whisper a few clever questions, and walk away with your entire digital life.

The authors aren't trying to teach people how to hack; they are sounding an alarm. They are saying, "We found a massive hole in the security of these helpful robots. We need to fix it before real bad actors use this exact trick."

Drowning in papers in your field?

Get daily digests of the most novel papers matching your research keywords — with technical summaries, in your language.

Try Digest →