Information-Theoretic Privacy Control for Sequential Multi-Agent LLM Systems

This paper addresses the risk of amplified privacy leakage in sequential multi-agent LLM systems by formalizing compositional leakage through mutual information, deriving a theoretical bound on its propagation, and proposing a privacy-regularized training framework that enforces system-level privacy guarantees rather than relying on local agent constraints alone.

Sadia Asif, Mohammad Mohammadi Amiri

Published Mon, 09 Ma

Here is an explanation of the paper using simple language, everyday analogies, and creative metaphors.

The Big Picture: The "Pass-the-Parcel" Problem

Imagine you are sending a very sensitive letter (like your medical records or bank account details) across a country. Instead of one trusted person carrying it the whole way, the letter passes through a chain of five different couriers.

  • Courier 1 reads the address and writes a summary on a sticky note.
  • Courier 2 takes that sticky note, adds their own notes, and passes it to Courier 3.
  • This continues until Courier 5 delivers the final package.

The Problem:
Even if every single courier is sworn to secrecy and follows strict rules about what they can say, a sneaky spy might still figure out your secret. How? By looking at the chain of sticky notes.

If Courier 1 writes "Patient has a heart condition," and Courier 2 writes "Prescribed aspirin," and Courier 3 writes "Sent to cardiology," the spy doesn't need to see the original letter. By piecing together the small hints passed down the line, they can reconstruct your entire secret.

This paper calls this "Compositional Privacy Leakage." It happens in AI systems where multiple "agents" (specialized AI models) work together in a line to solve a problem.


The Core Discovery: The "Whispering Game" Effect

The authors realized that in these AI chains, privacy leaks get worse the longer the chain gets.

They used a mathematical concept called Mutual Information (think of it as a "leakage meter") to prove a scary fact:

  • If Agent 1 leaks a tiny bit of info, Agent 2 might leak a little more.
  • But when Agent 3 gets that info, it doesn't just leak its own tiny bit; it leaks the accumulated hints from Agents 1 and 2 plus its own, amplified.
  • By the time the message reaches the end, a tiny, harmless leak at the start has turned into a massive data breach.
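The "leakage meter" idea can be made concrete with a toy simulation (this is an illustration of the intuition, not the paper's formal bound): a secret bit passes through a chain of agents, each of which adds a noisy hint to the shared transcript. Measuring the mutual information between the secret and the transcript after each stage shows the leak growing with chain length, even though every individual hint is weak.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)

def mutual_information(pairs):
    """Empirical mutual information (in bits) between two discrete variables."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    bits = 0.0
    for (x, y), c in joint.items():
        pxy = c / n
        bits += pxy * np.log2(pxy / ((px[x] / n) * (py[y] / n)))
    return bits

n_samples = 20000
secret = rng.integers(0, 2, n_samples)        # the private bit each agent hints at
transcripts = [() for _ in range(n_samples)]  # the growing chain of "sticky notes"

leak_per_stage = []
for stage in range(5):
    # Each agent appends a noisy hint: the secret bit, flipped with probability 0.3.
    flips = rng.random(n_samples) < 0.3
    hints = np.where(flips, 1 - secret, secret)
    transcripts = [t + (int(h),) for t, h in zip(transcripts, hints)]
    leak_per_stage.append(mutual_information(list(zip(secret.tolist(), transcripts))))
```

Each hint alone reveals only about 0.12 bits of the secret, but the transcript's total leakage keeps climbing as more agents write on it: exactly the "pass-the-parcel" effect described above.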

The Analogy: Imagine playing the "Telephone" game. If the first person whispers a secret, and the second person adds a little detail, and the third person adds more... by the time it reaches the last person, the story has changed completely. But in this AI scenario, the "story" is actually your private data, and the "changes" are actually amplifying the exposure of that data.

The paper proves that you cannot fix this by just telling each agent, "Don't leak your own secrets." You have to fix the whole system at once.


The Solution: The "Privacy Filter" Training

The authors propose a new way to train these AI teams. Instead of just teaching them how to do the job (like solving a math problem or diagnosing a disease), they teach them a second, invisible rule: "Don't let your private thoughts show in your notes."

How it works (The Metaphor):
Imagine the AI agents are students in a classroom working on a group project.

  • Old Way: The teacher tells each student, "Don't talk about your lunch money." But the students still write down hints about their lunch money in their shared notebook.
  • New Way (The Paper's Method): The teacher uses a special "Privacy Filter" during practice. Every time a student writes something in the notebook that hints at their lunch money, the teacher gives them a "penalty point."
    • The students learn to write the answer clearly but scrub out the hints about their private lives.
    • They learn to pass the "clean" notes to the next student so the final report has no trace of the lunch money.

Technically, they use a method called MINE (Mutual Information Neural Estimation). Think of this as a "leak detector" that constantly checks the AI's internal notes during training. If it sees a connection between the notes and the private data, it forces the AI to break that connection.
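MINE works by maximizing the Donsker-Varadhan lower bound on mutual information with a trainable neural network "critic." As a minimal sketch of what that bound measures (the real method trains the critic by gradient ascent; here a hand-picked quadratic critic and a toy Gaussian "message" stand in, and all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

def dv_lower_bound(x, y, critic):
    """Donsker-Varadhan bound MINE optimizes: E[T(x,y)] - log E[exp(T(x, y'))]."""
    joint_term = critic(x, y).mean()
    y_indep = rng.permutation(y)               # shuffling simulates independence
    marginal_term = np.log(np.exp(critic(x, y_indep)).mean())
    return joint_term - marginal_term          # in nats; > 0 flags detectable leakage

n = 50000
secret = rng.standard_normal(n)
leaky_message = 0.8 * secret + 0.6 * rng.standard_normal(n)  # note hinting at the secret
clean_message = rng.standard_normal(n)                       # note with no secret in it

critic = lambda x, y: 0.3 * x * y  # hand-picked; MINE would learn this network

leak = dv_lower_bound(secret, leaky_message, critic)
no_leak = dv_lower_bound(secret, clean_message, critic)
```

During training, a positive estimate like `leak` becomes a penalty term in the loss, pushing the agent to rewrite its notes until the detector reads near zero, as `no_leak` does for a message that carries no trace of the secret.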


The Results: Safe Chains, Happy Users

The researchers tested this on three real-world scenarios:

  1. Medical: Diagnosing diseases.
  2. Finance: Analyzing bank reports.
  3. Privacy Norms: Checking if an AI follows social rules about privacy.

What they found:

  • Without the fix: As they added more agents to the chain (from 2 to 5), the privacy leaks exploded. The system became less safe the longer the chain got.
  • With the fix: The privacy leaks stayed low, even with long chains of agents.
  • The Trade-off: Usually, when you make something more private, it gets dumber. But here, the AI stayed very smart at its job (solving the math or diagnosing the patient) while becoming much better at keeping secrets.
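The trade-off is controlled by a regularization weight that balances task quality against leakage. The toy Gaussian-channel sketch below (closed-form formulas, not the paper's LLM benchmarks; all names are illustrative) shows the knob in action: the more weight you put on privacy, the more "noise" the optimal agent adds to its message.

```python
import numpy as np

# Candidate noise levels an agent could add to its outgoing message.
sigmas = np.linspace(0.1, 3.0, 300)

# For a Gaussian channel message = signal + sigma * noise (closed forms, in nats):
leakage = 0.5 * np.log(1 + 1 / sigmas**2)  # I(secret; message) shrinks as noise grows
task_error = sigmas**2 / (1 + sigmas**2)   # recovering the useful content gets harder

def best_sigma(beta):
    """Noise level minimizing the combined objective: task error + beta * leakage."""
    return sigmas[np.argmin(task_error + beta * leakage)]

low_privacy = best_sigma(beta=0.01)   # care mostly about the task
high_privacy = best_sigma(beta=1.0)   # care a lot about leakage
```

The paper's empirical finding is that, in their benchmarks, this knob could be set to cut leakage sharply while giving up very little task accuracy.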

The Takeaway

This paper teaches us a vital lesson for the future of AI: Privacy is a team sport.

In the past, we thought if every individual AI agent was "safe," the whole system was safe. This paper shows that's wrong. In a team of AI agents, the chain is only as strong as its weakest link, and the leaks get louder the longer the chain is.

To keep our data safe in the future, we need to train these AI teams not just to be smart, but to be systematically quiet about their secrets, ensuring that no matter how many hands pass the message, the secret stays hidden.