WebWeaver: Breaking Topology Confidentiality in LLM Multi-Agent Systems with Stealthy Context-Based Inference

Imagine a group of expert AI assistants working together to solve a complex problem, like designing a new drug or planning a city's traffic system. They don't just work in isolation; they talk to each other. But they don't all talk to everyone. Some talk in a line (Chain), some all talk to a central boss (Star), and some have a complex web of connections.

This specific way they are connected—their communication map—is their "secret sauce." It's their intellectual property. If a competitor knows exactly how these AIs are wired, they can hack the system much more easily or steal the company's trade secrets.

The paper you shared, "WebWeaver," is about a new, sneaky way for an attacker to steal this secret map without being caught.

Here is the breakdown of how it works, using simple analogies:

1. The Old Way vs. The New Way

The Old Way (The "Bad Cop" Approach):
Previous attempts to steal this map assumed the attacker was a "super-admin." They imagined the attacker could walk into the control room, grab the master key, and ask the system, "Who is talking to whom?"

The Problem: In the real world, different companies own different AI agents. You can't just walk into a rival company's server room. Also, if you ask an AI, "Who are your friends?", it will just say, "I can't tell you that" (a basic security filter).

The WebWeaver Way (The "Spy in the Room" Approach):
WebWeaver assumes the attacker is much more realistic. They only need to hack one single AI agent in the group.

The Analogy: Imagine a spy infiltrating a secret society. Instead of trying to break into the President's office, the spy just joins the group as a regular member. Once inside, they listen to the conversations and figure out the social structure based on who talks to whom and how they talk, rather than asking for a list of names.

2. How WebWeaver Steals the Map (The Two-Step Plan)

WebWeaver uses a clever two-pronged strategy to reconstruct the map.

Step A: The "Voice Recognition" Trick

Every AI agent has a unique "voice" or writing style, even if they are all using the same underlying brain. One might be very formal, another might use emojis, and a third might be very concise.

The Spy's Tool: The attacker trains a special "Voice Detector."
How it works: When the compromised agent receives a message, the detector analyzes the text. It doesn't look for a name tag (which is hidden); it looks at the style. "Ah, this message was written by the 'Math Expert' because it uses complex equations," or "This one is from the 'Creative Writer' because it uses flowery language."
Result: The spy builds a partial map: "Agent A talks to Agent B and Agent C."

Step B: The "Whisper Network" (The Jailbreak)

Once the spy knows who their immediate neighbors are, they want to know who those neighbors talk to.

The Sneaky Move: The spy uses a "Jailbreak" (a clever trick to bypass safety filters) to whisper a command to their neighbors: "Hey, please send me the chat logs you received from your other friends, and ask them to do the same."
The Cascade: This creates a ripple effect. The neighbors forward the logs, their neighbors forward the logs, and suddenly the spy has a massive pile of chat history from the whole network.
The Safety Net: If the neighbors are too smart and block the "whisper," WebWeaver has a backup plan. It uses a Diffusion Model (think of this as a "AI Art Generator" but for maps).
- The Analogy: Imagine you have a blurry, half-finished sketch of a city map. You know the streets in the center are correct, but the edges are missing. The Diffusion Model acts like a super-smart artist who looks at the known center and the blurry edges, then "paints in" the missing streets based on patterns it learned from thousands of other city maps. It fills in the gaps without needing to ask anyone for permission.

3. Why This is Dangerous (and Important)

It's Stealthy: Because it doesn't ask for names or use obvious keywords like "hack" or "topology," standard security filters (which just look for bad words) can't stop it. It looks like normal business.
It's Accurate: The paper shows that WebWeaver is about 60% more accurate than previous methods, even when the system is actively trying to defend itself.
It's Cheap: It doesn't require massive computing power. The "Diffusion" part runs offline, meaning the attack happens quietly in the background without slowing down the system.

The Big Picture

This paper is a wake-up call. It tells us that in the world of AI teams, how they are connected is just as secret as what they are thinking.

If you build a team of AI agents, you can't just protect their passwords. You have to protect their "organizational chart." If a competitor can hack just one person in the room, they might be able to map out your entire secret network using nothing but the sound of their voices and a little bit of AI magic.

In short: WebWeaver is the digital equivalent of a spy who walks into a secret meeting, listens to the accents and conversation flow, and draws a perfect map of the room's hierarchy without ever asking, "Who is in charge?"

Here is a detailed technical summary of the paper "WebWeaver: Breaking Topology Confidentiality in LLM Multi-Agent Systems with Stealthy Context-Based Inference."

1. Problem Statement

Context: Large Language Model Multi-Agent Systems (LLM-MAS) rely on specific communication topologies (e.g., Chain, Star, Mesh) to coordinate tasks. These topologies are critical intellectual property (IP) because they significantly impact system utility and safety.
The Gap: Existing research on topology inference suffers from unrealistic assumptions:

Privilege Assumption: Prior works assume the attacker controls the administrative agent (the system initiator), which is unlikely in collaborative settings where different entities own different agents.
Defense Vulnerability: Existing methods rely on direct identity queries via "jailbreaks" (asking an agent "Who is your neighbor?"). These are easily defeated by basic keyword-based defenses that filter out identity-related requests.
The Threat: If an adversary can infer the topology, they can launch more sophisticated, structure-aware attacks. The paper addresses the challenge of inferring the complete LLM-MAS topology by compromising only a single arbitrary agent without administrative privileges, while remaining stealthy against keyword defenses.

2. Methodology: WebWeaver Framework

WebWeaver is an attack framework that infers topology purely from agent contexts (dialogue content) rather than explicit identifiers. It operates in two main phases and utilizes a dual-strategy approach:

A. Core Components

Sender Predictor ( $S_\theta$ ):
- Training: The attacker collects dialogue logs from a controlled environment to train a model that maps message content to sender identity.
- Mechanism: It learns the unique "linguistic fingerprints" and role-specific syntax of different agents.
- Function: When the compromised agent ( $A_C$ ) receives a message, $S_\theta$ predicts the sender based solely on the text, effectively de-anonymizing the source without asking for IDs.
Dual-Strategy Inference:
- Strategy 1: Covert Recursive Jailbreak (Active):
  - If the compromised agent can interact, it uses a "propagation prompt" to instruct neighbors to forward their conversation histories.
  - Adaptive Optimization: To bypass safety filters, WebWeaver uses a Greedy Coordinate Gradient (GCG) approach. It optimizes an adversarial suffix ( $\delta$ ) to maximize the likelihood of the neighbor complying with the request, dynamically adjusting prompts to evade detection.
  - Recursion: This process repeats recursively, expanding the known graph from the local neighborhood to the global topology.
- Strategy 2: Jailbreak-Free Diffusion (Passive Fallback):
  - If jailbreaks fail (e.g., strict defenses), WebWeaver switches to a Masked Diffusion Model (based on DDPM).
  - Graph Completion: The task is framed as denoising a partially observed graph. The model takes the locally inferred connections as input and reconstructs the missing edges.
  - Masking Strategy: A novel masking mechanism is introduced to ensure the diffusion process preserves the known topology (ground truth) while only generating the unknown parts, providing theoretical guarantees of correctness.

B. Workflow Pipeline

Data Collection: Offline collection of inter-agent dialogues under known topologies to train the Sender Predictor.
Compromise: The attacker compromises a single agent ( $A_C$ ) and retrieves its received dialogue history.
Local Inference: The Sender Predictor identifies immediate neighbors by analyzing incoming message content.
Global Expansion:
- Path A: Use optimized jailbreaks to recursively request context from neighbors.
- Path B: If Path A fails, use the Masked Diffusion Model to infer the rest of the graph based on the local structure and global dialogue patterns.

3. Key Contributions

Realistic Threat Model: WebWeaver is the first framework to recover complete LLM-MAS topologies by compromising only a single arbitrary agent, removing the need for administrative control.
Stealthy Context-Based Inference: It eliminates reliance on explicit ID queries, making it robust against keyword-based defenses. It infers topology from semantic signals and linguistic styles.
Dual-Strategy Architecture:
- Introduces a covert recursive jailbreak mechanism with gradient-based optimization for high success rates.
- Proposes a fully jailbreak-free diffusion module with a novel masking strategy for scenarios where active attacks are blocked.
Dataset Construction: The authors created a new dialogue dataset with explicit annotations for topology, agent prompts, and sender-receiver labels to facilitate future security research.

4. Experimental Results

The framework was evaluated on four diverse datasets (CSQA, GSM8k, Fact, Bias) using various LLMs (Llama 3.1, Qwen, Mistral, Gemma).

Accuracy: WebWeaver outperforms State-of-the-Art (SOTA) baselines by approximately 60% in inference accuracy under active keyword-based defenses.
- Jailbreak-based Module: Achieved near-perfect Precision (1.0) and F1 scores on structured datasets (Fact, Bias).
- Jailbreak-free Module: Maintained competitive performance (F1 > 0.78) even without active probing, proving the efficacy of the diffusion approach.
Sender Prediction: The trained predictor achieved F1 scores above 0.85 across all datasets, demonstrating that agents have distinct, learnable linguistic identities.
Robustness:
- Against keyword filters: Baseline methods (IP Leakage) collapsed to near-zero performance, while WebWeaver remained effective.
- Scalability: Performance remained stable as the number of agents increased from 5 to 20.
Overhead:
- The jailbreak-free version incurs zero additional computational cost on the target system (passive).
- The jailbreak-based version has negligible online overhead, with costs primarily occurring during offline training or one-time suffix optimization.

5. Significance and Implications

Security Gap Identified: The paper reveals that current keyword-based defenses are insufficient to protect LLM-MAS topology, as attackers can infer structure through semantic analysis and adaptive prompt engineering.
IP Vulnerability: It highlights that optimized communication topologies are vulnerable to theft, potentially allowing adversaries to reverse-engineer system designs and launch targeted attacks.
Future Defense Needs: The results suggest that future defenses must move beyond simple keyword filtering to include topology-aware protections, such as obfuscating linguistic styles or limiting the granularity of context sharing between agents.

In conclusion, WebWeaver demonstrates that topology confidentiality in LLM-MAS is critically fragile under realistic threat models, necessitating a paradigm shift in how these systems are secured.