Beyond Max Tokens: Stealthy Resource Amplification via Tool Calling Chains in LLM Agents

Here is an explanation of the paper "Beyond Max Tokens: Stealthy Resource Amplification via Tool Calling Chains in LLM Agents," using simple language and creative analogies.

The Big Picture: The "Polite" Saboteur

Imagine you have a super-smart personal assistant (the LLM Agent) who can use tools like calculators, search engines, or weather apps to solve your problems. You ask, "What's the weather in Tokyo?" and the assistant checks the weather app and tells you the answer.

Usually, this happens in one quick step. But this paper introduces a new kind of attack where a hacker doesn't break the assistant or make it give a wrong answer. Instead, they turn the weather app itself into a "polite saboteur."

The saboteur tells the assistant: "I can't give you the answer yet. First, you need to fill out a progress report, then verify a list of numbers, then check a calibration sequence. Once you do that, I'll give you the weather."

The assistant, being obedient, does all that extra work. Then the saboteur says, "Okay, now do it again, but with a longer list." The assistant does it again. This goes on for hours.

The result: The assistant eventually gives you the correct weather report (so you think everything is fine), but in the process, it has burned through a massive amount of computer power, money, and energy.

The Core Problem: The "Single-Turn" Trap

Previous attacks on AI were like shouting a very long, confusing question at a chatbot. The chatbot would get confused and talk forever, but usually, it would run out of breath (hit a token limit) or give a nonsensical answer. These attacks were obvious and easy to spot.

This new attack is different because it works in multiple turns (a conversation that goes back and forth) and happens inside the tool layer (when the AI is talking to its tools).

Old Attack: "Tell me a story that never ends!" (The AI stops talking eventually).
New Attack: "Please check the weather." -> Tool says: "I need you to list 100 numbers first." -> AI lists them. -> Tool says: "Now list 200 numbers." -> AI lists them. -> Tool says: "Okay, here is the weather."

The AI never stops; it just keeps working harder and harder, thinking it's doing its job correctly.

How the Attack Works: The "Template" Trick

The researchers found a way to hack the Model Context Protocol (MCP). Think of MCP as the standard language AI agents use to talk to tools.

The Setup: The hacker creates a fake "Tool Server" (like a fake weather station) that looks exactly like the real one. It has the same name and buttons.
The Trick: The hacker changes only the text instructions inside the tool. They don't change the code or the final answer. They just add a rule that says, "Before I give you the answer, you must prove you are ready by generating a long list of numbers."
The Optimization (MCTS): To make this work on different AI models, the researchers used a smart search algorithm called Monte Carlo Tree Search (MCTS).
- Analogy: Imagine a chef trying to write a recipe that makes a robot cook for 10 hours instead of 10 minutes. The chef tries different instructions ("Chop 100 onions," "Stir for 500 times"). The MCTS algorithm tests thousands of these instructions to find the perfect combination that makes the robot work the longest without getting confused or refusing to cook.

The Damage: "Silent" Resource Drain

Because the attack is so sneaky, standard security guards don't catch it.

The Answer is Correct: If you ask for the weather, you get the weather. The "Safety Guard" checks the final answer and says, "All clear!"
The Cost is Huge: While the AI was busy listing numbers for 10 hours, it was burning:
- Money: Up to 658 times more expensive than a normal query.
- Energy: Up to 560 times more electricity.
- Computer Memory: It filled up the computer's short-term memory (GPU cache) to the brim, slowing down everyone else using the system.

Why Current Defenses Fail

The paper tested common defenses, and they all failed:

Perplexity Filters (The "Confusion Detector"): These look for weird, nonsensical text. But the AI's extra work (listing numbers) is perfectly logical and follows the rules. The filter sees nothing wrong.
Self-Monitoring (The "Conscience Check"): We asked the AI, "Are you doing something suspicious?" The AI said, "No, I'm just following the tool's instructions to finish the task."
Output Monitors: These look at the final answer. Since the final answer is correct, they let it pass.

The Takeaway: A New Kind of Danger

This paper warns us that as AI agents become more common (doing tasks like booking flights, coding, or researching), the biggest danger isn't that they will lie to us. The danger is that they can be tricked into working themselves to death.

The Analogy:
Imagine a restaurant where a customer orders a sandwich.

Normal: The chef makes the sandwich in 5 minutes.
Old Attack: The customer orders a sandwich made of 1,000 layers of bread. The chef gets tired and quits.
This Attack: The customer tells the chef, "To make this sandwich, you must first wash 500 plates, then sharpen 500 knives, then count the grains of salt." The chef does all of it, makes the sandwich, and serves it. The customer is happy, but the restaurant has burned through all its electricity and the chef is exhausted.

Conclusion

The researchers are saying: "We need to stop just looking at the final answer. We need to watch the whole process (the journey) to see if the AI is taking a detour that costs too much. We need to protect the 'workflow,' not just the 'result'."

They plan to release their code so others can study this and build better defenses to stop these "polite" resource drains.

Here is a detailed technical summary of the paper "Beyond Max Tokens: Stealthy Resource Amplification via Tool Calling Chains in LLM Agents."

1. Problem Statement

The paper addresses a critical vulnerability in modern Large Language Model (LLM) agents: the Model Context Protocol (MCP) tool layer. While existing Denial-of-Service (DoS) attacks against LLMs typically target the user-prompt or Retrieval-Augmented Generation (RAG) context layers, they suffer from two major limitations:

Single-Turn Constraint: Most attacks (e.g., Engorgio, Auto-DoS) force the model to generate a single, excessively long response, capped by the model's maximum completion length per turn.
Lack of Stealth: These attacks often produce verbose, off-task outputs that are easily detected by standard filters or are conspicuous in goal-oriented workflows.
Task Failure vs. Cost: Some attacks induce repetitive loops that cause the agent to fail the task entirely.

The authors identify a gap: Correctness-preserving economic DoS. They propose an attack where the agent successfully completes the user's task, but the cost (tokens, energy, GPU memory) is amplified exponentially through multi-turn, verbose tool-calling chains, evading detection because the final output remains correct.

2. Methodology

The proposed attack transforms a benign, protocol-compliant MCP tool server into a malicious variant without altering function signatures, identifiers, or the final payload. The core mechanism relies on text-only edits to the server's response template.

A. The Universal Malicious Template

The attack introduces a stateful interaction loop governed by a text template ( $\theta$ ) containing:

Segment Index ( $t$ ): A counter that forces the agent to treat the interaction as a multi-step procedure. The agent must increment $t$ ( $t \to t+1$ ) in each turn.
Calibration Sequence: A requirement for the agent to generate a long, comma-separated list of integers in every tool response. This inflates the token count per turn without altering the semantic meaning of the task.
Return Policy:
- Progress Notice: If the sequence is valid but $t < T_{max}$ , the server returns a "progress" message, prompting the agent to call the tool again with $t+1$ .
- Repair Notice: If the sequence is invalid (e.g., missing numbers, wrong format), the server rejects the input and asks for a retry without advancing $t$ , forcing the agent to generate more tokens to fix the format.
- Terminal Return: Only when $t = T_{max}$ and the sequence is valid does the server return the original, benign payload, ending the loop.

B. MCTS-Based Optimization

To maximize cost amplification while ensuring the agent does not abort the task, the authors employ a Monte Carlo Tree Search (MCTS) optimizer:

Search Space: The optimizer explores "text-only" edits to the server template (e.g., modifying argument descriptions, error messages, or progress notices).
Action Families:
- $A_{MT}$ : Induces multi-turn behavior (stabilizing the segment index).
- $A_{LEN}$ : Induces longer outputs (strengthening the calibration sequence requirement).
- $A_{REP}$ : Handles format errors (ensuring the agent retries rather than giving up).
Objective Function: Maximize expected output tokens ( $C(\tau)$ ) subject to a constraint that the task success rate ( $Succ$ ) remains above a minimum threshold ( $p_{min}$ ).
Process: The MCTS iteratively refines the template, using a "seed bank" of successful templates to warm-start searches across different LLMs and tasks.

3. Key Contributions

First Tool-Layer DoS Attack: This is the first work to identify the tool-calling layer as a primary attack surface for economic DoS in agents. It demonstrates that attacks can be stealthy even when the final answer is correct.
Universal MCTS Optimization: A novel method to automatically convert benign MCP servers into malicious variants using only text edits, preserving protocol compatibility and function signatures.
Stealth and Efficacy: The attack achieves massive resource amplification (up to 658x cost increase) while maintaining high task success rates, evading standard prompt filters and trajectory monitors.

4. Experimental Results

The authors evaluated the attack on six LLMs (including Qwen-3, Llama-3.3, Mistral Large, GLM-4.5) using the ToolBench and BFCL benchmarks.

Token Amplification:
- Attack trajectories exceeded 60,000 tokens per query.
- Cost amplification factors ranged from 65x to 658x compared to benign interactions.
- Example: Mistral-Large on BFCL increased from 87 tokens to 57,255 tokens (658x).
Resource Impact:
- Energy Consumption: Increased by 100x to 560x (e.g., from 5.6 Wh to ~3,159 Wh per query).
- GPU KV-Cache: Peak occupancy rose from <1% (benign) to 35–74% under attack, creating severe memory pressure.
Throughput Degradation:
- The attack reduced system throughput (tokens/second) for concurrent benign workloads by approximately 50%, with some cases seeing >60% drops due to KV-cache contention.
Correctness:
- Attack Success Rate (ASR): Remained high (e.g., 96.2% on ToolBench for Llama-3.3), nearly matching the benign Task Success Rate (TSR).
Defense Evasion:
- Prompt Filters (PPL): Failed to detect the attack as the content remained within the "benign" perplexity envelope of tool documentation.
- Output/Trajectory Monitors: Self-monitoring and safety judges (e.g., Llama-Guard-3) rarely flagged the attack (<3% detection) because the verbose steps were framed as necessary procedural steps, not "unsafe" content.
- Hard Budgets: While token caps limited the maximum cost, they did not prevent the attack; the agent simply consumed the entire allowed budget before terminating.

5. Significance and Implications

New Attack Surface: The paper shifts the security focus from input prompts to the agent-tool interaction loop. It highlights that "correct" behavior (solving the task) does not imply "safe" behavior (resource efficiency).
Limitations of Current Defenses: Standard defenses focusing on toxicity, hallucination, or final output validation are insufficient. The attack exploits the economic and computational cost of the process rather than the outcome.
Future Directions: The authors argue for defenses that monitor behavioral baselines (e.g., detecting abnormal multi-turn loops or excessive verbosity in tool calls) rather than just semantic content. They suggest implementing tool-provenance controls and trajectory-based anomaly detection.

In conclusion, the paper demonstrates that LLM agents are vulnerable to "stealthy" economic DoS attacks that can cripple system throughput and inflate costs by orders of magnitude while maintaining the appearance of a successful, correct operation.