Imagine you've just built a super-smart, robotic medical assistant. It can read patient records, talk to doctors, and even suggest treatments. It's amazing, but it's also a bit like a high-tech castle with a thousand doors, windows, and secret tunnels.
The problem? Most security experts are looking at just one door at a time. They ask, "Is the front door locked?" or "Is the AI smart enough not to lie?" But they aren't looking at how a thief could sneak in the back, trick the guard, and then walk right into the vault.
This paper introduces a new way to look at the whole castle at once. It's a Risk Assessment Framework designed specifically for these AI systems. Here is the breakdown in simple terms:
1. The Core Problem: The "Silo" Effect
Think of security like a team of specialists.
- The Network Guys worry about hackers stealing passwords.
- The AI Guys worry about the robot being tricked by hidden instructions into doing the wrong thing (called "prompt injection").
- The App Guys worry about the software crashing.
Usually, these teams don't talk to each other. They treat the AI as a black box and the rest of the system as separate. But in reality, a hacker can use a network trick to get inside, then use an AI trick to make the robot leak patient secrets. The paper argues we need to stop looking at the parts and start looking at the whole path the hacker takes.
2. The Solution: The "Attack-Defense Tree"
The authors built a visual map called an Attack-Defense Tree. Imagine a family tree, but instead of ancestors, it shows how a crime happens.
- The Root (The Goal): At the top is the bad thing the hacker wants to do. For example: "Trick the doctor into giving the wrong medicine" or "Steal a patient's private diary."
- The Branches (The Steps): To get to that goal, the hacker has to take steps.
- Step 1: Break into the building (Network attack).
- Step 2: Trick the guard (AI attack).
- Step 3: Open the safe (System attack).
- The Leaves (The Weak Spots): At the bottom are the specific weak points, like "weak password" or "AI doesn't check if a command is real."
The genius of this paper is that it connects conventional hacking (stealing keys), AI hacking (tricking the robot), and social hacking (lying to the user) into one single map.
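To make the tree concrete, here is a minimal Python sketch of what an attack-defense tree can look like as a data structure. This is an illustration, not the paper's actual model: the node kinds (`AND`, `OR`, `LEAF`), the example goal, and the leaf names are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    kind: str = "LEAF"  # "AND" (all steps needed), "OR" (any one works), or "LEAF" (a weak spot)
    children: list = field(default_factory=list)

    def paths(self):
        """Enumerate every attack path (list of leaf names) that achieves this node."""
        if self.kind == "LEAF":
            return [[self.name]]
        child_paths = [c.paths() for c in self.children]
        if self.kind == "OR":
            # Any single child's path reaches the goal.
            return [p for paths in child_paths for p in paths]
        # AND: combine one path from each child (cartesian product).
        combined = [[]]
        for paths in child_paths:
            combined = [a + b for a in combined for b in paths]
        return combined

# Hypothetical tree: leak patient records = get inside AND trick the agent.
goal = Node("Leak patient records", "AND", [
    Node("Get inside", "OR", [Node("weak password"), Node("phishing email")]),
    Node("Trick the agent"),
])

for path in goal.paths():
    print(" -> ".join(path))
```

Enumerating the paths like this is what lets you see the "whole walk" a hacker takes, rather than one weak spot at a time.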
3. The Scorecard: The "CVSS" Calculator
How do you know which path is the most dangerous? The paper uses a standard scoring system called CVSS (Common Vulnerability Scoring System), which security teams use to rate both how easy a weakness is to exploit and how much damage it can cause. Think of it as a "difficulty rating" for breaking into things.
- The Analogy: Imagine you are trying to break into a house.
- Is the door unlocked? (Easy)
- Do you need a ladder? (Medium)
- Do you need to pick a lock with a special tool? (Hard)
- The paper takes these difficulty scores for every single step of the hacker's journey and combines them into one score for the whole path.
- The Twist: They separate Difficulty (how hard it is to break in) from Damage (how bad it is if they get in).
- Scenario A: Hard to break in, but if they do, they steal a cookie. (Low Risk)
- Scenario B: Easy to break in, and if they do, they steal the family's life savings. (High Risk)
This allows them to calculate a single "Risk Score" for the entire path, not just the individual steps.
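One plausible way to turn that idea into arithmetic is sketched below. This is not the paper's exact CVSS aggregation formula, just an illustration of the principle: give each step an exploitability score (how easy, on a 0-10 CVSS-style scale), give the goal an impact score (how bad), convert exploitability into a rough likelihood, multiply along the path (every step must succeed), and weigh by impact. The mapping from score to likelihood here is a deliberately naive assumption.

```python
def path_risk(step_exploitability, impact):
    """Combine per-step difficulty and end-of-path damage into one path score.

    step_exploitability: CVSS-style ease-of-exploit scores (0-10) for each step.
    impact: how bad a successful attack is (0-10).
    """
    likelihood = 1.0
    for e in step_exploitability:
        likelihood *= e / 10.0  # naive assumption: score 10 ~ near-certain success
    return likelihood * impact

# Scenario A: hard to break in, low damage (they steal a cookie).
risk_a = path_risk([2.0, 3.0], impact=2.0)
# Scenario B: easy to break in, high damage (they steal the life savings).
risk_b = path_risk([8.0, 9.0], impact=9.0)
print(risk_a, risk_b)
```

Notice how the multiplication captures the "Twist" above: a path that is easy at every step and ends in high damage dominates a path that is hard anywhere along the way.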
4. The Case Study: The Hospital
They tested this on a Healthcare AI. They asked three scary questions:
- G1: Can someone trick the AI into giving dangerous medical advice?
- G2: Can someone steal private patient records?
- G3: Can someone shut the system down so doctors can't see records in an emergency?
What they found:
They discovered that many different types of attacks (stealing passwords, tricking the AI, or crashing the server) often lead to the same few weak spots in the system. It's like realizing that whether you climb the wall or pick the lock, you end up at the same unlocked window.
5. The Fix: Spending Money Wisely
The paper doesn't just find the problems; it helps you decide how to fix them without going broke.
They created a "Defense Portfolio" test. Imagine you have a budget of $100.
- Option A: Spend $100 on a super-strong front door (Network security).
- Option B: Spend $100 on a guard dog that barks at liars (AI guardrails).
- Option C: Spend $50 on the door and $50 on the dog.
Using their math, they can show you exactly which combination lowers the "Risk Score" the most. They found that often, fixing the preconditions (like making sure no one can sneak in the back door) is more effective than just trying to patch the AI later.
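The portfolio idea can be sketched with a tiny brute-force search. Everything here is hypothetical: the paths, base risk scores, defense names, costs, and reduction factors are made up for illustration, and the paper's actual optimization is more sophisticated than trying every affordable subset.

```python
from itertools import chain, combinations

# Hypothetical residual risk per attack path before any defenses.
BASE_RISK = {"steal creds": 6.5, "prompt injection": 4.0, "crash server": 2.5}

# Hypothetical defenses: cost, plus a reduction factor per path they cover
# (residual risk on that path is multiplied by the factor).
DEFENSES = {
    "strong door (network)":   {"cost": 50, "covers": {"steal creds": 0.2}},
    "guard dog (AI guardrail)": {"cost": 50, "covers": {"prompt injection": 0.3}},
    "backup server":           {"cost": 60, "covers": {"crash server": 0.1}},
}

def residual_risk(chosen):
    """Total risk left over after applying the chosen defenses."""
    total = 0.0
    for path, base in BASE_RISK.items():
        factor = 1.0
        for d in chosen:
            factor *= DEFENSES[d]["covers"].get(path, 1.0)
        total += base * factor
    return total

def best_portfolio(budget):
    """Brute-force: the affordable defense subset with the lowest residual risk."""
    names = list(DEFENSES)
    subsets = chain.from_iterable(combinations(names, r) for r in range(len(names) + 1))
    affordable = [s for s in subsets if sum(DEFENSES[d]["cost"] for d in s) <= budget]
    return min(affordable, key=residual_risk)

print(best_portfolio(100))
```

With these made-up numbers and a budget of 100, the search lands on the split strategy (door plus dog) rather than spending everything on one defense, which mirrors Option C above.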
The Big Takeaway
This paper is a blueprint for safety. It tells us that when we build AI systems, we can't just ask, "Is the AI safe?" We have to ask, "Is the whole system safe?"
It gives us a map to see how a hacker could walk from the front door to the vault, a calculator to measure how bad that walk would be, and a guide on where to spend our money to build the best walls and locks. It turns a scary, complex problem into a clear, manageable checklist.