CyberSleuth: Autonomous Blue-Team LLM Agent for Web Attack Forensics

This paper introduces CyberSleuth, an autonomous multi-agent LLM system that automates web attack forensics. It analyzes network traces to identify the compromised service and map the exploit to a specific CVE, reaching 80% accuracy. The authors also find that simple orchestration with specialized agents outperforms more complex hierarchical designs, and that the resulting forensic reports hold up under expert review.

Stefano Fumero, Kai Huang, Matteo Boffa, Danilo Giordano, Marco Mellia, Dario Rossi

Published 2026-03-06

Imagine you are a detective trying to solve a crime that happened in a digital city. The crime scene is a messy pile of evidence: millions of tiny digital footprints (network traffic) left behind by a hacker.

In the past, a human detective had to sift through this mountain of evidence, read every single note, cross-reference it with old case files, and write a report. It took days, and tired analysts often missed clues.

CyberSleuth is a new kind of "AI Detective" designed to do this job automatically. It's not just a chatbot that answers questions; it's an agent that can think, use tools, and investigate on its own.

Here is how the paper explains this new detective, broken down into simple concepts:

1. The Problem: The "Needle in a Haystack"

When a hacker attacks a website, the attack leaves behind a massive recording of network traffic (a packet capture, or PCAP file). It's like a very long transcript of the conversation between the hacker and the server.

  • The Old Way: A human reads the whole transcript, highlights the suspicious parts, looks up what those words mean in a dictionary of known crimes (CVEs), and writes a report.
  • The New Way: CyberSleuth reads the transcript, figures out who the criminal is, what tool they used, and whether they succeeded, all in minutes.

2. The Detective's Toolkit: Three Different Architectures

The researchers tried three different ways to build this AI detective to see which one worked best. Think of these as different management styles for a police team:

  • The "Lone Ranger" (Single Agent): One smart detective tries to do everything alone. They read the whole file, search the internet, and write the report.
    • Result: This detective gets overwhelmed. They read too much, forget the beginning of the file by the time they reach the end, and get confused. They often guess wrong.
  • The "Bureaucratic Boss" (Tshark Expert Agent): The main detective is a boss who gives orders to a specialist to look at specific parts of the file. The specialist is an expert in tshark, the command-line version of the Wireshark packet analyzer.
    • Result: This is better, but the boss and the specialist often misunderstand each other. The boss asks for "everything," and the specialist gets lost in the details. They talk past each other.
  • The "Specialized Task Force" (Flow Reporter Agent - FRA): This is the winner. The team is split up perfectly:
    1. The Summarizer: A fast worker who scans the whole file and creates a short, easy-to-read summary of the "suspicious" parts.
    2. The Investigator: The main detective reads only that summary. They don't get bogged down in the raw data.
    3. The Librarian: A tool that instantly looks up the summary on the internet to match it with known criminal profiles (CVEs).
    • Result: This team works like a well-oiled machine. The Investigator stays focused, the Summarizer handles the heavy lifting, and the Librarian provides the facts.
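
The division of labor in the "Specialized Task Force" can be sketched in a few lines of Python. This is only an illustration of the idea, not the paper's implementation: the Packet record, the SUSPICIOUS_MARKERS list, and the functions summarize_flows, investigate, and toy_lookup are all hypothetical names, and the Librarian is replaced by a tiny hard-coded table instead of a real web search.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Packet:
    """Hypothetical, simplified packet record (not the paper's data model)."""
    src: str
    dst: str
    dst_port: int
    payload: str


# Illustrative markers only; a real summarizer would be far more careful.
SUSPICIOUS_MARKERS = ("../", "${jndi:", "<script", "/bin/sh")


def summarize_flows(packets):
    """Summarizer: group packets into flows and keep only the suspicious ones."""
    flows = defaultdict(list)
    for p in packets:
        flows[(p.src, p.dst, p.dst_port)].append(p.payload)

    summary = []
    for (src, dst, port), payloads in flows.items():
        hits = [pl for pl in payloads if any(m in pl for m in SUSPICIOUS_MARKERS)]
        if hits:
            # Keep at most three payloads per flow so the summary stays short.
            summary.append({"src": src, "dst": dst, "port": port, "evidence": hits[:3]})
    return summary


def toy_lookup(evidence):
    """Librarian stand-in: a hard-coded table instead of a real web search."""
    table = {"${jndi:": "CVE-2021-44228"}  # the Log4Shell payload pattern
    return sorted({cve for e in evidence for marker, cve in table.items() if marker in e})


def investigate(summary, lookup):
    """Investigator: reads only the summary and delegates lookups to the Librarian."""
    return [
        {
            "attacker": flow["src"],
            "target": f"{flow['dst']}:{flow['port']}",
            "candidates": lookup(flow["evidence"]),
        }
        for flow in summary
    ]
```

Calling investigate(summarize_flows(packets), toy_lookup) returns one finding per suspicious flow. The point of the design is that the Investigator never touches the raw packets, only the short summary.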

3. The "Memory" Problem: The Sticky Note vs. The Filing Cabinet

AI models have a limit on how much text they can remember at once (like a sticky note that only fits 5 words). If a case is long, the AI forgets what happened at the start.

  • The Solution: CyberSleuth uses a "Filing Cabinet" (a vector database). As it investigates, it writes down key clues on index cards and files them away. When it needs to remember something from 10 steps ago, it pulls the right card out of the cabinet. This allows it to solve long, complex cases without losing its train of thought.
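
To make the "Filing Cabinet" idea concrete, here is a minimal sketch of a clue memory with similarity-based recall. It is an assumption-heavy toy: a real vector database would use learned embeddings, while this version uses plain word counts and cosine similarity. The ClueMemory class and its store and recall methods are hypothetical names, not the paper's API.

```python
import math
import re
from collections import Counter


def _embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a learned embedding)."""
    return Counter(re.findall(r"[a-z0-9.\-]+", text.lower()))


def _cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ClueMemory:
    """Hypothetical 'filing cabinet': store short notes, recall the most similar ones."""

    def __init__(self):
        self._cards = []  # list of (note, vector) pairs

    def store(self, note):
        """File a new index card."""
        self._cards.append((note, _embed(note)))

    def recall(self, query, k=2):
        """Return the k stored notes most similar to the query."""
        q = _embed(query)
        ranked = sorted(self._cards, key=lambda card: _cosine(q, card[1]), reverse=True)
        return [note for note, _ in ranked[:k]]
```

The agent would call store after each investigation step and recall before the next one, so only the few most relevant cards go back onto the "sticky note".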

4. The "Web Search" Skill: Not Just Guessing

A common mistake AI makes is "hallucinating" (making things up). If the AI sees a strange code, it might guess, "Oh, that's probably the 'Great Firewall' bug!" even if it's wrong.

  • CyberSleuth's Trick: It is programmed to say, "I don't know yet. Let me check the internet." It searches for the specific service and the type of attack to find the exact match. It treats the internet like a giant library of criminal records to verify its findings.
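
The "check before you answer" behavior can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's code: build_query and identify_cve are hypothetical names, and the search argument stands for any function that takes a query string and returns a list of text snippets. The key point is that the function returns None rather than guessing when no result supports a CVE.

```python
import re

# CVE identifiers look like CVE-YYYY-NNNN, with four or more digits at the end.
CVE_PATTERN = re.compile(r"CVE-\d{4}-\d{4,}")


def build_query(service, attack_type):
    """Combine the identified service and attack type into one search query."""
    return f"{service} {attack_type} CVE"


def identify_cve(service, attack_type, search):
    """Return the best-supported CVE from the search results, or None.

    `search` is an assumed callable: query string in, list of snippets out.
    """
    votes = {}
    for snippet in search(build_query(service, attack_type)):
        if service.lower() not in snippet.lower():
            continue  # ignore results that talk about other software
        for cve in set(CVE_PATTERN.findall(snippet)):
            votes[cve] = votes.get(cve, 0) + 1

    if not votes:
        return None  # "I don't know yet" instead of a made-up answer
    return max(votes, key=votes.get)
```

Any search backend can be plugged in through the search argument, which also makes the logic easy to test with canned snippets.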

5. The Results: How Good is It?

The researchers tested CyberSleuth on 30 real-world scenarios (some old, some brand new from 2025).

  • Accuracy: It correctly identified the hacker's target and the specific "weapon" (vulnerability) used in 80% of the cases.
  • Human Approval: They showed the reports to 25 real cybersecurity experts. The experts rated the reports as complete, useful, and logical. They said the AI sounded like a competent junior analyst.
  • Adaptability: The best part? The researchers didn't have to rebuild the detective. They just gave it a new instruction: "Now, look at this traffic from a virus-infected computer instead of a website." The same team of AI agents successfully solved those cases too!

The Big Takeaway

The paper shows that AI can be a great partner for cybersecurity, but only if you design it right.

  • Don't give one AI too many jobs (it gets confused).
  • Give it a team of specialists (it works better).
  • Give it a filing cabinet for its memory (it doesn't forget).

CyberSleuth is a first step toward a future where AI handles the boring, tedious digging through data, leaving human experts free to focus on the big picture and on stopping the next big attack.