LLM-Augmented Knowledge Base Construction For Root Cause Analysis

This paper evaluates Fine-Tuning, Retrieval-Augmented Generation (RAG), and hybrid LLM methodologies for constructing a Root Cause Analysis (RCA) knowledge base from support tickets. Experiments on a real industrial dataset show that the resulting knowledge base accelerates RCA tasks and enhances network resilience.

Nguyen Phuc Tran, Brigitte Jaumard, Oscar Delgado, Tristan Glatard, Karthikeyan Premkumar, Kun Ni

Published 2026-04-09

Imagine you are the chief mechanic for a massive, futuristic city's transportation system. This system isn't just roads and cars; it's a complex web of digital highways, fiber-optic cables, and invisible signals that keep our world connected. When a traffic jam happens (a network outage), you need to fix it instantly. If you don't, the whole city grinds to a halt.

Traditionally, fixing these jams was like trying to solve a mystery by reading thousands of handwritten notes from the past. You'd have to sift through messy support tickets, logs, and expert notes to figure out: "Why did the lights go out? Was it a bad wire? A software glitch? Did someone trip a switch?" This was slow, prone to human error, and exhausting.

This paper introduces a new tool called TelcoInsight. Think of it as hiring a super-intelligent, tireless detective (an AI) who can read all those messy notes in seconds and organize them into a perfect "Cheat Sheet" for fixing future problems.

Here is how they built this detective and tested three different ways to train it:

The Three Training Methods

The researchers wanted to teach this AI detective how to solve network mysteries. They tried three different "training camps":

1. The "Memorization" Camp (Fine-Tuning)

  • The Analogy: Imagine taking a brilliant student who already knows how to read and write (a pre-trained AI) and forcing them to memorize a specific textbook of network problems until they know every page by heart.
  • How it works: You feed the AI thousands of past support tickets and say, "Learn these patterns."
  • The Result: The student gets very good at the style of the answers and the specific words used. However, if they see a brand-new type of problem they haven't memorized, they might make things up (hallucinate) because they are stuck in their "textbook" and can't look outside it for help.
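In practice, "feeding the AI thousands of past support tickets" means converting each ticket into a prompt/completion pair for supervised fine-tuning. Here is a minimal Python sketch of that data-preparation step; the ticket fields (`symptom`, `root_cause`, `resolution`) and the JSONL format are illustrative assumptions, not the paper's actual schema.

```python
import json

# Hypothetical raw support tickets (field names are illustrative).
tickets = [
    {"symptom": "Packet loss on core router between 3 PM and 4 PM",
     "root_cause": "High CPU on the core router",
     "resolution": "Restart the core router"},
    {"symptom": "Fiber link down on segment B",
     "root_cause": "Cut fiber-optic cable",
     "resolution": "Dispatch a field team to splice the cable"},
]

def to_training_pair(ticket):
    """Turn one ticket into a prompt/completion pair for fine-tuning."""
    prompt = f"Symptom: {ticket['symptom']}\nWhat is the root cause and the fix?"
    completion = (f"Root cause: {ticket['root_cause']}. "
                  f"Fix: {ticket['resolution']}.")
    return {"prompt": prompt, "completion": completion}

# JSONL is a common input format for fine-tuning toolkits.
pairs = [to_training_pair(t) for t in tickets]
jsonl = "\n".join(json.dumps(p) for p in pairs)
```

Fine-tuning on pairs like these is exactly why the model learns the "style" of the answers: it memorizes the mapping from symptoms to causes, but only for patterns present in the textbook.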

2. The "Library" Camp (RAG: Retrieval-Augmented Generation)

  • The Analogy: Instead of memorizing the whole library, you give the student a magical index card. When a problem comes in, the student instantly runs to the library, finds the most similar past cases, reads them, and then writes the answer based on what they just found.
  • How it works: The AI doesn't memorize everything. Instead, it searches a database of past tickets for similar issues and uses those real examples to build its answer.
  • The Result: The answers are very accurate because they are based on real, recent evidence. However, the student might struggle with the specific "jargon" or shorthand used by the company's engineers because they haven't practiced speaking that specific language.

3. The "Hybrid" Camp (The Best of Both Worlds)

  • The Analogy: This is the ultimate detective. They have memorized the specific language and patterns of the network (from Camp 1) AND they have a magical index card to pull up fresh evidence from the library whenever they need it (from Camp 2).
  • How it works: The AI is first trained to understand the specific "dialect" of network engineers. Then, when a new problem arrives, it uses that knowledge to understand the question, but it also searches the library for the most relevant past cases to ensure the solution is factually correct.
  • The Result: This was the winner. It spoke the engineers' language fluently and gave answers backed by real-world proof. It was the most accurate and reliable method.

What Did They Build? (The Knowledge Base)

The goal wasn't just to have the AI talk; it was to turn all that messy data into a structured Knowledge Base.

Think of this as turning a pile of chaotic, handwritten detective notes into a clean, organized Rule Book.

  • Before: "The internet was slow on Tuesday. Maybe it was the router? Or maybe the software? I think we tried restarting it..."
  • After (The Rule Book): "IF the router shows 'Packet Loss' AND the time is between 3 PM and 4 PM, THEN the cause is 'High CPU' and the solution is 'Restart the Core Router'."

This rule book allows the network team to diagnose problems instantly, almost like a doctor looking at a symptom chart and immediately knowing the medicine.

Why Does This Matter?

  1. Speed: Instead of spending hours digging through logs, the system suggests the answer in seconds.
  2. Accuracy: It reduces human error. It doesn't forget the solution to a problem that happened two years ago.
  3. Privacy: The researchers made sure this AI could run locally (on the company's own servers), so sensitive customer data doesn't have to be sent to the public internet.
  4. Compression: They found a way to group similar problems together. If 50 different tickets all describe the same issue, the system compresses them into one single rule. This makes the rule book much shorter and easier to use.

The Bottom Line

The paper proves that the best way to fix complex network problems isn't just to give AI a brain (memorization) or just a library (searching). It's to give it both. By combining a specialized understanding of the industry with the ability to look up fresh facts, we can build a "Super-Mechanic" that keeps our digital world running smoothly, ensuring that when the lights go out, they come back on almost instantly.
