LLM-Augmented Knowledge Base Construction For Root Cause Analysis

This paper evaluates Fine-Tuning, Retrieval-Augmented Generation (RAG), and hybrid LLM methodologies for constructing a Root Cause Analysis (RCA) knowledge base from support tickets. Experiments on a real industrial dataset show that the resulting knowledge base accelerates RCA tasks and enhances network resilience.

Nguyen Phuc Tran, Brigitte Jaumard, Oscar Delgado, Tristan Glatard, Karthikeyan Premkumar, Kun Ni

Published 2026-04-09

Imagine you are the chief mechanic for a massive, futuristic city's transportation system. This system isn't just roads and cars; it's a complex web of digital highways, fiber-optic cables, and invisible signals that keep our world connected. When a traffic jam happens (a network outage), you need to fix it instantly. If you don't, the whole city grinds to a halt.

Traditionally, fixing these jams was like trying to solve a mystery by reading thousands of handwritten notes from the past. You'd have to sift through messy support tickets, logs, and expert notes to figure out: "Why did the lights go out? Was it a bad wire? A software glitch? Did someone trip a switch?" This was slow, prone to human error, and exhausting.

This paper introduces a new tool called TelcoInsight. Think of it as hiring a super-intelligent, tireless detective (an AI) who can read all those messy notes in seconds and organize them into a perfect "Cheat Sheet" for fixing future problems.

Here is how they built this detective and tested three different ways to train it:

The Three Training Methods

The researchers wanted to teach this AI detective how to solve network mysteries. They tried three different "training camps":

1. The "Memorization" Camp (Fine-Tuning)

  • The Analogy: Imagine taking a brilliant student who already knows how to read and write (a pre-trained AI) and forcing them to memorize a specific textbook of network problems until they know every page by heart.
  • How it works: You feed the AI thousands of past support tickets and say, "Learn these patterns."
  • The Result: The student gets very good at the style of the answers and the specific words used. However, if they see a brand-new type of problem they haven't memorized, they might make things up (hallucinate) because they are stuck in their "textbook" and can't look outside it for help.
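In practice, "feeding the AI thousands of past support tickets" means converting each ticket into a prompt/completion pair for supervised fine-tuning. Here is a minimal Python sketch of that data-preparation step; the ticket fields (`symptom`, `root_cause`, `resolution`) and the JSONL format are illustrative assumptions, not the paper's actual schema.

```python
import json

# Hypothetical raw support tickets (field names are illustrative).
tickets = [
    {"symptom": "Packet loss on core router between 3 PM and 4 PM",
     "root_cause": "High CPU on the core router",
     "resolution": "Restart the core router"},
    {"symptom": "Fiber link down on segment B",
     "root_cause": "Cut fiber-optic cable",
     "resolution": "Dispatch a field team to splice the cable"},
]

def to_training_pair(ticket):
    """Turn one ticket into a prompt/completion pair for fine-tuning."""
    prompt = f"Symptom: {ticket['symptom']}\nWhat is the root cause and the fix?"
    completion = (f"Root cause: {ticket['root_cause']}. "
                  f"Fix: {ticket['resolution']}.")
    return {"prompt": prompt, "completion": completion}

# JSONL is a common input format for fine-tuning toolkits.
pairs = [to_training_pair(t) for t in tickets]
jsonl = "\n".join(json.dumps(p) for p in pairs)
```

Fine-tuning on pairs like these is exactly why the model learns the "style" of the answers: it memorizes the mapping from symptoms to causes, but only for patterns present in the textbook.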

2. The "Library" Camp (RAG: Retrieval-Augmented Generation)

  • The Analogy: Instead of memorizing the whole library, you give the student a magical index card. When a problem comes in, the student instantly runs to the library, finds the most similar past cases, reads them, and then writes the answer based on what they just found.
  • How it works: The AI doesn't memorize everything. Instead, it searches a database of past tickets for similar issues and uses those real examples to build its answer.
  • The Result: The answers are very accurate because they are based on real, recent evidence. However, the student might struggle with the specific "jargon" or shorthand used by the company's engineers because they haven't practiced speaking that specific language.

3. The "Hybrid" Camp (The Best of Both Worlds)

  • The Analogy: This is the ultimate detective. They have memorized the specific language and patterns of the network (from Camp 1) AND they have a magical index card to pull up fresh evidence from the library whenever they need it (from Camp 2).
  • How it works: The AI is first trained to understand the specific "dialect" of network engineers. Then, when a new problem arrives, it uses that knowledge to understand the question, but it also searches the library for the most relevant past cases to ensure the solution is factually correct.
  • The Result: This was the winner. It spoke the engineers' language fluently and gave answers backed by real-world proof. It was the most accurate and reliable method.

What Did They Build? (The Knowledge Base)

The goal wasn't just to have the AI talk; it was to turn all that messy data into a structured Knowledge Base.

Think of this as turning a pile of chaotic, handwritten detective notes into a clean, organized Rule Book.

  • Before: "The internet was slow on Tuesday. Maybe it was the router? Or maybe the software? I think we tried restarting it..."
  • After (The Rule Book): "IF the router shows 'Packet Loss' AND the time is between 3 PM and 4 PM, THEN the cause is 'High CPU' and the solution is 'Restart the Core Router'."

This rule book allows the network team to diagnose problems instantly, almost like a doctor looking at a symptom chart and immediately knowing the medicine.

Why Does This Matter?

  1. Speed: Instead of spending hours digging through logs, the system suggests the answer in seconds.
  2. Accuracy: It reduces human error. It doesn't forget the solution to a problem that happened two years ago.
  3. Privacy: The researchers made sure this AI could run locally (on the company's own servers), so sensitive customer data doesn't have to be sent to the public internet.
  4. Compression: They found a way to group similar problems together. If 50 different tickets all describe the same issue, the system compresses them into one single rule. This makes the rule book much shorter and easier to use.

The Bottom Line

The paper proves that the best way to fix complex network problems isn't just to give AI a brain (memorization) or just a library (searching). It's to give it both. By combining a specialized understanding of the industry with the ability to look up fresh facts, we can build a "Super-Mechanic" that keeps our digital world running smoothly, ensuring that when the lights go out, they come back on almost instantly.
