Imagine you are the head of security for a massive, bustling city (the internet). Every day, criminals try to break in, but they are getting smarter, faster, and using high-tech tools to find the weakest locks on your doors.
Your job is to stop them. But here's the problem: You are drowning in paperwork. Every time a crime happens somewhere else in the world, a detective writes a long, messy report (a Cyber Threat Intelligence report) describing what the criminal did. These reports are full of jargon, confusing details, and noise.
You need to turn these messy reports into security rules (like "Block this specific street" or "Lock this specific window") instantly. If you wait too long to read the report and figure out what to do, the criminals have already broken in.
This paper is about building a super-smart robot assistant that can read these messy reports, understand the real meaning behind the words, and automatically write the security rules for you.
Here is how they did it, explained with simple analogies:
1. The Problem: The "Dictionary" vs. The "Context"
Most AI systems try to read a report and guess what it means by looking for specific keywords. It's like trying to understand a story by only looking for the word "dog." If the story says, "The canine chased the mailman," a simple AI might miss it because it didn't see the word "dog."
The authors realized that to understand security threats, you need to understand relationships between words, not just the words themselves. They focused on Hypernyms and Hyponyms.
- The Analogy: Think of a family tree.
- Hypernym: The parent category (e.g., "Vehicle").
- Hyponym: The specific child (e.g., "Sports Car").
- The Trick: If a report says a criminal used a "Trojan Horse," the AI needs to know that a "Trojan Horse" is a type of "Malware," which is a type of "Threat." By understanding this family tree, the AI can group similar threats together even if they use different words.
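The family-tree idea above can be sketched in a few lines of code. This is a toy illustration, not the paper's implementation: the taxonomy dictionary and function names here are hypothetical, but they show how two differently-worded threats resolve to the same parent category.

```python
# Toy hypernym taxonomy (illustrative only): each term maps to its parent category.
TAXONOMY = {
    "trojan horse": "malware",
    "ransomware": "malware",
    "malware": "threat",
    "phishing": "social engineering",
    "social engineering": "threat",
}

def hypernym_chain(term: str) -> list[str]:
    """Walk up the 'family tree' from a specific term to its root category."""
    chain = [term]
    while term in TAXONOMY:
        term = TAXONOMY[term]
        chain.append(term)
    return chain

print(hypernym_chain("trojan horse"))  # ['trojan horse', 'malware', 'threat']
print(hypernym_chain("ransomware"))    # ['ransomware', 'malware', 'threat']
```

Because "trojan horse" and "ransomware" climb to the same "malware" node, a system using this structure can group them together even though the surface words never overlap.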
2. The Solution: A Two-Part Team (The Hybrid Agent)
The authors didn't just use one AI. They built a team with two distinct personalities working together:
Team Member A: The Creative Translator (The AI Agent)
This is a Large Language Model (LLM), like a very well-read but sometimes imaginative writer.
- Its Job: It reads the messy detective report. Instead of just guessing, it plays a game of "Category and Sub-category." It asks: "What specific thing is this? What is the general family it belongs to?"
- The Analogy: Imagine a translator who doesn't just translate word-for-word. Instead, they read a paragraph about a "red, fast, four-wheeled machine" and say, "Ah, this is a Sports Car." They strip away the noise and find the core concept.
- The Innovation: They made this AI do this in three stages (like peeling an onion) to make sure it really understands the depth of the threat before moving on.
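The "peeling the onion" idea can be pictured as a staged pipeline: extract the specific thing, then its family, then the top-level class. The paper's actual prompts and stage definitions are not reproduced here; `ask_llm` below is a canned stand-in for a real LLM call, so every name in this sketch is hypothetical.

```python
def ask_llm(prompt: str) -> str:
    """Stand-in for a real LLM API call; returns canned answers per stage."""
    canned = {
        "stage1": "Trojan Horse",  # the specific concept (hyponym)
        "stage2": "Malware",       # its general family (hypernym)
        "stage3": "Threat",        # the top-level class
    }
    return canned[prompt.split(":")[0]]

def extract_concepts(report: str) -> dict:
    """Three passes, each one level more general than the last."""
    hyponym = ask_llm(f"stage1: What specific thing does this report describe? {report}")
    hypernym = ask_llm(f"stage2: What general family does '{hyponym}' belong to?")
    root = ask_llm(f"stage3: What top-level threat class covers '{hypernym}'?")
    return {"specific": hyponym, "family": hypernym, "root": root}

print(extract_concepts("Attackers delivered a disguised malicious payload."))
```

Each stage conditions on the previous answer, so by the end the system holds the whole chain (specific, family, root) rather than a single guessed label.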
Team Member B: The Strict Accountant (The Expert System)
This is a traditional, rule-based computer program built with CLIPS (a long-established expert-system tool). It is not creative; it is 100% logical and strict.
- Its Job: It takes the "Sports Car" concept from the Translator and turns it into a strict, unbreakable security rule.
- The Analogy: If the Translator says, "Block all Sports Cars," the Accountant checks the rulebook to make sure that's a valid legal order. It writes the actual code (the firewall rule) that the security system will execute.
- Why a two-member team? The AI (Translator) is great at understanding messy human language but can sometimes "hallucinate" (make things up). The Accountant (Expert System) is boring but never lies. By combining them, you get the best of both worlds: Understanding + Reliability.
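The division of labor between the two members can be sketched as a strict validation gate. This is not the paper's CLIPS code; it is a hypothetical Python stand-in showing the principle: the rule engine only turns an LLM suggestion into an executable rule if the category is one it already recognizes, so a hallucinated category never reaches the firewall.

```python
# Illustrative whitelist of categories the strict "Accountant" accepts.
KNOWN_CATEGORIES = {"malware", "phishing", "ddos"}

def compile_rule(llm_category: str, indicator: str) -> str:
    """Reject unrecognized (possibly hallucinated) categories; emit a rule otherwise."""
    category = llm_category.strip().lower()
    if category not in KNOWN_CATEGORIES:
        raise ValueError(f"Unknown category '{category}': refusing to write a rule")
    return f"BLOCK {indicator}  # reason: {category}"

print(compile_rule("Malware", "203.0.113.7"))  # BLOCK 203.0.113.7  # reason: malware
```

If the Translator invents a category like "Unicorn Attack," `compile_rule` raises an error instead of emitting a bogus rule, which is exactly the reliability the rule-based half contributes.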
3. The Result: Faster and Smarter Defense
The researchers tested this system against other methods.
- The Old Way: Traditional AI tried to guess the threat category directly. It often got confused by rare or weird threats (the "imbalanced data" problem).
- The New Way: Their "Family Tree" method (using Hypernyms/Hyponyms) was much better at spotting the right threats, even when the data was messy or rare.
The Bottom Line:
Think of this system as a super-efficient security guard.
- Old Guard: Reads a report, gets confused by the fancy words, and might miss the threat.
- New Guard (This Paper): Reads the report, realizes, "Oh, this 'fancy word' is just a specific type of 'bad guy' I already know how to stop," and immediately writes a perfect rule to lock the door.
They proved that by teaching AI to understand the relationships between words (like a parent and child), rather than just the words themselves, we can build security systems that are faster, more accurate, and trustworthy enough to protect our digital cities.