HDLxGraph: Bridging Large Language Models and HDL Repositories via HDL Graph Databases

Imagine you are trying to fix a massive, ancient, and incredibly complex clockwork machine. This machine isn't just a pile of gears; it's a hierarchy of gears, springs, and levers, all connected in a specific way. Now, imagine you have a brilliant but slightly confused assistant (an AI) who knows how to read instructions but has never seen this specific machine before.

When you ask the assistant, "Why is the clock ticking too fast?", a traditional AI might look at your question and say, "Ah, 'ticking' and 'fast'! Let me search the library for any book that mentions those words." It might pull up a book about a grandfather clock or a stopwatch, completely missing the fact that your machine is a custom-built, 10,000-part industrial engine.

This is the problem the paper HDLxGraph is solving. Here is the breakdown in simple terms:

The Problem: The "Lost in Translation" Moment

Hardware engineers write code (called HDL) to design computer chips. It's like writing a recipe for a robot.

The Issue: When engineers ask an AI for help, the AI often gets confused.
1. Structure Mismatch: Human questions are flat (like a sentence), but chip designs are deep and layered (like a family tree of modules, blocks, and signals). The AI tries to match words, not the structure.
2. Vocabulary Mismatch: Chip code is written in very specific technical terms (like "clock enable" or "data flow") that sound nothing like the everyday language people use when they ask a question.

The Result: The AI looks in the wrong drawer. It might find a file named "Cache" because you asked about "memory," but the actual bug is in a completely different file called "Frontend" that controls how the memory talks to the processor.

The Solution: HDLxGraph (The "Smart Map" System)

The authors built a new system called HDLxGraph. Instead of just reading words, this system builds a map of the entire codebase and stores it as a graph database. Think of it as giving the AI two special tools:

The "Family Tree" (AST - Abstract Syntax Tree):
Imagine the code isn't a list of sentences, but a family tree.
- Grandparents: The big Modules (the whole chip).
- Parents: The Blocks (specific functions).
- Children: The Signals (the tiny wires connecting things).
- How it helps: When you ask a question, the AI doesn't just scan for keywords. It looks at the family tree. It understands that "Signal X" belongs to "Block Y," which belongs to "Module Z." This stops it from getting lost in a library of 10,000 files.
The "Flow Chart" (DFG - Data Flow Graph):
Imagine water flowing through pipes.
- In a chip, "data" flows like water. It goes from a source, through a valve, to a destination.
- How it helps: If a pipe is leaking (a bug), the AI traces the water backwards to find exactly where the leak started. It ignores the words on the pipes and follows the actual flow of information.

The New "Test Drive" (HDLSearch)

To prove their system works, the team needed an exam, but there was no good "driver's license test" for checking how well an AI can search through chip code. So, they built one called HDLSearch.

They took real-world, massive chip projects (like the brains of real computers).
They used AI to generate hundreds of realistic search queries (350 in total), each tied to the piece of code it should find.
This became the "exam" to see if their new system was actually smarter than the old ones.

The Results: Why It Matters

When they tested HDLxGraph against the best existing AI tools:

Search: It ranked the right code higher in its results, improving the ranking score (Mean Reciprocal Rank) by about 12%. (Imagine the right needle reliably landing nearer the top of the haystack.)
Debugging: Its debugging output scored about 12% higher against the reference answers (measured by ROUGE-L F1). (It stopped guessing and started tracing the actual problem.)
Completion: Its first-attempt code completions were correct about 5% more often (Pass@1). (It had a better picture of what the engineer was trying to build.)

The Big Picture Analogy

Old AI (Similarity-Based): Like a librarian who only matches the words on your request slip to the words on the book spines. If you ask for "a fast car," they might give you a book about a race car, even if you need a book about a delivery truck.
HDLxGraph: Like a master mechanic who has a blueprint of the entire factory. When you say "the truck is slow," they don't just look for the word "truck." They look at the blueprint, see the engine is connected to the transmission, and trace the power flow to find the broken belt.

In short: HDLxGraph teaches AI to stop just "reading" code and start "understanding" how the machine is built and how it moves. This makes it a much better partner for engineers designing the next generation of computers.

Here is a detailed technical summary of the paper "HDLxGraph: Bridging Large Language Models and HDL Repositories via HDL Graph Databases."

1. Problem Statement

The paper addresses the limitations of using Large Language Models (LLMs) for Hardware Description Language (HDL) tasks (generation, debugging, and search) when relying on standard Retrieval-Augmented Generation (RAG) frameworks. The authors identify two fundamental mismatches between conventional semantic similarity-based RAG and HDL code:

Structural Mismatch: Natural language queries are flat and sequential, whereas HDL designs possess a deep, multi-level hierarchical structure (Modules $\rightarrow$ Blocks $\rightarrow$ Signals). Conventional RAG fails to capture these cross-file and cross-module dependencies, leading to poor recall in large repositories (thousands of lines).
Vocabulary Mismatch: HDL code uses domain-specific terminology (operators, keywords such as always and assign, and design-specific terms such as the RISC-V fence.i instruction) that differs significantly from natural language descriptions. Standard semantic similarity models struggle to align natural language queries with these specific hardware semantics.

Existing software-code Graph RAG approaches are suboptimal because HDLs differ from software in abstraction hierarchy (modules vs. classes/functions) and behavioral modeling (concurrent execution vs. sequential control flow).

2. Methodology: HDLxGraph

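The authors propose HDLxGraph, which they present as the first framework to integrate HDL-specific graph structures into RAG. The work has three main parts: two system stages (graph database preparation and multi-level retrieval) and an accompanying benchmark: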

A. Graph Database Preparation

The framework constructs a dual-graph database from HDL repositories (specifically Verilog):

Abstract Syntax Tree (AST) Graph: Captures the hierarchical structural relationships.
- Nodes: Modules, Blocks (e.g., always, assign), and Signals.
- Edges: CONTAINS and INSTANTIATE.
- Purpose: Maps the multi-level entity hierarchy to align flat natural language queries with the structural reality of the hardware.
Data Flow Graph (DFG): Captures behavioral signal flow.
- Nodes: Signals and Temporary variables.
- Edges: FLOWS_TO, TRUE, FALSE, COND.
- Purpose: Represents the circuit topology and data propagation, addressing the vocabulary mismatch by tracing signal dependencies rather than keyword matching.
Embedding: Nodes are embedded using CodeT5+ to facilitate semantic search, and cross-file relationships are established via module instantiation analysis.
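
To make the dual-graph idea concrete, here is a minimal Python sketch that keeps both edge families in one in-memory structure. It is an illustration under assumptions, not the authors' implementation: the HdlGraph class, the node ids, the toy counter design, and the reading of COND as "this signal gates that assignment" are all hypothetical, and a real setup would use a graph database with CodeT5+ embeddings rather than Python dicts.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class HdlGraph:
    """Tiny in-memory stand-in for a graph database (hypothetical schema)."""
    nodes: dict = field(default_factory=dict)  # node id -> {"kind", "text"}
    edges: dict = field(default_factory=lambda: defaultdict(list))  # (src, label) -> [dst]

    def add_node(self, node_id, kind, text=""):
        self.nodes[node_id] = {"kind": kind, "text": text}

    def add_edge(self, src, label, dst):
        self.edges[(src, label)].append(dst)

    def out(self, src, label):
        """Return the targets of all edges with this label leaving src."""
        return list(self.edges.get((src, label), []))


def build_example():
    g = HdlGraph()
    # AST side: Module -> Block -> Signal hierarchy.
    g.add_node("top", "Module")
    g.add_node("counter", "Module")
    g.add_node("counter.always_0", "Block",
               "always @(posedge clk) if (en) count <= count + 1;")
    g.add_node("counter.count", "Signal")
    g.add_node("counter.en", "Signal")
    g.add_edge("top", "INSTANTIATE", "counter")  # cross-module link
    g.add_edge("counter", "CONTAINS", "counter.always_0")
    g.add_edge("counter.always_0", "CONTAINS", "counter.count")
    g.add_edge("counter.always_0", "CONTAINS", "counter.en")
    # DFG side: signal flow, with a temporary node for "count + 1".
    g.add_node("counter.tmp_0", "Temp")
    g.add_edge("counter.count", "FLOWS_TO", "counter.tmp_0")
    g.add_edge("counter.tmp_0", "FLOWS_TO", "counter.count")
    g.add_edge("counter.en", "COND", "counter.count")  # assumed meaning of COND
    return g


def descendants(graph, root, labels=("CONTAINS", "INSTANTIATE")):
    """Collect every node reachable from root through structural (AST) edges."""
    seen, stack = set(), [root]
    while stack:
        node = stack.pop()
        for label in labels:
            for child in graph.out(node, label):
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
    return seen


g = build_example()
reachable = descendants(g, "top")
```

Keeping the AST and DFG edges in one store but under separate labels lets a retriever walk the hierarchy for search and the signal flow for debugging without building two databases.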

B. Multi-Level Retrieval

HDLxGraph employs a hybrid retrieval strategy tailored to the downstream task:

AST Retrieval (for Search):
1. Query Decomposition: An LLM ("Decomposer") breaks a natural language query into structural levels (Module, Block, Signal).
2. Top-k Selection & Filtering: Retrieves candidate nodes at each level based on semantic similarity and filters valid Module-Block pairs using containment relationships.
3. Cross-level Rerank: Reranks results by averaging similarity scores of parent nodes containing the target signal, ensuring fine-grained retrieval.
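
The filtering and reranking steps above can be sketched in a few lines of Python. This is a simplified illustration, not the paper's algorithm: the similarity scores are hard-coded where a real system would compute them from embeddings, the names retrieve_signals, SIM, and PARENT are hypothetical, and averaging the signal's own score together with its parents' scores is an assumption about how the cross-level average is formed.

```python
from statistics import mean


def top_k(scores, k):
    """Return the k node ids with the highest similarity score."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def retrieve_signals(sim, parent, k_module=2, k_lower=3):
    """Rank candidate signals using containment filtering and a cross-level average.

    sim:    level name -> {node id: similarity to that level's sub-query}
    parent: child id -> id of the node that CONTAINS it
    """
    modules = set(top_k(sim["module"], k_module))
    # Keep only blocks whose containing module also survived top-k selection.
    blocks = {b for b in top_k(sim["block"], k_lower) if parent[b] in modules}
    ranked = []
    for sig in top_k(sim["signal"], k_lower):
        block = parent[sig]
        if block not in blocks:
            continue
        module = parent[block]
        score = mean([sim["signal"][sig], sim["block"][block], sim["module"][module]])
        ranked.append((sig, score))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Toy scores: "cache.hit" looks best on its own, but its parent block matches poorly.
SIM = {
    "module": {"cache": 0.81, "frontend": 0.78, "alu": 0.20},
    "block": {"cache.always_0": 0.40, "frontend.always_1": 0.90, "alu.assign_0": 0.85},
    "signal": {"cache.hit": 0.80, "frontend.fetch_valid": 0.75, "alu.carry": 0.72},
}
PARENT = {
    "cache.always_0": "cache",
    "frontend.always_1": "frontend",
    "alu.assign_0": "alu",
    "cache.hit": "cache.always_0",
    "frontend.fetch_valid": "frontend.always_1",
    "alu.carry": "alu.assign_0",
}

results = retrieve_signals(SIM, PARENT)
```

In this toy run the alu candidates are dropped because their module never made the module-level cut, and frontend.fetch_valid overtakes cache.hit once the parent scores are averaged in.
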
DFG Retrieval (for Debugging & Completion):
- Debugging: Uses Signal Traverse to iteratively trace upstream from an error signal to identify the specific dataflow divergence causing the bug, filtering out irrelevant code regions.
- Completion: Uses Graph Similarity (via GraphSAGE embeddings) to find code snippets with similar dataflow patterns, even if the syntax differs, enabling completion based on functional similarity.
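
The upstream tracing idea behind the debugging path can be illustrated with a short breadth-first walk. This sketch makes several assumptions: the DFG is reduced to FLOWS_TO edges held in a plain dict, the signal names and the upstream_signals helper are invented for the example, and the optional hop limit is an added convenience rather than something the paper describes.

```python
from collections import deque


def upstream_signals(flows_to, error_signal, max_hops=None):
    """Walk FLOWS_TO edges backwards from the failing signal, breadth first.

    flows_to: source signal -> list of signals it drives
    Returns upstream signals in the order they are discovered.
    """
    # Invert the edge map so each signal knows who drives it.
    drivers = {}
    for src, dsts in flows_to.items():
        for dst in dsts:
            drivers.setdefault(dst, []).append(src)

    order, seen = [], {error_signal}
    queue = deque([(error_signal, 0)])
    while queue:
        sig, hops = queue.popleft()
        if max_hops is not None and hops >= max_hops:
            continue
        for src in drivers.get(sig, []):
            if src not in seen:  # the seen set also stops register feedback loops
                seen.add(src)
                order.append(src)
                queue.append((src, hops + 1))
    return order


# Toy dataflow: a program counter loop feeding an instruction fetch, plus an unrelated UART path.
FLOWS_TO = {
    "pc": ["pc_next", "fetch_addr"],
    "branch_taken": ["pc_next"],
    "pc_next": ["pc"],
    "fetch_addr": ["icache_req"],
    "uart_tx": ["uart_out"],
}

trace = upstream_signals(FLOWS_TO, "icache_req")
one_hop = upstream_signals(FLOWS_TO, "icache_req", max_hops=1)
```

Only signals that can actually influence icache_req appear in the trace, so the unrelated UART logic never reaches the LLM's context.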

C. HDLSearch Benchmark

To address the lack of HDL-specific search benchmarks, the authors created HDLSearch.

Source: Derived from 10 real-world, repository-level HDL projects (e.g., CVA6, mor1kx).
Generation: An automated pipeline generates queries by annotating functional blocks, propagating semantics to signals, and abstracting them into module-level descriptions. Ambiguity is introduced by removing specific names to simulate real-world user queries.
Scale: Contains 350 queries (50 module, 100 block, 200 signal level) with 6,300 code blocks as distractors.

3. Key Contributions

HDLxGraph Framework: The first RAG framework integrating AST (for structure) and DFG (for behavior) to bridge the gap between natural language and HDL semantics.
Hybrid Retrieval Mechanism: A novel approach that aligns flat queries with hierarchical structures (AST) and traces signal dependencies (DFG) to overcome structural and vocabulary mismatches.
HDLSearch Benchmark: The first dataset specifically designed for HDL code search, derived from real-world repositories to evaluate retrieval capabilities in isolation.
Comprehensive Evaluation: Demonstrated adaptability across three LLMs (Claude-3.5-Sonnet, Qwen2.5-Coder-7B, LLaMA-3.1) and three tasks (Search, Debugging, Completion).

4. Experimental Results

The framework was evaluated against state-of-the-art (SOTA) baselines: Similarity-based RAG (BM25, CodeT5+), Software Graph RAG (Microsoft's GraphRAG), and Accurate-RAG (human-extracted ground truth).

Code Search: HDLxGraph improved Mean Reciprocal Rank (MRR) by 12.04% over similarity-based RAG and 11.59% over software Graph RAG. It significantly outperformed baselines in block-level retrieval.
Code Debugging: Achieved an average improvement of 12.22% (ROUGE-L F1) over similarity-based RAG and 8.18% over software Graph RAG. It approached the performance of "Accurate-RAG" (human-extracted context).
Code Completion: Improved Pass@1 accuracy by 5.04% over similarity-based RAG and 4.07% over software Graph RAG.
Ablation Study: Confirmed that AST is critical for search and completion (structural alignment), while DFG is essential for debugging (tracing signal flow). Removing either component significantly degraded performance.
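
For readers unfamiliar with the search metric, Mean Reciprocal Rank averages 1/rank of the first correct result over all queries, counting 0 when the correct item is not returned. The sketch below shows the standard computation on invented data; the query ids and code block names are hypothetical and unrelated to HDLSearch.

```python
def mean_reciprocal_rank(rankings, relevant):
    """Average of 1/rank of the correct item per query (0 if it is missing).

    rankings: query id -> ordered list of retrieved item ids
    relevant: query id -> the single correct item id
    """
    total = 0.0
    for query, ranking in rankings.items():
        target = relevant[query]
        if target in ranking:
            total += 1.0 / (ranking.index(target) + 1)
    return total / len(rankings)


rankings = {
    "q1": ["blk_a", "blk_b", "blk_c"],  # correct item at rank 1 -> 1.0
    "q2": ["blk_d", "blk_e", "blk_f"],  # correct item at rank 2 -> 0.5
    "q3": ["blk_g", "blk_h", "blk_i"],  # correct item missing   -> 0.0
}
relevant = {"q1": "blk_a", "q2": "blk_e", "q3": "blk_z"}

mrr = mean_reciprocal_rank(rankings, relevant)
```

Because the metric rewards placing the correct block near the top, a gain in MRR means better ranking overall, which is not the same as finding the right file in a larger share of queries.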

5. Significance

Domain-Specific Adaptation: The results indicate that generic software RAG solutions fall short for hardware design because of fundamental differences in concurrency and hierarchy. HDLxGraph provides a domain-specific adaptation that addresses both.
Scalability: By utilizing graph databases, the system can handle large, complex repositories (tens of thousands of lines) where traditional prompt-based or flat-text retrieval fails.
Foundation for Future Tools: The introduction of HDLSearch and the dual-graph approach sets a new standard for evaluating and building LLM agents for hardware design, potentially accelerating chip design cycles and reducing debugging time.

In conclusion, HDLxGraph narrows the gap between natural language and hardware design by exploiting the graph structure inherent in HDL code, offering a practical approach to LLM-assisted hardware engineering.