QiMeng-CodeV-SVA: Training Specialized LLMs for Hardware Assertion Generation via RTL-Grounded Bidirectional Data Synthesis

The paper introduces QiMeng-CodeV-SVA, a specialized LLM trained via a novel data-synthesis framework. By leveraging large-scale RTL code and bidirectional translation, it overcomes data scarcity and semantic-verification challenges, achieving state-of-the-art performance in generating SystemVerilog Assertions from natural language.

Yutong Wu, Chenrui Cao, Pengwei Jin, Di Huang, Rui Zhang, Xishan Zhang, Zidong Du, Qi Guo, Xing Hu

Published 2026-03-17

Imagine you are building an incredibly complex, high-speed train system (a computer chip). Before you let any passengers on, you need to make sure the train never derails, never runs a red signal, and always stops at the right station.

In the world of chip design, engineers write these safety rules in a very strict, technical language called SystemVerilog Assertions (SVAs). Think of SVAs as the "laws of physics" for the chip. If the chip breaks these laws, it's a disaster.

The Problem: The Language Barrier

The trouble is, human engineers think in plain English (e.g., "Make sure the counter stops when the button is pressed"), but the computer only understands the strict SVA code.

For years, people tried to use General-Purpose AI (like the smart chatbots you know) to translate English into these safety laws. But it was like asking a brilliant literature professor to perform brain surgery. They knew the words, but they didn't know the rules of the chip. They often wrote laws that sounded right but were actually nonsense, or laws that were too simple to catch real errors.

Also, there was a huge problem: Data Scarcity. To teach an AI to be a surgeon, you need thousands of real surgeries to study. But in chip design, there are very few examples of perfect English-to-SVA translations available.

The Solution: QiMeng-CodeV-SVA

The researchers in this paper built a new, specialized AI called CodeV-SVA. They didn't just give it a textbook; they built a massive, custom training camp using a clever three-step process.

Here is how they did it, using some everyday analogies:

1. The "RTL-Grounded" Factory (The Raw Materials)

Instead of waiting for humans to write perfect examples, the team looked at RTL code.

  • Analogy: Imagine you have a million blueprints for different houses (RTL code). You don't have the safety manuals yet, but you have the blueprints.
  • The Trick: They used a smart AI to look at these blueprints and guess what the safety rules should be. It's like looking at a blueprint of a bridge and asking, "What are the rules to keep this bridge from falling?"
  • The Result: They generated hundreds of thousands of potential safety rules.
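The mining step above can be sketched in a few lines. This is a minimal, runnable illustration, not the paper's actual pipeline: `propose_assertions` stands in for whatever LLM call the authors use, and the RTL snippet and the returned assertion are made-up examples.

```python
# A minimal sketch of RTL-grounded assertion mining. The model call is
# stubbed out; in the real pipeline, propose_assertions would prompt an
# LLM with the RTL source and parse candidate SVAs from its reply.

def propose_assertions(rtl_module: str) -> list[str]:
    """Stub for an LLM that reads RTL and guesses candidate safety rules."""
    # Fixed toy behavior so the sketch runs without a model.
    if "counter" in rtl_module:
        return ["assert property (@(posedge clk) stop |-> ##1 $stable(count));"]
    return []

def mine_candidates(rtl_corpus: list[str]) -> list[tuple[str, str]]:
    """Pair each RTL module with every candidate assertion mined from it."""
    candidates = []
    for module in rtl_corpus:
        for sva in propose_assertions(module):
            candidates.append((module, sva))
    return candidates

corpus = ["module counter(input clk, stop, output reg [7:0] count); endmodule"]
pairs = mine_candidates(corpus)
print(len(pairs))  # one candidate assertion mined from the one module
```

At scale, the same loop runs over a corpus of many thousands of real RTL modules, which is how the "hundreds of thousands of potential safety rules" accumulate.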

2. The "Bidirectional" Mirror Test (The Quality Control)

This is the paper's most creative idea. How do you know if the AI's guess is actually correct?

  • The Problem: Sometimes an AI writes a rule that is technically "true" but useless. For example, if the rule is "The sky is blue OR the sky is not blue," it's always true, but it doesn't tell you anything about the bridge.
  • The Solution (Bidirectional Translation):
    1. Take the AI's generated rule (SVA).
    2. Ask the AI to translate it back into plain English.
    3. Ask the AI to translate that English back into a new rule.
    4. The Check: Does the new rule match the original rule?
  • The Metaphor: Imagine you tell a friend a secret. They whisper it to a second friend, who whispers it back to you. If the story comes back exactly the same, you know the message was clear. If the story changes (e.g., "The bridge is safe" becomes "The bridge is always safe"), you know the first friend misunderstood the nuance.
  • The Filter: They threw away any rules that got "mangled" in this translation loop. Only the perfect, clear rules survived.
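The mirror test can be sketched as a round-trip filter. Note the simplifying assumptions: the paper checks semantic consistency with LLM calls in both directions, while this toy version stubs both translators with lookup tables and compares strings after whitespace normalization. All assertion strings below are illustrative, not from the paper.

```python
# Sketch of the bidirectional "mirror test": keep an assertion only if
# SVA -> English -> SVA reproduces the original. Both translation
# directions are stubbed with toy tables so the sketch is runnable.

GOOD_SVA = "assert property (@(posedge clk) req |-> ##1 ack);"
VAGUE_SVA = "assert property (@(posedge clk) ack || !ack);"  # always true, useless

# Stub for "SVA -> English": the vague rule loses its detail in translation.
TO_ENGLISH = {
    GOOD_SVA: "one cycle after req is high, ack must be high",
    VAGUE_SVA: "ack is either high or low",
}

# Stub for "English -> SVA": the vague description comes back as a
# different rule, so it will fail the round trip.
TO_SVA = {
    "one cycle after req is high, ack must be high": GOOD_SVA,
    "ack is either high or low": "assert property (@(posedge clk) 1);",
}

def normalize(sva: str) -> str:
    """Collapse whitespace so cosmetic differences don't count as changes."""
    return " ".join(sva.split())

def survives_round_trip(sva: str) -> bool:
    """The 'mirror test': did the rule come back unchanged?"""
    round_tripped = TO_SVA[TO_ENGLISH[sva]]
    return normalize(round_tripped) == normalize(sva)

kept = [s for s in (GOOD_SVA, VAGUE_SVA) if survives_round_trip(s)]
print(len(kept))  # only the precise rule survives the mirror test
```

The vacuous rule ("ack is either high or low", the "sky is blue OR not blue" case) gets mangled in the loop and is thrown away, while the precise rule survives.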

3. The "Reasoning" Coach (The Final Polish)

Before training the final model, they added a "thinking step."

  • Analogy: Instead of just giving the answer, the AI was forced to write out its homework steps first. "First, I see a clock signal. Second, I see a reset button. Therefore, the rule must be..."
  • This helped the AI understand why a rule was correct, not just memorize the pattern.
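The "homework steps" idea amounts to formatting each training example so the reasoning precedes the answer. This is a hedged sketch; the field names and layout are assumptions, not the paper's actual prompt template.

```python
# Sketch of chain-of-thought formatting for training examples: each
# example is rendered as spec -> numbered reasoning steps -> final SVA,
# so the model learns to show its work before emitting the rule.

def format_with_reasoning(spec: str, steps: list[str], sva: str) -> str:
    """Render one training example with an explicit reasoning section."""
    reasoning = "\n".join(f"{i}. {step}" for i, step in enumerate(steps, 1))
    return f"Spec: {spec}\nReasoning:\n{reasoning}\nSVA: {sva}"

example = format_with_reasoning(
    "ack must rise one cycle after req",
    ["The design is clocked on the rising edge of clk.",
     "req implies ack one cycle later, i.e. |-> ##1."],
    "assert property (@(posedge clk) req |-> ##1 ack);",
)
print(example.splitlines()[0])
```

Training on examples laid out this way is what pushes the model toward understanding why a rule follows from the design, rather than pattern-matching spec text to assertion text.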

The Result: A Specialist vs. A Generalist

They trained their new AI, CodeV-SVA, on this massive, high-quality dataset.

  • The Old Way: Using a general AI (like GPT-5 or DeepSeek) was like using a Swiss Army Knife to perform heart surgery. It worked okay, but it wasn't precise.
  • The New Way: CodeV-SVA is like a specialized heart surgeon. Even though it's smaller and cheaper to run than the giant general AIs, it performs better at writing these specific chip safety rules.

In the final tests:

  • CodeV-SVA beat the world's most expensive, powerful general AIs.
  • It caught more errors and wrote more accurate safety laws.
  • It proved that you don't need a giant brain if you have the right training data and a clever way to filter out the bad stuff.

Why This Matters

This paper shows that for highly specialized jobs (like designing computer chips), we don't need to wait for AI to become "super-intelligent" in everything. Instead, we can build specialized tools by teaching them with high-quality, self-generated data and rigorous "mirror tests." It's a smarter, cheaper, and more effective way to build the future of hardware.
