LLM4Cov: Execution-Aware Agentic Learning for High-coverage Testbench Generation

The paper proposes LLM4Cov, an offline agentic learning framework that overcomes the high cost of execution feedback in hardware verification by introducing execution-validated data curation and worst-state-prioritized sampling, enabling a compact 4B-parameter model to achieve high testbench coverage that surpasses both its teacher and significantly larger models.

Hejia Zhang, Zhongming Yu, Chia-Tung Ho, Haoxing Ren, Brucek Khailany, Jishen Zhao

Published 2026-02-27

Imagine you are trying to teach a robot how to write a complex recipe (a testbench) to test a new, very expensive machine (a hardware chip) before it is built.

If the robot makes a mistake in the recipe, the machine might break or behave strangely. To find out, you have to run a simulation. But here's the catch: running this simulation is like baking a cake that takes three hours. You can't bake a thousand cakes a day to see which one tastes best. It's too slow and expensive.

This is the problem the paper LLM4Cov solves. It teaches a small, smart robot how to write perfect recipes by learning from these slow, expensive "baking sessions" without wasting time.

Here is how they did it, using simple analogies:

1. The Problem: The "Expensive Taste Test"

In the old way, if you wanted to teach an AI to write these recipes, you might try to let it guess, run the simulation, see if it failed, and try again immediately (like online learning). But because the simulation takes so long, the AI would spend 99% of its time waiting for the oven to finish, not learning.

Also, if you just gave the AI a pile of "perfect recipes" written by a human expert to study, it would fail in real life. Why? Because the AI would never learn how to fix a broken recipe. It would only know what a perfect one looks like, not how to recover when things go wrong.

2. The Solution: The "Three-Stage Cooking School"

The authors built a system called LLM4Cov that acts like a smart cooking school. Instead of just memorizing recipes, the student robot learns by practicing on the worst possible scenarios and fixing them.

They use three clever tricks:

Trick A: The "Worst-Case Scenario" Drill

Imagine a cooking class where the teacher usually gives you perfect ingredients. But in this class, the teacher says: "Okay, let's look at the 10 recipes you tried yesterday. Which one was the absolute worst? The one that burned the cake?"

Instead of ignoring that burnt cake, the teacher focuses entirely on it. They say, "Let's take this burnt cake and figure out exactly how to fix it so it becomes a perfect cake."

  • In the paper: This is called Worst-State-Prioritized Sampling. The AI is forced to look at the test cases that failed the most (lowest coverage) and learn how to fix them. This teaches the AI how to recover from disasters, which is the most valuable skill.
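The sampling idea above can be sketched in a few lines of Python. This is a minimal illustration under my own assumptions, not the paper's implementation; the names `pick_worst` and `attempts` are hypothetical.

```python
# Minimal sketch of worst-state-prioritized sampling (illustrative only;
# `pick_worst` and the data shapes are invented, not from the paper).

def pick_worst(attempts, k=1):
    """Return the k attempts with the lowest coverage score.

    Each attempt is a (testbench_name, coverage) pair, where coverage is
    a fraction in [0, 1] reported by the simulator.
    """
    return sorted(attempts, key=lambda a: a[1])[:k]

attempts = [
    ("tb_v1", 0.82),
    ("tb_v2", 0.41),  # the "burnt cake": lowest coverage
    ("tb_v3", 0.67),
]

worst = pick_worst(attempts, k=1)
print(worst)  # → [('tb_v2', 0.41)]
```

The lowest-coverage attempt, not the best one, becomes the next repair target, so the training data concentrates on recovering from failures.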

Trick B: The "Staged Apprenticeship"

You can't teach a beginner to fix a burnt cake if they don't even know how to boil water yet. The system uses a Progressive Learning approach:

  • Stage 1 (The Beginner): The student robot tries to write a recipe. It fails. A super-smart "Master Chef" (a huge, powerful AI) looks at the failure and shows the student how to fix it. The student learns from the Master's corrections.
  • Stage 2 (The Intermediate): The student gets better. Now, the Master Chef stops helping as much. The student tries to fix its own mistakes. If it succeeds, great! If it fails, the Master Chef steps in again.
  • Stage 3 (The Master): The student is now so good that it can fix its own mistakes almost as well as the Master Chef. It learns to generate its own "perfect fixes" without needing the big teacher anymore.

This is like a video game where you start on "Easy Mode" with a guide, and slowly the guide disappears as you level up.
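The three stages can be summarized as control flow: who supplies the fix depends on the stage. This is a hedged sketch of the idea only; `student_fix`, `teacher_fix`, and `simulate` are hypothetical placeholders, not functions from the paper.

```python
# Illustrative sketch of the staged apprenticeship (all names hypothetical).

def curate_example(broken_tb, stage, student_fix, teacher_fix, simulate):
    """Produce one training correction, escalating to the teacher by stage.

    Stage 1: always use the big teacher model's correction.
    Stage 2: try the student first; fall back to the teacher if its fix
             fails in simulation.
    Stage 3: rely on the student's own correction.
    """
    if stage == 1:
        return teacher_fix(broken_tb)
    candidate = student_fix(broken_tb)
    if stage == 2 and not simulate(candidate):
        return teacher_fix(broken_tb)
    return candidate
```

The key design choice is that the expensive teacher is consulted less and less as the student improves, so teacher calls (and simulations) are spent where they matter most.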

Trick C: The "Memoryless" Shortcut

Usually, when a robot tries to fix a recipe, it remembers every single thing it did in the past 100 steps. This makes the instructions huge and confusing.

The authors realized the robot doesn't need the whole history. It just needs to know: "Here is the current broken recipe, and here is the error message."

  • In the paper: They call this a Memoryless State Transition. It's like telling the robot, "Forget the last hour of chaos. Just look at the mess on the counter right now and clean it up." This makes the learning much faster and more focused.
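In code, the memoryless idea means the next prompt is built only from the current state, never from the accumulated history. A minimal sketch, assuming an invented prompt format (the wording and `next_state` name are mine, not the paper's):

```python
# Minimal sketch of a memoryless state transition: the repair prompt depends
# only on the current testbench and its error report, not on prior turns.
# The prompt wording here is invented for illustration.

def next_state(current_tb: str, error_report: str) -> str:
    """Build the repair prompt from the current state alone."""
    return (
        "Fix this testbench so it compiles and maximizes coverage.\n"
        f"--- testbench ---\n{current_tb}\n"
        f"--- error report ---\n{error_report}\n"
    )
```

Because each prompt has a fixed, small shape, the context stays short no matter how many repair rounds have happened, which is what makes the learning faster and more focused.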

3. The Amazing Result

The most surprising part of the paper is the size of the robot they used.

  • They used a small AI model (only 4 billion parameters).
  • They compared it to giant AI models (30 billion to 500 billion parameters).

The Result: The small, specialized robot trained with this "Worst-Case" method actually beat the giant, general-purpose robots.

  • The small robot achieved a 69.2% success rate in creating perfect test recipes.
  • The giant robots (without this special training) only got around 60%.

The Big Takeaway

You don't need a massive, expensive brain to solve hard problems. You just need the right training method.

By teaching a small AI to focus on its biggest failures, fixing them step-by-step, and ignoring unnecessary history, it becomes a master of hardware verification. It's like taking a small, sharp knife and sharpening it perfectly, rather than trying to use a giant, dull chainsaw.

In short: LLM4Cov teaches AI to learn from its mistakes in the most efficient way possible, turning a slow, expensive process into a fast, high-quality learning experience.
