IC3-Evolve: Proof-/Witness-Gated Offline LLM-Driven Heuristic Evolution for IC3 Hardware Model Checking

This paper presents IC3-Evolve, an automated offline framework that uses an LLM to iteratively evolve IC3 hardware model checking heuristics under strict proof- and witness-gated validation. The result is a standalone, high-performance checker with zero runtime ML overhead, and the framework reliably discovers practical improvements on standard benchmarks.

Mingkai Miao, Guangyu Hu, Ziyi Yang, Hongce Zhang

Published 2026-04-07

Imagine you are trying to solve a massive, incredibly complex maze. This maze represents a piece of computer hardware (like a microchip). Your goal is to prove that there is no path through the maze that leads to a "trap" (a safety failure). If you find a path to the trap, you need to show exactly how to get there so engineers can fix it.

This is the job of a Model Checker, and one of the most widely used algorithms for doing it is called IC3.

The Problem: The "Tuning" Nightmare

Think of IC3 as a very smart, but slightly stubborn, robot. It knows the rules of the maze, but it has to decide how to explore it. Should it check the left path first? Should it guess? Should it forget old paths to save memory?

These decisions are called heuristics. Currently, human experts have to manually tweak these settings. It's like trying to tune a race car engine by turning tiny screws with a screwdriver while the car is moving. It's slow, expensive, and if you turn one screw too far, the car might crash (the software stops working correctly).

The Solution: IC3-Evolve

The paper introduces IC3-Evolve, a new way to tune this robot. Instead of a human turning the screws, the authors use an AI (a Large Language Model) to write code patches.

But here is the catch: AI can hallucinate or make mistakes. If you just let an AI rewrite the robot's brain, it might break the rules of logic, and the robot might say "I'm safe!" when it's actually in a trap. That would be disastrous for hardware safety.

The Secret Sauce: The "Proof Gate"

This is where the paper gets clever. They don't just let the AI edit the code and hope for the best. They use a Proof-and-Witness Gate.

Think of the AI as a Chef trying to improve a recipe for a cake.

  1. The Proposal: The Chef (the AI) suggests a small change: "Let's add a pinch of salt to the frosting."
  2. The Gatekeeper: Before anyone eats the cake, a strict Taste-Tester (the Gate) checks two things:
    • If the cake is supposed to be "Safe" (no traps): The Chef must provide a Certificate (a mathematical proof) that the cake is definitely safe. The Gate checks this certificate with a different, independent tool. If the proof doesn't hold up, the cake is thrown away.
    • If the cake is supposed to be "Unsafe" (a trap exists): The Chef must provide a Replayable Trace (a video showing exactly how to get to the trap). The Gate watches the video to make sure the trap is real. If the video is fake or the trap isn't there, the cake is thrown away.
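The two checks above can be sketched on a toy system. This is a minimal illustration, not the paper's gate: the state space, the function names check_invariant and replay_trace, and the 3-bit counter are all assumptions made for this example. A real gate would discharge these checks with a SAT-based checker rather than by enumerating states.

```python
from typing import Callable, Iterable, List

State = int  # toy assumption: a state is just a small integer


def check_invariant(states: Iterable[State],
                    init: Callable[[State], bool],
                    step: Callable[[State], List[State]],
                    bad: Callable[[State], bool],
                    inv: Callable[[State], bool]) -> bool:
    """Accept a 'safe' verdict only if inv is an inductive invariant."""
    for s in states:
        if init(s) and not inv(s):
            return False  # initiation fails: an initial state is outside inv
        if inv(s):
            if bad(s):
                return False  # safety fails: inv admits a bad state
            if any(not inv(t) for t in step(s)):
                return False  # consecution fails: a step leaves inv
    return True


def replay_trace(init: Callable[[State], bool],
                 step: Callable[[State], List[State]],
                 bad: Callable[[State], bool],
                 trace: List[State]) -> bool:
    """Accept an 'unsafe' verdict only if the trace really reaches a bad state."""
    if not trace or not init(trace[0]):
        return False
    for cur, nxt in zip(trace, trace[1:]):
        if nxt not in step(cur):
            return False  # the claimed step is not a legal transition
    return bad(trace[-1])


# Hypothetical toy system: a 3-bit counter that starts at 0 and adds 2 per step.
STATES = range(8)
init = lambda s: s == 0
step = lambda s: [(s + 2) % 8]

# "Safe" claims: state 5 is unreachable. One honest proof, one bogus proof.
safe_ok = check_invariant(STATES, init, step, lambda s: s == 5, lambda s: s % 2 == 0)
safe_bogus = check_invariant(STATES, init, step, lambda s: s == 5, lambda s: s < 4)

# "Unsafe" claims: state 6 is reachable. One real trace, one fake trace.
unsafe_ok = replay_trace(init, step, lambda s: s == 6, [0, 2, 4, 6])
unsafe_fake = replay_trace(init, step, lambda s: s == 6, [0, 6])
```

In this sketch the honest proof and the real trace pass, while the bogus invariant (it is not closed under a step) and the fake trace (it skips states) are rejected, which is the behavior the Gate needs in both directions.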

Crucially: The AI is only used in the kitchen (offline) to write the recipe. Once the recipe is approved and the cake is baked, the final product is just a regular cake. You don't need the AI or the Gate to eat it later. This means the final software is fast, cheap, and doesn't need an internet connection or a supercomputer to run.

How They Do It: The "Slot" System

To keep the AI from getting confused, they don't let it rewrite the whole code at once. They use a "Slot-Restricted" approach.

Imagine the robot's brain is a giant control panel with 100 different dials.

  • Compass Mode: The AI is only allowed to turn one dial at a time (e.g., "How fast should I check the left path?").
  • Jump Mode: If the AI gets stuck, it's allowed to turn two or three dials at once to see if they work better together.

After every change, the Gate checks if the robot still works correctly. If it does, and it's faster, the new setting is kept. If not, it's reverted.
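The accept-or-revert loop with its two modes might look roughly like the sketch below. Everything here is a stand-in chosen for illustration: evolve, toy_propose, toy_gate, and toy_score are hypothetical names, a seeded random proposer replaces the LLM, and the switch to Jump Mode after three failed attempts is an assumed rule, not one taken from the paper.

```python
import random
from typing import Callable, Dict, Tuple

Config = Dict[str, int]  # hypothetical: one integer "dial" per heuristic slot


def evolve(base: Config,
           propose: Callable[[Config, int], Config],
           gate_ok: Callable[[Config], bool],
           score: Callable[[Config], float],
           rounds: int,
           patience: int = 3,
           jump_width: int = 2) -> Tuple[Config, float]:
    """Keep a candidate only if it passes the gate and scores strictly better."""
    best, best_score = dict(base), score(base)
    stall = 0  # consecutive rejected candidates
    for _ in range(rounds):
        # Compass mode edits one slot; jump mode (assumed trigger) edits several.
        width = 1 if stall < patience else min(jump_width, len(best))
        candidate = propose(best, width)
        accepted = gate_ok(candidate)  # correctness first, speed second
        if accepted:
            cand_score = score(candidate)
            accepted = cand_score > best_score
        if accepted:
            best, best_score, stall = candidate, cand_score, 0
        else:
            stall += 1  # reverted: best stays unchanged
    return best, best_score


rng = random.Random(0)  # seeded stand-in for the LLM's suggestions


def toy_propose(cfg: Config, width: int) -> Config:
    """Nudge `width` randomly chosen dials by one step, clamped to 0..5."""
    new = dict(cfg)
    for name in rng.sample(sorted(new), width):
        new[name] = max(0, min(5, new[name] + rng.choice([-1, 1])))
    return new


def toy_score(cfg: Config) -> int:
    """Toy objective: the two dials are coupled, so they must move together."""
    return -10 * abs(cfg["a"] - cfg["b"]) - (3 - cfg["a"]) ** 2 - (3 - cfg["b"]) ** 2


def toy_gate(cfg: Config) -> bool:
    """Pretend that settings with a > 4 break correctness and must be rejected."""
    return cfg["a"] <= 4


start = {"a": 0, "b": 0}
best, best_score = evolve(start, toy_propose, toy_gate, toy_score, rounds=300)
```

The toy score is built so that moving either dial alone always makes things worse. Compass Mode therefore stalls at the starting point, and only Jump Mode, which moves both dials in the same round, can make progress.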

The Results

The authors tested this on a large collection of standard hardware puzzles (the HWMCC benchmarks).

  • The Result: The AI-evolved robot solved significantly more puzzles, and solved them much faster, than the best human-tuned robots.
  • The Surprise: They found that you can't just fix one dial at a time and expect a miracle. The dials are all connected. The AI had to learn how to tweak multiple dials together to get the best performance.

Why This Matters

  1. Safety First: Because of the "Proof Gate," a change is kept only if the checker's answers come with an independently verified proof or a replayable trace. An "AI hallucination" cannot slip through unchecked.
  2. No Runtime Cost: The final software doesn't need the AI running in the background. It's a standalone, super-fast tool.
  3. Automation: It automates the tedious job of tuning complex algorithms, freeing up human experts to do more creative work.

In a nutshell: IC3-Evolve is like having an AI mechanic who suggests tiny, specific tweaks to a race car engine. But before any tweak is installed, a strict inspector verifies that the car still follows the laws of physics and can prove it. Once verified, the car runs faster than ever, without needing the mechanic or the inspector to be present during the race.
