iScript: A Domain-Adapted Large Language Model and Benchmark for Physical Design Tcl Script Generation

This paper introduces iScript, a domain-adapted Qwen3-8B model and its corresponding benchmark for generating reliable Innovus Tcl scripts, which leverages a novel multi-stage data synthesis pipeline and a two-step verification framework to overcome data scarcity and outperform state-of-the-art LLMs in physical design automation.

Ning Xu, Zhaoyang Zhang, Senlin Shu, Lei Qi, Jiaqi Lv, Wensuo Wang, Tianhao Zhao, Chao Zhang, Zhaoliang Yang, Xiangyu Li, Zhaorui Su, Jingshan Li, Xin Geng

Published 2026-03-06

Imagine you are building a massive, incredibly complex skyscraper. In the world of computer chips (which are like microscopic cities), this "skyscraper" is the physical design of a chip. To build it, engineers drive design tools such as Cadence Innovus with scripts written in Tcl (Tool Command Language). Think of Tcl as the instruction manual, or the "recipe," that tells the construction robots exactly where to place every brick, wire, and room in the chip.

Writing these recipes by hand has long been slow and error-prone. A script can run to thousands of lines, and a single small mistake can make the whole building collapse (that is, the chip doesn't work). General-purpose AI models (like the ones that write poems or answer trivia) struggle here because they have seen very few of these specific "recipes" and don't know the strict rules of chip construction.

Here is what the iScript paper does, explained simply:

1. The Problem: The "Chef" Who Doesn't Know the Kitchen

Imagine you hire a world-famous chef (a general AI) to cook a very specific, obscure dish from a remote village. The chef has never seen the ingredients, doesn't know the local spices, and has never read the village's secret cookbook. If you ask them to cook it, they will guess, and the result will likely be inedible.

  • The Reality: General AI models try to write chip scripts, but they fail because real chip scripts are proprietary (kept secret by companies), scarce, and full of highly specialized commands the AI has rarely or never seen.

2. The Solution: Training a "Specialist Chef" (iScript)

The authors created iScript, a specialized AI chef trained specifically for chip building. They didn't just give the AI a few recipes; they built a massive training program.

  • The "Synthesis Pipeline" (The Cooking School): Since there weren't enough real recipes to teach the AI, they built a factory to create them.

    • Step 1: They took a list of all the valid "ingredients" (commands) and mixed them together randomly to create thousands of fake recipes.
    • Step 2: They ran these fake recipes through a "grammar police" (a syntax checker) to throw out any that were nonsense.
    • Step 3 (The Magic): They used a super-smart AI (a "Teacher") to look at the valid recipes and ask, "What was the chef trying to do here?" The Teacher then wrote a story (called Chain-of-Thought) explaining the logic behind the recipe.
    • Result: They ended up with 10,000 high-quality examples of: The Request ("Build a clock tower") + The Reasoning ("First, I need to lay the foundation, then add the gears...") + The Code (The actual Tcl script).
  • The Training: They took a smart base AI (Qwen3-8B) and taught it in two phases:

    1. Language Immersion: Learning the specific vocabulary and grammar of chip scripts.
    2. Logic Training: Learning why to write the code, not just what to write, using the "stories" (Chain-of-Thought) they generated.
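The pipeline and the two training phases can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the six-command VOCAB, every function name, and the brace-and-vocabulary check are hypothetical; teacher_annotate is a placeholder where the paper uses a teacher LLM; and formatting phase 1 as raw script text and phase 2 as request-to-(reasoning + code) pairs is an assumed reading of the two phases.

```python
import random
from dataclasses import dataclass

# Hypothetical toy vocabulary of Innovus-style command names (arguments omitted).
VOCAB = ["floorPlan", "placeDesign", "ccopt_design", "routeDesign", "optDesign", "saveDesign"]


@dataclass
class Sample:
    request: str    # "The Request": natural-language task
    reasoning: str  # "The Reasoning": chain-of-thought from the teacher
    code: str       # "The Code": the Tcl script


def combine_commands(n: int, length: int, seed: int = 0) -> list[str]:
    """Step 1: build candidate scripts by sampling distinct commands from VOCAB."""
    rng = random.Random(seed)
    return ["\n".join(rng.sample(VOCAB, length)) for _ in range(n)]


def passes_syntax_check(script: str) -> bool:
    """Step 2 (toy stand-in for a real checker): braces/brackets must nest
    correctly and every non-empty line must start with a known command."""
    stack: list[str] = []
    openers = {"}": "{", "]": "["}
    for ch in script:
        if ch in "{[":
            stack.append(ch)
        elif ch in openers:
            if not stack or stack.pop() != openers[ch]:
                return False
    if stack:
        return False
    return all(line.split()[0] in VOCAB for line in script.splitlines() if line.strip())


def teacher_annotate(script: str) -> Sample:
    """Step 3: placeholder for the teacher LLM. A real teacher infers the intent
    and writes the reasoning; here both are just templated from the commands."""
    steps = script.splitlines()
    request = "Run this flow: " + ", ".join(steps)
    reasoning = " -> ".join(f"call {s}" for s in steps)
    return Sample(request, reasoning, script)


def build_dataset(candidates: list[str]) -> list[Sample]:
    """Keep only candidates that pass Step 2, then annotate them (Step 3)."""
    return [teacher_annotate(c) for c in candidates if passes_syntax_check(c)]


def phase1_text(s: Sample) -> str:
    """Phase 1, language immersion (assumed format): plain script text."""
    return s.code


def phase2_pair(s: Sample) -> tuple[str, str]:
    """Phase 2, logic training (assumed format): request -> reasoning + code."""
    return s.request, f"{s.reasoning}\n\n{s.code}"


candidates = combine_commands(n=3, length=2) + ["placeDesgn", "floorPlan {"]
dataset = build_dataset(candidates)
print(len(dataset))  # 3: the misspelled command and the unclosed brace are dropped
```

In a real pipeline the toy check would be replaced by an actual Tcl/Innovus syntax checker and the template by a call to a strong LLM; the point of the sketch is only the shape of the flow: generate, filter, annotate, then format the survivors for each training phase.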

3. The Test: The "Driving License" Exam (iScript-Bench)

Before you can drive a truck, you need a test. But how do you test an AI's ability to write chip code without actually building a real chip (which costs millions of dollars)?

The authors created iScript-Bench, a standardized driving test with three levels:

  • Level 1 (The Parking Lot): Simple tasks, like "Turn on the lights."
  • Level 2 (The City Streets): Combining commands, like "Drive to the store and park."
  • Level 3 (The Highway): Complex logic, like "Navigate traffic while avoiding potholes and changing lanes."
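One way to picture a tiered benchmark like this is as a list of tasks tagged with their level, with scores reported per level so that the hard tiers are not averaged away by the easy ones. The sketch below is hypothetical: the BenchItem fields, the example requests, and the pass/fail verdicts are invented for illustration and are not iScript-Bench data or results.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class BenchItem:
    level: int    # 1 = simple task, 2 = combined commands, 3 = complex logic
    request: str  # natural-language task given to the model
    passed: bool  # grader's verdict on the model's script


def pass_rate_by_level(items: list[BenchItem]) -> dict[int, float]:
    """Fraction of items passed, reported separately for each level."""
    totals: dict[int, int] = defaultdict(int)
    wins: dict[int, int] = defaultdict(int)
    for item in items:
        totals[item.level] += 1
        wins[item.level] += int(item.passed)
    return {lvl: wins[lvl] / totals[lvl] for lvl in sorted(totals)}


# Invented toy verdicts for illustration only (not iScript-Bench data).
items = [
    BenchItem(1, "Save the current design", True),
    BenchItem(1, "Report cell utilization", True),
    BenchItem(2, "Place the design, then report timing", True),
    BenchItem(2, "Add power stripes, then connect the power nets", False),
    BenchItem(3, "Loop over failing nets and re-route each one", False),
]
print(pass_rate_by_level(items))  # {1: 1.0, 2: 0.5, 3: 0.0}
```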

They tested iScript against other well-known AI models (like GPT-4, Gemini, and Claude). iScript outperformed them all, most clearly on the complex tasks, where the other models often failed.

4. The Grading System: The "Two-Step Check"

How do you grade the exam without a real chip?

  1. The Grammar Check: First, they run the code in a tiny, safe "sandbox" (a simulation). If the code has a typo or a syntax error, it fails immediately.
  2. The Logic Check: If the code passes the grammar check, a second AI (the "Proctor") reads the code and the original request. The Proctor checks: "Does this code actually do what the user asked?"
    • Cool Trick: They showed that this AI Proctor's verdicts closely agree with a human expert's, while being much faster.
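The two-step check can be sketched as a short-circuiting grader: the judge only runs if the syntax stage passes. This is an assumption-laden toy, not the paper's harness: syntax_stage merely checks that braces and brackets nest properly, whereas the paper runs the code in a sandbox, and stub_judge is a hypothetical keyword rule standing in for the LLM proctor.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    syntax_ok: bool  # step 1: did the script get past the syntax stage?
    intent_ok: bool  # step 2: did the judge agree it does what was asked?

    @property
    def passed(self) -> bool:
        return self.syntax_ok and self.intent_ok


def syntax_stage(script: str) -> bool:
    """Step 1 (toy stand-in for the sandbox run): only checks that braces
    and brackets open and close in a properly nested order."""
    stack: list[str] = []
    openers = {"}": "{", "]": "["}
    for ch in script:
        if ch in "{[":
            stack.append(ch)
        elif ch in openers:
            if not stack or stack.pop() != openers[ch]:
                return False
    return not stack


def grade(request: str, script: str, judge: Callable[[str, str], bool]) -> Verdict:
    """Two-step check: the judge is consulted only if step 1 passes."""
    if not syntax_stage(script):
        return Verdict(syntax_ok=False, intent_ok=False)
    return Verdict(syntax_ok=True, intent_ok=judge(request, script))


def stub_judge(request: str, script: str) -> bool:
    """Hypothetical placeholder for the LLM proctor: a crude keyword rule."""
    return "route" in request.lower() and "routeDesign" in script


req = "Route the design"
print(grade(req, "routeDesign", stub_judge).passed)       # True
print(grade(req, "placeDesign", stub_judge).passed)       # False (wrong intent)
print(grade(req, "routeDesign {", stub_judge).syntax_ok)  # False (unclosed brace)
```

Short-circuiting means the comparatively expensive judge is never called on code that cannot even run.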

The Big Takeaway

This paper is like saying: "We can't just ask a general smart person to build a rocket ship. We need to train a specialist using a factory that creates practice problems, and then test them with a rigorous exam."

iScript is that specialist. It shows that if you give an AI the right training data (synthesized from scratch) and the right way to think (Chain-of-Thought), it can handle the very difficult task of writing the scripts that build modern computer chips.

In short: They taught an AI to speak "Chip Engineer" fluently, created a test to prove it works, and showed that a specialized AI is far better at this job than a general one.