A Tale of 1001 LoC: Potential Runtime Error-Guided Specification Synthesis for Verifying Large-Scale Programs

This paper introduces Preguss, a modular framework that combines static analysis with LLM-aided synthesis to automatically generate and refine interprocedural specifications, enabling highly automated verification of large-scale programs (over 1,000 lines of code) while significantly reducing human effort.

Zhongyi Wang, Tengjie Lin, Mingshuai Chen, Haokun Li, Mingqi Yang, Xiao Yi, Shengchao Qin, Yixing Luo, Xiaofeng Li, Bin Gu, Liqiang Lu, Jianwei Yin

Published Wed, 11 Ma

Imagine you are the chief inspector for a massive, 1,000-story skyscraper (a large computer program). Your job is to ensure the building is safe: no floors will collapse, no elevators will fall, and no pipes will burst (these are "Runtime Errors").

Traditionally, you have two tools:

  1. The Metal Detector (Static Analyzer): It scans the whole building and beeps loudly whenever it thinks it sees a problem. But it's very noisy. It beeps at a harmless coat rack thinking it's a bomb. It beeps at a shadow thinking it's a hole. It gives you thousands of "false alarms."
  2. The Architect (The Human Expert): To prove the building is safe, you need blueprints (formal specifications) that explain exactly how every room works. But writing these blueprints for a 1,000-story building by hand takes years and costs a fortune.

The Problem:
Recently, researchers have tried using a super-smart AI (a Large Language Model, or LLM) to write these blueprints. But the AI has a short attention span: shown the whole 1,000-story building at once, it loses track of details and its blueprints become unreliable. It also doesn't know which blueprints are actually needed to silence the specific alarms the Metal Detector is raising. It might write a blueprint for the roof when the alarm is actually about the basement plumbing.

The Solution: Preguss (The Smart Foreman)
The paper introduces Preguss, a new system that acts like a brilliant, organized foreman. It uses a strategy called "Divide and Conquer" to fix the problem.

Here is how Preguss works, using a simple analogy:

1. The "Divide" Phase: Sorting the Alarms

Instead of looking at the whole building at once, Preguss looks at the Metal Detector's list of beeps.

  • The Insight: It realizes that most beeps are just noise. It picks one specific beep (e.g., "Potential pipe burst in Room 402") and asks: "What is the absolute minimum set of rules needed to prove this specific room is safe?"
  • The Action: It breaks the massive building down into tiny, manageable "inspection units." It creates a to-do list, prioritizing the most critical rooms first. It doesn't try to read the whole building; it just focuses on the specific pipe and the room it's in.
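The "Divide" step can be sketched in a few lines of Python. This is only an illustration of the idea, not the paper's implementation: the `Alarm` and `Unit` classes, the `build_units` function, and the `rank` score used for ordering are all hypothetical names, and the rule "lower rank goes first" is an assumption made for the example.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Alarm:
    function: str  # function in which the analyzer raised the alarm
    kind: str      # e.g. "division_by_zero"
    line: int


@dataclass(order=True)
class Unit:
    priority: int                           # lower value = inspected earlier
    alarm: Alarm = field(compare=False)
    context: tuple = field(compare=False)   # the only functions shown to the LLM


def build_units(alarms, call_graph, rank):
    """Turn each alarm into a small inspection unit and queue it by priority.

    `call_graph` maps a function name to the set of functions it calls.
    `rank` maps a function name to a (hypothetical) priority score.
    """
    queue = []
    for alarm in alarms:
        callees = tuple(sorted(call_graph.get(alarm.function, ())))
        context = (alarm.function,) + callees  # the room plus the rooms it touches
        heapq.heappush(queue, Unit(rank[alarm.function], alarm, context))
    return queue


# Usage: two alarms, inspected one at a time in priority order.
alarms = [Alarm("main", "overflow", 10), Alarm("read_sensor", "division_by_zero", 42)]
call_graph = {"main": {"read_sensor", "log"}, "read_sensor": set()}
rank = {"main": 2, "read_sensor": 1}
queue = build_units(alarms, call_graph, rank)
while queue:
    unit = heapq.heappop(queue)
    print(unit.alarm.kind, "->", unit.context)
```

The point of the sketch is that each unit carries only a handful of functions, so the later steps never need to see the whole program.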

2. The "Conquer" Phase: The AI's Targeted Mission

Now, Preguss sends the AI (the LLM) a very specific, small mission.

  • The Prompt: Instead of saying, "Write blueprints for the whole building," it says: "Here is the alarm about the pipe in Room 402. Here is the code for Room 402 and the room directly below it. Write a rule that proves the pipe won't burst."
  • The Magic: Because the AI only has to look at a small slice of the building, it doesn't get overwhelmed. It writes a small, targeted rule (a "precondition") that says, "As long as the water pressure is below 50 PSI, this pipe is safe."
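A focused prompt of this kind might be assembled as follows. This is a minimal sketch under stated assumptions: `build_prompt`, the dictionary layout of the alarm, and the wording of the prompt are all made up for illustration and are not taken from the paper.

```python
def build_prompt(alarm, sources, context):
    """Assemble a prompt containing one alarm and only the functions in `context`.

    `alarm` is a dict with hypothetical keys "kind", "function", and "line".
    `sources` maps every function name in the program to its source text.
    """
    parts = [
        f"Alarm: {alarm['kind']} in {alarm['function']} at line {alarm['line']}.",
        "Relevant code:",
    ]
    for name in context:
        parts.append(f"--- {name} ---")
        parts.append(sources[name])
    parts.append("Write a precondition that rules out this alarm.")
    return "\n".join(parts)


# Usage: the program has three functions, but only two reach the prompt.
sources = {
    "room_402": "def room_402(pressure): return pipe(pressure)",
    "room_401": "def room_401(): return 30",
    "attic": "def attic(): return 'unrelated'",
}
alarm = {"kind": "pipe_burst", "function": "room_402", "line": 1}
print(build_prompt(alarm, sources, ["room_402", "room_401"]))
```

Everything outside the chosen context (here, `attic`) is simply never shown to the model, which is what keeps the task small.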

3. The Feedback Loop: The "Correction" Mechanism

Sometimes, the AI writes a rule that looks safe but is too strict (e.g., "The water pressure must be exactly 0 PSI"). Every room that feeds water into this pipe would now have to obey that rule, and since real water pressure isn't zero, the checker would raise new false alarms there.

  • The Fix: Preguss acts like a strict editor. It runs the AI's rule through a "logic checker." If the checker says, "This rule is too strict and will break the building," Preguss sends the rule back to the AI with a note: "You made this too strict. Try again, but be more flexible."
  • The AI tries again, gets it right, and the rule is added to the official blueprints.
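The propose, check, and retry cycle can be sketched like this. It is a toy model under loud assumptions: `check`, `refine`, and `stub_llm` are hypothetical names, the "LLM" is a hard-coded stub, and the "logic checker" here just tries the rule on sample values, whereas a real checker would prove the rule rather than test it.

```python
def check(rule, legit_inputs, unsafe_inputs):
    """Stand-in for the logic checker: return None if the rule passes, else a note."""
    if any(not rule(x) for x in legit_inputs):
        return "too strict: rejects a value that really occurs"
    if any(rule(x) for x in unsafe_inputs):
        return "too weak: admits a value that triggers the alarm"
    return None


def refine(propose, legit_inputs, unsafe_inputs, max_rounds=3):
    """Ask for a rule, check it, and feed the checker's note back until it passes."""
    feedback = None
    for _ in range(max_rounds):
        rule = propose(feedback)
        feedback = check(rule, legit_inputs, unsafe_inputs)
        if feedback is None:
            return rule      # accepted: add it to the blueprints
    return None              # give up and leave this alarm for a human


def stub_llm(feedback):
    """Hard-coded stand-in for the AI: first answer is too strict, second is fine."""
    if feedback is None:
        return lambda psi: psi == 0
    return lambda psi: psi < 50


# Usage: pressures 0, 20, 49 really occur; 50 and 80 would burst the pipe.
rule = refine(stub_llm, legit_inputs=[0, 20, 49], unsafe_inputs=[50, 80])
print(rule is not None and rule(49) and not rule(50))
```

Capping the number of rounds matters in practice: a rule that never passes is handed to a human instead of looping forever.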

4. The Result: A Safe Building

By repeating this process for every single alarm, Preguss builds a complete set of safety rules.

  • The Win: Most of the blueprint-writing that humans used to do by hand is now done automatically.
  • The Efficiency: In the paper's tests, Preguss reduced the human work required by 80% to 89%.
  • The Discovery: It didn't just fix false alarms; it actually found 6 real, hidden bugs in a real spacecraft control system that humans had missed!

Summary Analogy

Think of Preguss as a Sherlock Holmes for code.

  • Old Way: You hire a detective to read the entire encyclopedia of the city to find one missing sock. They get tired and give up.
  • Preguss Way: You tell the detective, "The sock was last seen near the bakery. Go look only at the bakery and the street next to it." The detective finds the sock instantly. If they find a clue that doesn't make sense, they ask for a second opinion, refine the clue, and move on.

Why this matters:
This paper shows that safety checks for huge, complex software systems (like those in cars, planes, and satellites) can be largely automated, with far less expert time spent writing blueprints by hand. It combines the "noise detection" of traditional tools with the "reasoning power" of modern AI, using a smart strategy to keep the AI focused and effective.