Proof Strategy Extraction from LLMs for Enhancing Symbolic Provers

This paper introduces Strat2Rocq, a framework that extracts and formalizes proof strategies from large language models as lemmas to augment symbolic provers like CoqHammer, significantly improving their automated theorem proving success rates while also benefiting LLM-based proof agents.

Jian Fang, Yican Sun, Yingfei Xiong

Published 2026-03-05

Here is an explanation of the paper "Proof Strategy Extraction from LLMs for Enhancing Symbolic Provers" using simple language and creative analogies.

The Big Picture: The "Smart Intern" vs. The "Strict Librarian"

Imagine you are trying to solve a very difficult math puzzle. You have two tools:

  1. The Strict Librarian (Symbolic Prover): This is a computer program (like CoqHammer) that is incredibly logical, precise, and reliable. It never makes mistakes, but it is rigid: it follows rules step by step, and if its library lacks a specific rule for a puzzle, it gets stuck, even when the answer is obvious to a human. It's like a librarian who will only find a book if you give them the exact ISBN; describe the plot instead, and they can't help.
  2. The Smart Intern (Large Language Model/LLM): This is an AI (like Claude or o3-mini) that is incredibly creative and intuitive. It can look at a puzzle and instantly say, "Oh, I know this! It's just like that other puzzle we solved last week!" It has a "gut feeling" for the solution. However, it can be unreliable (it sometimes hallucinates, confidently making things up), it is expensive to run, and you may not be able to trust it with your company's secret code.

The Problem:
The Librarian is safe and cheap but gets stuck easily. The Intern is smart but risky and expensive. Usually, to solve hard problems, you need to hire the Intern to do the work, which costs money and risks leaking secrets.

The Solution (Strat2Rocq):
The authors asked: "Can we take the Intern's 'gut feelings' and turn them into a new set of rules for the Librarian?"

They built a system called Strat2Rocq. It acts like a translator or a teacher. It watches the Smart Intern solve problems, figures out how it solved them, and then writes those "aha!" moments down as official, verified rules (lemmas) that the Strict Librarian can understand and use forever.


How It Works: The Two-Step Process

Step 1: The "Study Session" (Offline Mining)

Imagine the Strict Librarian is sitting in a quiet room with a stack of 2,000 math problems.

  1. The Intern Solves: The Smart Intern is brought in to solve these problems. It doesn't just give the answer; it explains its thought process in plain English (e.g., "First, I realized that multiplying by 1 doesn't change anything, so I can skip that step.").
  2. The Translation: The system listens to the Intern's explanation. It spots the clever shortcuts the Intern used.
  3. The Rulebook Update: The system takes those shortcuts and turns them into formal, mathematical laws (lemmas). It's like taking the Intern's "secret trick" and printing it on a page in the Librarian's official rulebook.
    • Example: If the Intern realizes that A + B + C is the same as C + B + A without needing to check every single letter, the system writes a new rule: "You can swap the order of addition."

This happens offline. It's a one-time setup. The Intern does the heavy lifting once, and the result is a permanent upgrade for the Librarian.
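To make the "swap the order of addition" example concrete, here is what such an extracted rule might look like as a Rocq (Coq) lemma. This is an illustrative sketch, not a lemma taken from the paper: the point is that the shortcut is stated formally and machine-checked once, after which any prover can reuse it.

```coq
From Coq Require Import Lia.

(* Illustrative "extracted" lemma: the A + B + C = C + B + A
   shortcut, stated and verified once, then reusable forever. *)
Lemma add_rearrange : forall a b c : nat,
  a + b + c = c + b + a.
Proof.
  intros a b c.
  (* lia discharges linear-arithmetic goals like this in one step *)
  lia.
Qed.
```

Once a lemma like this passes the proof checker, it is guaranteed correct, so the Librarian can trust it unconditionally even though it originated from an unreliable Intern.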

Step 2: The "Exam" (Online Proving)

Now, a user brings a new problem to the Strict Librarian.

  • Before: The Librarian tries to solve it using only its old, rigid rules. It gets stuck.
  • After: The Librarian opens its rulebook. Thanks to the "Study Session," it now has the new rules extracted from the Intern. It sees the new problem, applies the Intern's trick, and solves it instantly!

Crucially, no Intern is needed during the exam. The Librarian does all the work using the new rules. This means it's fast, cheap, and secure.
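The two-phase flow above can be sketched in a few lines of Python. Everything here is hypothetical scaffolding; the function names and toy data are invented for illustration and are not the paper's actual API. The key structural point it shows: the LLM is consulted only in the offline loop, and the online phase runs the symbolic prover alone.

```python
# Hypothetical sketch of the Strat2Rocq two-phase flow.
# All function names are illustrative, not the paper's API.

def mine_lemmas(training_theorems, llm_explain, formalize, verify):
    """Offline phase: turn LLM proof explanations into verified lemmas.
    The LLM is consulted only here, once, ahead of time."""
    library = []
    for theorem in training_theorems:
        explanation = llm_explain(theorem)    # natural-language strategy
        for lemma in formalize(explanation):  # candidate formal lemmas
            if verify(lemma):                 # keep only machine-checked ones
                library.append(lemma)
    return library

def prove_online(goal, symbolic_prover, library):
    """Online phase: no LLM involved. The prover runs locally,
    armed with the lemmas mined offline."""
    return symbolic_prover(goal, extra_lemmas=library)

# Toy demonstration with stubs standing in for the real tools.
library = mine_lemmas(
    ["a + b + c = c + b + a"],
    llm_explain=lambda t: "addition can be reordered freely",
    formalize=lambda e: ["Lemma add_comm : forall a b, a + b = b + a."],
    verify=lambda lemma: lemma.startswith("Lemma"),
)
proved = prove_online(
    "x + y = y + x",
    symbolic_prover=lambda goal, extra_lemmas: len(extra_lemmas) > 0,
    library=library,
)
print(proved)  # the stub prover "succeeds" because the library is non-empty
```

Note the design consequence: because `verify` (the proof checker) filters every candidate, an unreliable LLM can only ever add correct rules to the library, never wrong ones.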


Why This Is a Big Deal

  1. It's Cheaper and Safer: You don't need to pay for expensive AI services every time you want to verify software code. You just use the upgraded Librarian.
  2. It Keeps Secrets Safe: Companies with confidential code (like banks or defense contractors) can't send their code to public AI servers. With this method, they can upgrade their local tools without ever sending their data to the cloud.
  3. It's a "Win-Win":
    • For the Librarian: It got smarter. The paper shows it solved 13.4% more problems than before.
    • For the Intern: Even the Intern got a boost! When the Intern tried to solve problems again, having access to these new rules helped it succeed 4% more often. It's like the Intern reading its own study notes and getting smarter.

The "Side Effect" Discovery

The researchers found something cool: The "tricks" they extracted weren't just for the Librarian. They were also helpful for the Intern itself. It turns out that even a super-smart AI can benefit from having a structured list of its own best strategies.

Summary Analogy

Think of Strat2Rocq as a Master Chef (the LLM) teaching a Robot Kitchen (the Symbolic Prover).

  • The Robot is great at chopping vegetables perfectly but doesn't know how to make a complex sauce.
  • The Master Chef makes the sauce, explaining the secret ingredients and timing.
  • The system records the Chef's recipe and programs it into the Robot.
  • Now, the Robot can make the sauce perfectly every time, without needing the Chef to be in the kitchen. The Robot is faster, cheaper to run, and the recipe stays safe in the kitchen.

The paper shows that by "distilling" the intelligence of an AI into the rules of traditional software, we can make our tools safer, faster, and much more capable.