Apply2Isar: Automatically Converting Isabelle/HOL Apply-Style Proofs to Structured Isar

Here is an explanation of the paper "Apply2Isar" using simple language and creative analogies.

The Problem: The "Scratchpad" vs. The "Manuscript"

Imagine you are a mathematician trying to solve a complex puzzle. You have two ways to write down your solution:

The "Apply-Style" (The Scratchpad): This is like a frantic, messy scratchpad. You shout commands at your computer: "Try this! No, try that! Okay, now this!" It works great while you are figuring things out because it's fast and flexible. However, once you are done, the notes are a disaster. If you or someone else tries to read them a week later, it's a nightmare. If you change one rule in the puzzle, the whole chain of commands might break in a way that makes no sense, and you have no idea why.
The "Isar-Style" (The Manuscript): This is like a beautifully written story or a formal manuscript. It reads like a logical essay: "First, we know X. Because of X, we can conclude Y. Therefore, Z is true." It is easy to read, easy to understand, and very sturdy. If you change a rule, the story breaks in a way that is obvious and easy to fix.

The Dilemma:
Most people prefer writing the messy "scratchpad" notes first because it's faster to explore ideas. But the "Isar manuscript" is what everyone actually wants to read and keep forever. The problem is that turning the messy scratchpad into a clean manuscript is incredibly hard work. You have to manually rewrite every single step, which is boring and prone to human error.

The Solution: Apply2Isar (The Magic Translator)

The authors of this paper built a tool called Apply2Isar. Think of it as a smart translator or a ghostwriter.

How it works: You feed it your messy "scratchpad" code (the apply-style proof). The tool runs through your code, step-by-step, watching exactly what happens to the puzzle pieces. It records every intermediate state.
The Magic: It then takes that record and writes a brand new, clean "manuscript" (the structured Isar proof) that does the exact same thing but in a readable format.
The Result: You get the best of both worlds. You can do your messy, fast exploration, and then hit a button to get a clean, professional proof that is easy for humans to read and hard to break.

Why Is This Hard? (The Tricky Parts)

The paper explains that this isn't just a simple "find and replace" job. It's like trying to translate a stream-of-consciousness diary entry into a formal legal contract. Here are the specific hurdles they had to overcome:

The "Backwards" vs. "Forwards" Problem:
- The Scratchpad works backwards. It starts with the answer and asks, "What do I need to get here?"
- The Manuscript works forwards. It starts with what we know and builds up to the answer.
- The Fix: The tool has to reverse-engineer the logic. It's like watching a movie in reverse and then rewriting the script so it plays forward naturally.
The "Multiple Goals" Mess:
- Sometimes, one command in the messy code solves three different problems at once. In the clean manuscript, you can't just say "Solved everything." You have to explicitly say, "Here is how we solved Problem A, here is Problem B, and here is Problem C."
- The Fix: The tool is smart enough to know which problems changed and only writes out the steps for those, avoiding a boring, repetitive list.
The "Shadow" Problem:
- Imagine you have a variable named "x" in your messy notes. Later, you create a new "x" inside a specific section. In the messy notes, the computer knows they are different. But when the tool tries to write the clean story, it might get confused and think they are the same person, causing a mix-up.
- The Fix: The tool has a "renaming" feature to make sure every character in the story has a unique name so the plot doesn't collapse.
The "Black Box" Problem:
- Sometimes the messy code uses a "magic spell" (a complex command) that does a lot of things at once. The tool can't always see inside the spell to know exactly how it worked.
- The Fix: If the tool gets stuck, it leaves a small note saying, "We used a magic spell here," so the proof still works, even if that specific part isn't fully translated yet.

Did It Work? (The Results)

The team tested this tool on thousands of real-world proofs from a massive library of mathematical proofs (the Isabelle Archive of Formal Proofs).

Success Rate: It produced a full or partial clean version for 95% to 99% of the messy proofs, depending on which part of the library they came from.
Partial Success: Even when it couldn't convert 100% of a proof (usually because of those "magic spells"), it still converted the vast majority of it, leaving only tiny gaps.
Speed: Each proof was given a 30-second time limit, so a conversion that could take a human hours of tedious rewriting finishes in well under a minute.

The Bottom Line

Apply2Isar is a bridge between the chaotic, fast-paced world of "figuring things out" and the orderly, robust world of "writing things down."

It allows mathematicians and computer scientists to stop wasting time manually rewriting their own work. Instead, they can focus on the hard thinking, let the messy code do the heavy lifting, and let the tool generate the beautiful, readable proof for them. It's like having a personal editor that instantly turns your rough draft into a published book.

Here is a detailed technical summary of the paper "Apply2Isar: Automatically Converting Isabelle/HOL Apply-Style Proofs to Structured Isar."

1. Problem Statement

Isabelle/HOL supports two primary proof styles:

Apply-style (Procedural): Scripts that work backward from the goal, invoking methods to manipulate the proof state. They are efficient for rapid exploration and search but are often fragile, difficult to read, and hard to maintain. A single change in a lemma or simplification rule can break a script far downstream without clear indication of the cause.
Structured Isar (Declarative): Proofs that explicitly state intermediate goals and reasoning steps, moving forward. They are highly readable, robust against changes in automation, and preferred by the Isabelle community (e.g., the Archive of Formal Proofs, AFP).

The Challenge: While Isar is superior for maintenance, users often find apply-scripts easier to write for initial exploration. Manually converting a complex apply-script to Isar is laborious, requiring the user to step through the script, record intermediate states, and reconstruct the proof manually. There was no automated tool to bridge this gap, forcing users to choose between rapid prototyping and long-term maintainability.

2. Methodology

The authors introduce Apply2Isar, an Isabelle/ML tool that automatically translates apply-style scripts into structured Isar proofs. The core methodology involves:

AST Parsing and Replay: The tool tokenizes and parses the input apply-script into a custom Abstract Syntax Tree (AST). It then "replays" the script by invoking the corresponding Isabelle/ML functions to simulate the execution of each command.
State Tracking: During replay, Apply2Isar records the intermediate proof states (specifically the list of pending goals) after every method application.
Reverse Construction: Since apply-scripts reason backward (Goal $\to$ Subgoals) while Isar reasons forward (Facts $\to$ Conclusion), the tool constructs the Isar proof in reverse order. It prints the intermediate goals and the methods used to discharge them, effectively reversing the flow of the original script.
Fact Management: To connect steps, the tool uses the fact method. Intermediate goals are labeled (e.g., h_1_1) to allow subsequent steps to reference them explicitly.
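The replay, state-tracking, and reverse-construction steps above can be sketched on a toy model. This is only an illustration under strong assumptions, not the tool's Isabelle/ML implementation: goals are plain strings, each "method" is a Python function that maps the first pending goal to its subgoals, and the names `replay`, `to_isar`, `split_conj`, and `close` are hypothetical. The emitted lines are Isar-like text and are not claimed to match Apply2Isar's actual output format.

```python
from typing import Callable, List, Tuple

# A toy "method" maps one goal to the subgoals it leaves behind.
Method = Callable[[str], List[str]]
# One trace entry: (method name, label of the goal it ran on, goal text, subgoal labels).
Step = Tuple[str, str, str, List[str]]


def split_conj(goal: str) -> List[str]:
    """Toy stand-in for a rule that splits 'A & B' into 'A' and 'B'."""
    left, right = goal.split(" & ", 1)
    return [left, right]


def close(goal: str) -> List[str]:
    """Toy stand-in for a method that solves the goal outright."""
    return []


def replay(goal: str, script: List[Tuple[str, Method]]) -> List[Step]:
    """Run the script backward from the goal, recording every intermediate state."""
    pending = [("thesis", goal)]
    trace: List[Step] = []
    for k, (name, method) in enumerate(script, start=1):
        label, current = pending.pop(0)  # each step works on the first pending goal
        subgoals = method(current)
        labelled = [(f"h_{k}_{j}", g) for j, g in enumerate(subgoals, start=1)]
        trace.append((name, label, current, [lab for lab, _ in labelled]))
        pending = labelled + pending
    if pending:
        raise ValueError("script finished with goals still pending")
    return trace


def to_isar(trace: List[Step]) -> List[str]:
    """Print the steps in reverse so every fact is stated before it is used."""
    lines = []
    for name, label, goal, used in reversed(trace):
        head = "show" if label == "thesis" else f"have {label}:"
        using = f" using {' '.join(used)}" if used else ""
        lines.append(f'{head} "{goal}"{using} by ({name})')
    return lines


script = [("rule conjI", split_conj), ("simp", close), ("simp", close)]
isar_lines = to_isar(replay("A & B", script))
print("\n".join(isar_lines))
```

Because a subgoal is always handled by a later step than the one that created it, walking the trace backward places each labelled `have` before the step that refers to it, which is the property forward-style Isar needs.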

Key Implementation Strategies for Structural Differences:

Multiple Goals: To avoid verbosity when a method (like auto) affects multiple goals simultaneously, the tool implements a smart_goals option. Instead of listing all goals at every step, it only prints the goals that changed, deferring the combination of remaining goals to a final show statement.
Subgoals and Fact Ordering: Commands like subgoal (which focuses on a specific goal) and supply (which names facts in the middle of a script) create ordering dependencies. Since Isar requires facts to be defined before use, Apply2Isar lifts subgoal blocks and the note commands that stand in for supply to the beginning of the proof block to ensure logical validity, even if this disrupts the original narrative flow.
Handling unfolding and using: The tool manages the interaction between these commands and apply by grouping their effects. It offers a smart_unfolds option to combine these effects with the subsequent method application to reduce repetition.
Variable Shadowing: The tool addresses the issue of variable shadowing in subgoal commands (where a new local variable might share a name with an existing one). It offers a subgoal_fix_fresh option to rename variables to prevent conflicts, though this requires careful handling of explicit references.
Schematic Variables: If a proof contains schematic goals (unknowns to be instantiated later), the tool cannot fully translate that section. It collects commands until the schematic goals are resolved and wraps the unresolved section in a single have statement with an embedded apply-script, marking the result as a "partial translation."
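Two of the strategies above lend themselves to a small sketch: printing only the goals a method changed (the idea behind smart_goals) and picking a fresh name for a shadowed variable (the idea behind subgoal_fix_fresh). The code below is a toy illustration, assuming goals are plain strings and variable names are tracked in a set. The helper names `diff_goals` and `fresh_name` and the numeric-suffix naming scheme are assumptions made for this example, not the tool's own.

```python
from collections import Counter
from typing import List, Set, Tuple


def diff_goals(before: List[str], after: List[str]) -> Tuple[List[str], List[str]]:
    """Return (goals that were changed or solved, goals that newly appeared).

    Goals present in both states are treated as untouched and are omitted,
    so only the affected goals need to be printed for this step.
    """
    unmatched_after = Counter(after)
    touched = []
    for goal in before:
        if unmatched_after[goal] > 0:
            unmatched_after[goal] -= 1  # survived the step unchanged
        else:
            touched.append(goal)

    unmatched_before = Counter(before)
    produced = []
    for goal in after:
        if unmatched_before[goal] > 0:
            unmatched_before[goal] -= 1  # already existed before the step
        else:
            produced.append(goal)
    return touched, produced


def fresh_name(name: str, used: Set[str]) -> str:
    """Pick a variant of `name` that does not clash with any name in `used`."""
    if name not in used:
        return name
    suffix = 1
    while f"{name}{suffix}" in used:
        suffix += 1
    return f"{name}{suffix}"


# A method such as auto solves the first and third goal and leaves the second alone.
before = ["xs @ [] = xs", "P x", "length [] = 0"]
after = ["P x"]
touched, produced = diff_goals(before, after)
print(touched)   # only these two goals need their own statement for this step
print(produced)

# A subgoal introduces a local x while an outer x is already in scope.
renamed = fresh_name("x", {"x", "y"})
print(renamed)
```

Renaming the variable is only half the job: as noted above, any explicit reference to the old name inside the subgoal has to be updated consistently, which this sketch does not attempt.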

3. Key Contributions

Automated Translation Tool: The first tool capable of converting arbitrary apply-style scripts into structured Isar proofs within the Isabelle/HOL environment.
Robustness via Replay: By replaying the script and capturing actual proof states, the tool ensures the generated Isar is logically valid and faithful to the original execution path, rather than relying on heuristic pattern matching.
Handling Complex Edge Cases: The tool successfully manages complex interactions between commands (e.g., back, subgoal, unfolding, using) and provides configuration options (e.g., named_facts, smart_goals, print_types) to balance readability with fidelity to the original script.
Partial Translation Support: The tool gracefully handles proofs that cannot be fully converted (e.g., due to schematic variables or unsupported commands) by isolating the problematic sections, allowing the rest of the proof to be converted.
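The isolation step behind partial translation can be pictured as chunking the script. The sketch below is a simplified model rather than the tool's code: each step is paired with the goals that remain after it, a goal counts as schematic if it contains a `?` (Isabelle writes schematic variables with a leading question mark), and the names `has_schematic` and `chunk_script` are hypothetical. Commands are collected from the point where a schematic goal appears until none remain, and that run is kept as one embedded apply-script.

```python
from typing import List, Tuple

Step = Tuple[str, List[str]]    # (command, goals remaining after the command)
Chunk = Tuple[str, List[str]]   # ("translated" or "embedded", commands in the chunk)


def has_schematic(goals: List[str]) -> bool:
    """Toy check: a goal is schematic if it mentions a ?-prefixed variable."""
    return any("?" in goal for goal in goals)


def chunk_script(steps: List[Step]) -> List[Chunk]:
    """Group commands so that schematic stretches stay together as raw script."""
    chunks: List[Chunk] = []
    buffer: List[str] = []
    for command, goals_after in steps:
        if buffer or has_schematic(goals_after):
            buffer.append(command)
            if not has_schematic(goals_after):
                # Schematic goals are resolved: close the embedded section.
                chunks.append(("embedded", buffer))
                buffer = []
        else:
            chunks.append(("translated", [command]))
    if buffer:
        chunks.append(("embedded", buffer))
    return chunks


steps = [
    ("apply (rule conjI)", ["EX x. x = a", "B"]),
    ("apply (rule exI)", ["?x = a", "B"]),   # introduces the unknown ?x
    ("apply (rule refl)", ["B"]),            # instantiates ?x, no unknowns left
    ("apply simp", []),
]
chunks = chunk_script(steps)
for kind, commands in chunks:
    print(kind, commands)
```

In this example only the two middle commands end up in the embedded section, so the rest of the proof can still be rendered as ordinary structured steps.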

4. Evaluation Results

The authors evaluated Apply2Isar on a benchmark set of 4,461 apply-style proofs drawn from five major AFP entries (Group-Ring-Module, AutoCorres2, Flyspeck-Tame, Valuation, and BNF_Operations).

Success Rate: The tool successfully produced total or partial translations in 95%–99% of test cases across the different entries.
Failure Modes:
- Print Fails (~1-2%): Failures caused by inconsistencies in Isabelle's term-printing mechanism (re-parsing a printed goal resulted in a syntax error or different term). This is a limitation of Isabelle itself, not the tool.
- Timeouts: Caused by extremely slow re-parsing of complex goals when type annotations were required.
- Miscellaneous: Rare failures involving edge-case behaviors of uncommon methods.
Performance: Each proof was run with a 30-second timeout. The tool successfully handled the longest script in the benchmark, which contained over 350 commands.

5. Significance

Bridging the Gap: Apply2Isar allows users to enjoy the rapid exploration capabilities of apply-scripts while retaining the long-term maintainability and readability of structured Isar.
Maintainability: It significantly reduces the manual effort required to refactor legacy code or experimental proofs into the standard format preferred by the Isabelle community and the AFP.
Determinism vs. LLMs: Unlike recent LLM-based approaches (e.g., Isabelle Assistant), Apply2Isar is completely deterministic. It does not rely on probabilistic generation, ensuring that the output is a faithful, reproducible transformation of the input.
Local Execution: The tool runs locally on standard hardware, avoiding the need for cloud subscriptions or external API dependencies.

In conclusion, Apply2Isar is a practical, high-utility tool that enhances the Isabelle/HOL ecosystem by automating the transition from procedural to declarative proofs, thereby improving the robustness and accessibility of formal verification developments.
