LeanBET: Formally-verified surface area calculations in… — Plain-Language Explanation

Original authors: Ejike D. Ugwuanyi, Colin T. Jones, John Velkey, Tyler R. Josephson

Published 2026-05-18

Original paper licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). This is an AI-generated explanation of the paper below. It is not written or endorsed by the authors. For technical accuracy, refer to the original paper.

Imagine you are trying to measure the surface area of a sponge, but the sponge is full of invisible, microscopic holes. Scientists use a method called BET (named after its inventors Brunauer, Emmett, and Teller) to estimate this area by watching how gas sticks to the sponge. It's a standard tool in chemistry, but it's a bit like trying to solve a puzzle where the picture on the box is blurry.

Here is the problem: to get the answer, scientists have to pick a specific range of data points from their experiment and draw a straight line through them. Different people (or different computer programs) might pick slightly different ranges. One person might say, "Let's use the middle 10 points," while another says, "No, use the middle 12." This leads to different answers for the same sponge, causing confusion and a lack of trust in the results.

To fix this, a team created a computer program called BETSI that automatically checks every possible range of data to find the "best" one. It's like having a robot that tries every possible combination of puzzle pieces to find the one that fits perfectly. However, even robots can have bugs, or hidden assumptions that make them wrong in subtle ways.

Enter "LeanBET": The Math-Proofed Robot

The authors of this paper built a new version of this robot using a special computer tool called Lean 4. Think of Lean 4 not just as a programming language, but as a super-strict math teacher that never lets you make a mistake without proof.

Here is how they did it, using some simple analogies:

1. The "Two-Brain" System (Polymorphism)

Usually, when you write a computer program, you use "floating-point numbers" (like the numbers on a calculator). These are fast but slightly messy because computers can't hold infinite precision. When you do math proofs, you use "real numbers" (perfect, infinite precision), but you can't run those on a computer.

The authors solved this by building a shape-shifting robot.

Brain A (The Proof): When they need to prove the math is right, the robot wears a "Real Number" suit. It does perfect, theoretical math to prove the logic is flawless.
Brain B (The Execution): When they need to run the program on real data, the robot swaps into a "Floating-Point" suit. It runs fast on actual computers.
The Magic: Because the robot is built the same way in both suits, whatever the "Proof Brain" proves about the logic applies to the very same steps the "Execution Brain" runs; the only difference between them is the tiny rounding that floating-point numbers introduce. It's like proving a bridge design is safe with perfect math, then building the actual bridge with real steel, knowing the design holds up.
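
To make the "two suits" idea concrete, here is a minimal Python sketch. It is an analogy only and not the authors' Lean code: Python's exact `Fraction` type stands in for Lean's real numbers (which Python cannot represent), and the function name `bet_transform` is made up for this illustration.

```python
from fractions import Fraction
from typing import TypeVar

# Num is a hypothetical stand-in for the paper's generic numeric type.
# Fraction (exact rationals) plays the role of the "Real Number" suit,
# float plays the role of the "Floating-Point" suit.
Num = TypeVar("Num", float, Fraction)


def bet_transform(p: Num, n: Num) -> Num:
    """Compute p / (n * (1 - p)), written once for any numeric type."""
    return p / (n * (1 - p))


# Same function, two different number systems.
exact = bet_transform(Fraction(1, 10), Fraction(3))  # exact arithmetic
approx = bet_transform(0.1, 3.0)                     # floating-point arithmetic

print(exact)                        # the exact rational result
print(abs(approx - float(exact)))   # rounding gap between the two suits
```

The function body is identical in both calls; only the number type changes. Any remaining gap between the two results comes from floating-point rounding rather than from a difference in logic.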

2. The "Recipe vs. The Cooking" (Derivation as Specification)

In normal science, you write a recipe (the math theory) on paper, and then a chef (the programmer) tries to cook it in the kitchen (the software). Sometimes the chef adds a pinch of salt here or there, or misunderstands a step, and the dish tastes different from the recipe.

In LeanBET, the recipe and the cooking happen in the same room. The "math derivation" (the recipe) is written directly into the code, and the computer checks that the code matches the recipe. If the code says "add salt," the math proof verifies that "adding salt" is exactly what the theory demands. There is no gap between the theory and the practice.

3. The "Strict Inspector" (Formal Verification)

The paper claims that the program doesn't just produce an answer; it comes with a certificate of correctness.

Standard Software: You run the program, it gives you a number, and you hope it's right.
LeanBET: You run the program, it gives you a number, and behind that number stands a machine-checked proof saying, "Every step follows the stated rules: no candidate range was skipped, the line is the true best fit, and the selection rules were applied as written."

What Did They Find?

They tested their new "Math-Proofed Robot" against the old "Standard Robot" (BETSI) using 19 different sets of data (like 19 different sponges).

The Result: For 18 out of 19 sponges, the two robots gave the exact same answer down to the tiniest decimal point.
The One Glitch: For one sponge (a material called UiO-66), there was a tiny difference (0.03%). The authors say the cause is still undetermined, but the difference is much smaller than the usual uncertainty in experiments.

The Bottom Line

This paper isn't about inventing a new way to measure sponges. It's about building a trustworthy version of the existing way. They took a standard scientific tool, rebuilt it inside a "math-proof" environment, and showed that it works just as well as the old tools but with a guarantee that it hasn't made any logical mistakes.

It's like upgrading from a regular map to a GPS that not only tells you the route but also proves, step-by-step, that the route is the shortest and safest one possible, with no hidden detours.

Technical Summary of "LeanBET: Formally-verified surface area calculations in Lean"

Problem Statement
The Brunauer–Emmett–Teller (BET) method is the standard framework for estimating specific surface area from gas adsorption isotherms. However, practical implementation involves significant subjectivity, particularly in selecting the relative pressure range for linear regression. While consistency criteria (Rouquerol criteria) exist to filter valid regions, multiple pressure intervals often satisfy these rules, leading to variability in reported surface areas across different laboratories and software implementations. Previous efforts to standardize this process, such as the BET Surface Identification (BETSI) algorithm, rely on general-purpose numerical software (Python). While BETSI reduces subjectivity through exhaustive enumeration, it remains a numerical algorithm where hidden implementation assumptions or subtle errors cannot be formally ruled out, leaving the scientific guarantees of the output unverified.

Methodology
The authors present LeanBET, a fully executable and formally verified BET analysis pipeline implemented in the Lean 4 theorem prover. The approach integrates executable code with mathematical proofs within a single environment, utilizing a polymorphic numeric design to bridge the gap between computation and verification:

Polymorphism: The algorithm is written using a generic type α constrained by a BETLike typeclass. This allows the same code structure to be instantiated over floating-point numbers (Float) for efficient execution on real experimental data and over real numbers (Real) for non-computable mathematical proofs.
Derivation as Specification: The algebraic derivation of the BET linearized equation is not merely a motivation but serves as the formal specification. The code is constructed to mirror this derivation, ensuring the executable transformation matches the theoretical model.
Workflow: The pipeline performs the following steps, each paired with a formal proof:
1. Window Enumeration: Systematically generates all contiguous sublists of the isotherm containing at least two data points.
2. Linearization: Transforms each data point $(p, n)$ into the linearized BET coordinates $(p,\ p/[n(1-p)])$.
3. Linear Regression: Performs least-squares regression on each window to extract slope, intercept, and $R^2$.
4. Admissibility Checks: Filters candidates based on the Rouquerol criteria (monotonicity of $n(1-p)$, positivity of the parameters $C$ and $n_m$, and consistency of the monolayer pressure).
5. Knee Selection: Resolves ties among valid windows by selecting the one with the largest end index, breaking further ties with the smallest percentage error.
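
The first steps of this workflow can be sketched in ordinary Python. This is an illustrative sketch under stated assumptions, not the LeanBET or BETSI source: all function and class names are hypothetical, the data are synthetic (generated from the BET equation with $n_m = 2$ and $C = 100$), pressures are assumed distinct, and only the monotonicity and positivity checks are shown. The monolayer-pressure consistency check and the knee selection are omitted.

```python
from typing import List, NamedTuple, Tuple

Point = Tuple[float, float]  # (relative pressure p, amount adsorbed n)


class Fit(NamedTuple):
    start: int        # first index of the window
    end: int          # last index of the window (inclusive)
    slope: float
    intercept: float
    r2: float


def windows(num_points: int) -> List[Tuple[int, int]]:
    """Step 1: every contiguous index range holding at least two points."""
    return [(i, j) for i in range(num_points) for j in range(i + 1, num_points)]


def linearize(points: List[Point]) -> List[Point]:
    """Step 2: map (p, n) to (p, p / (n * (1 - p)))."""
    return [(p, p / (n * (1.0 - p))) for p, n in points]


def least_squares(xy: List[Point]) -> Tuple[float, float, float]:
    """Step 3: ordinary least squares; returns (slope, intercept, R^2)."""
    m = len(xy)
    mx = sum(x for x, _ in xy) / m
    my = sum(y for _, y in xy) / m
    sxx = sum((x - mx) ** 2 for x, _ in xy)  # nonzero if pressures are distinct
    sxy = sum((x - mx) * (y - my) for x, y in xy)
    syy = sum((y - my) ** 2 for _, y in xy)
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = 1.0 if syy == 0.0 else (sxy * sxy) / (sxx * syy)
    return slope, intercept, r2


def bet_parameters(slope: float, intercept: float) -> Tuple[float, float]:
    """Standard BET relations: n_m = 1/(slope + intercept), C = 1 + slope/intercept."""
    return 1.0 / (slope + intercept), 1.0 + slope / intercept


def fit_all(points: List[Point]) -> List[Fit]:
    """Run steps 1 to 3 over every window."""
    lin = linearize(points)
    return [Fit(i, j, *least_squares(lin[i:j + 1])) for i, j in windows(len(points))]


def passes_basic_checks(points: List[Point], fit: Fit) -> bool:
    """Part of step 4: n(1 - p) strictly increasing, and n_m, C positive."""
    q = [n * (1.0 - p) for p, n in points[fit.start:fit.end + 1]]
    if not all(a < b for a, b in zip(q, q[1:])):
        return False
    if fit.intercept == 0.0 or fit.slope + fit.intercept == 0.0:
        return False
    n_m, c = bet_parameters(fit.slope, fit.intercept)
    return n_m > 0.0 and c > 0.0


# Synthetic isotherm from the BET equation (assumed values, not paper data).
N_M, C = 2.0, 100.0
pressures = [0.05, 0.10, 0.15, 0.20, 0.25]
synthetic = [(p, N_M * C * p / ((1.0 - p) * (1.0 + (C - 1.0) * p))) for p in pressures]

fits = fit_all(synthetic)
valid = [f for f in fits if passes_basic_checks(synthetic, f)]
full = next(f for f in fits if (f.start, f.end) == (0, len(synthetic) - 1))
print(len(fits), len(valid))
print(bet_parameters(full.slope, full.intercept))  # close to (2.0, 100.0)
```

Because the synthetic points lie exactly on a BET curve, every window recovers nearly the same parameters. Real isotherms do not behave this way, which is why the admissibility checks and the selection rule matter in practice.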

Key Contributions

Formal Verification of the BETSI Pipeline: The paper provides the first machine-checked implementation of the full BETSI workflow. It proves that the executable code satisfies the mathematical definitions of the BET theory and the specific selection criteria.
Soundness and Completeness Proofs: The authors formally prove:
- Theorem A.1 & A.2: The BET isotherm equation is derived from layer model assumptions, and the linearization step is algebraically correct.
- Theorem A.3: The window enumeration is sound (only valid windows are generated) and complete (no valid contiguous window is missed).
- Theorem A.4 & A.5: The regression step returns a true least-squares minimizer, and the extracted physical parameters ($n_m$, $C$) agree with their theoretical definitions.
- Theorem A.6 & A.7: The admissibility checks and the knee-based selection strategy strictly satisfy their formal specifications.
Polymorphic Implementation: The work demonstrates a practical pattern for scientific computing in Lean, where correctness is proven over idealized real numbers while execution occurs over floating-point arithmetic, avoiding the need for separate verification languages.
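
For reference, the algebra behind the linearization and parameter-extraction theorems can be written out in the notation used above ($p$ for relative pressure, $n$ for amount adsorbed). This is the standard textbook form of the BET equation; the symbols $s$ and $i$ for the fitted slope and intercept are introduced here for illustration, and the paper's exact formal statements may differ.

$$
n = \frac{n_m\, C\, p}{(1-p)\,\bigl[1 + (C-1)\,p\bigr]}
\quad\Longrightarrow\quad
\frac{p}{n(1-p)} = \frac{1}{n_m C} + \frac{C-1}{n_m C}\, p .
$$

Reading the right-hand side as a straight line in $p$ with slope $s = (C-1)/(n_m C)$ and intercept $i = 1/(n_m C)$ gives

$$
s + i = \frac{1}{n_m}
\;\Longrightarrow\;
n_m = \frac{1}{s + i},
\qquad
\frac{s}{i} = C - 1
\;\Longrightarrow\;
C = 1 + \frac{s}{i}.
$$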

Results
The LeanBET implementation was evaluated against the reference Python-based BETSI implementation using a benchmark set of 19 adsorption isotherms.

Numerical Agreement: LeanBET agrees with the BETSI reference to machine precision for 18 of the 19 isotherms.
Discrepancy: For the UiO-66 dataset, a deviation of approximately 0.03% (0.36 m²/g) was observed. The authors note that the cause of this specific deviation remains undetermined but emphasize that it is significantly smaller than typical experimental uncertainties.
Verification Status: All proofs compiled successfully, providing machine-checked guarantees that the algorithm's output adheres to the specified mathematical constraints.

Significance
The authors argue that LeanBET shows a practical scientific computing workflow can be built in a theorem prover without sacrificing numerical agreement with established reference implementations. The primary contribution is not a new algorithm but a formally verified guarantee of correctness. By linking the executable pipeline to machine-checked mathematical statements, the work addresses reproducibility problems in BET analysis by removing ambiguity about whether the implementation adheres to the underlying theory and selection criteria. The authors position this as a step toward bridging the gap between theoretical derivations and software implementations in materials science.

LeanBET: Formally-verified surface area calculations in Lean