Discovering New Theorems via LLMs with In-Context Proof Learning in Lean

This contribution introduces the Conjecturing-Proving Loop (CPL), a pipeline in which a large language model learns in context from its own formally verified Lean 4 theorems and proofs, which are fed back to it iteratively. According to the paper, this helps the model discover and prove novel conjectures that are harder to prove.

Original authors: Kazumi Kasaura, Naoto Onda, Yuta Oriike, Masaya Taniguchi, Akiyoshi Sannai, Sho Sonoda

Published 2026-05-07

Original paper licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). This is an AI-generated explanation of the paper below. It is not written or endorsed by the authors. For technical accuracy, refer to the original paper.

Imagine you are trying to teach a very intelligent but slightly forgetful robot to solve complex mathematical puzzles. The robot is a Large Language Model (LLM), and the puzzles are formal mathematical proofs written in a strict computer language called Lean.

This work introduces a new method for teaching this robot, called the Conjecturing-Proving Loop (CPL). Here is how it works, explained through simple analogies:

The Problem: The "Guess-and-Check" Trap

Normally, when people try to teach AI mathematics, they ask it to come up with a puzzle and solve it in the same step.

  • The Analogy: Imagine asking a student to "write a math problem and solve it right away."
  • The Problem: The student becomes lazy. They write easy problems (like "2 + 2 = 4") because these are easy to solve. They avoid difficult problems because they know they might fail. In the end, the AI generates thousands of simple, boring proofs and misses the difficult, interesting ones.

The Solution: The "Two-Step Dance" (CPL)

The authors split the process into two distinct roles: a Conjecturer (the idea generator) and a Prover (the solver).

  1. The Conjecturer (The Architect): This part of the AI looks at a library of existing mathematical rules and develops new ideas (conjectures). It does not try to solve them yet; it simply writes them down.
  2. The Prover (The Builder): This part takes the ideas and attempts to create a proof for them. If it fails, it tries again. It keeps trying until it either succeeds or runs out of attempts.
  3. The Library (The Memory): Every time the Prover successfully creates a proof, that proof is added to the library.
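The three roles above can be sketched as a short loop. This is only an illustration under stated assumptions, not the authors' implementation: the names `conjecturing_proving_loop`, `toy_conjecturer`, `toy_prover`, and `toy_verifier` are hypothetical, the theorem strings are placeholders rather than real Lean, and the toy verifier stands in for an actual Lean check. In a real pipeline the conjecturer and prover would be LLM calls.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Theorem:
    statement: str  # theorem statement (a toy placeholder string here)
    proof: str      # proof text that the verifier accepted


def render(library: List[Theorem]) -> str:
    """Turn the library of verified theorems into text for a prompt."""
    return "\n\n".join(f"{t.statement} := {t.proof}" for t in library)


def conjecturing_proving_loop(
    conjecture: Callable[[str], List[str]],
    prove: Callable[[str, str], str],
    verify: Callable[[str, str], bool],
    rounds: int,
    max_attempts: int,
    use_context: bool = True,
) -> List[Theorem]:
    library: List[Theorem] = []
    for _ in range(rounds):
        # Conjecturer: propose statements only, without trying to prove them.
        for statement in conjecture(render(library)):
            # Prover: retry until success or until attempts run out.
            for _ in range(max_attempts):
                context = render(library) if use_context else ""
                proof = prove(statement, context)
                if verify(statement, proof):
                    # Library: keep only proofs the verifier accepted.
                    library.append(Theorem(statement, proof))
                    break
    return library


# --- Toy stand-ins (assumptions, for illustration only) ---
def toy_conjecturer(context: str) -> List[str]:
    candidates = ["theorem easy : P", "theorem hard : Q"]
    return [c for c in candidates if c not in context]


def toy_prover(statement: str, context: str) -> str:
    if "easy" in statement:
        return "by trivial"
    # The 'hard' statement is only provable once 'easy' appears in context.
    return "by exact easy" if "theorem easy" in context else "sorry"


def toy_verifier(statement: str, proof: str) -> bool:
    return "sorry" not in proof


if __name__ == "__main__":
    for flag in (True, False):
        lib = conjecturing_proving_loop(
            toy_conjecturer, toy_prover, toy_verifier,
            rounds=2, max_attempts=2, use_context=flag,
        )
        print(f"use_context={flag}: {len(lib)} verified theorem(s)")
```

In this toy setup the second theorem is verified only when the prover is allowed to see the library, which mirrors the idea of the loop in miniature.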

The Magical Ingredient: In-Context Learning

Here comes the clever part: The Prover does not just look at the original mathematical rules. It also looks at the library of proofs it has already successfully created during the current session.

  • The Analogy: Imagine a student taking an exam. In the old way, they had to rely only on what they memorized before the exam started. In this new way, every time the student correctly solves a problem, they are allowed to read their own solution before tackling the next problem. They learn the "tricks" and "strategies" from their own recent successes.
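To make "reading your own solutions" concrete, here is one way a prover prompt could be assembled from earlier verified proofs. It is a sketch under assumptions: the function name `build_prover_prompt`, the prompt wording, and the `max_examples` cap are all hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def build_prover_prompt(
    conjecture: str,
    verified: List[Tuple[str, str]],
    max_examples: int = 3,
) -> str:
    """Assemble a prompt that shows recent verified (statement, proof) pairs
    before asking for a proof of the new conjecture."""
    # Keep only the most recent examples; guard against max_examples == 0,
    # because verified[-0:] would return the whole list.
    recent = verified[-max_examples:] if max_examples > 0 else []
    parts = ["-- Previously verified theorems from this session:"]
    for statement, proof in recent:
        parts.append(f"{statement} := {proof}")
    parts.append("-- Prove the following theorem in Lean 4:")
    parts.append(f"{conjecture} := by")
    return "\n".join(parts)


if __name__ == "__main__":
    session = [("theorem a : P", "by trivial"), ("theorem b : Q", "by simp")]
    print(build_prover_prompt("theorem c : R", session, max_examples=1))
```

Capping the number of examples is a design choice made here to keep the prompt short; how much of the library is actually shown to the model is something to check in the original paper.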

What They Found

The researchers tested this on some tricky concepts from topology (a branch of mathematics dealing with shapes and spaces) that the AI had not yet mastered.

  • Quantity vs. Quality: The old method (simultaneous guessing and solving) generated more theorems overall, but they were mostly short and simple. The new method (CPL) generated fewer theorems overall, but they were much more difficult and longer.
  • The Big Win: The new method successfully discovered a specific, difficult theorem about "alpha-open sets" that the old method never found, even after 20 attempts.
  • Learning from Success: When the AI received its own library of previous proofs as a "cheat sheet" (context), it could prove difficult theorems that it could not solve without this context. Even if the AI could not prove the theorem in plain English, it could prove it in Lean code once it had seen similar successful proofs.
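For readers curious what "alpha-open sets" look like in Lean, the block below gives the standard textbook definition (a set contained in the interior of the closure of its interior) together with a very easy fact about it. This is not the theorem the paper reports discovering. The names `AlphaOpen` and `alphaOpen_of_isOpen` are made up for this illustration, and the sketch assumes a Lean 4 project with Mathlib available; lemma names can shift between Mathlib versions.

```lean
import Mathlib

variable {X : Type*} [TopologicalSpace X]

/-- A set is α-open if it is contained in the interior of the closure
of its interior (standard definition; the name is hypothetical). -/
def AlphaOpen (A : Set X) : Prop :=
  A ⊆ interior (closure (interior A))

/-- Every open set is α-open. -/
theorem alphaOpen_of_isOpen {A : Set X} (hA : IsOpen A) : AlphaOpen A := by
  unfold AlphaOpen
  -- For an open set, `interior A = A`.
  rw [hA.interior_eq]
  -- `A` is an open subset of `closure A`, so it lies in its interior.
  exact interior_maximal subset_closure hA
```

A short fact like this is the kind of "easy" theorem the summary above says the simultaneous approach tends to produce; the harder results are the ones CPL is reported to reach.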

The Conclusion

The work claims that by separating "idea generation" from "proof solving" and by letting the AI learn from its own verified successes in real time, we can enable it to discover more difficult, complex mathematical truths that it would otherwise miss. It is like giving the AI a head start by allowing it to study its own homework before taking the final exam.

Note: The work focuses strictly on this method for generating and verifying mathematical theorems. It does not claim that this method works for medical diagnoses, financial forecasts, or other real-world applications outside of formal mathematics.
