Imagine you are trying to solve an incredibly difficult math puzzle, like the kind found in the world's toughest high school or college competitions. You have two assistants to help you:
- The "Big Thinker" (General AI): This assistant is brilliant at understanding the problem, explaining the logic in plain English, and coming up with a clever strategy. However, they are a bit messy. They might make small calculation errors, skip a step, or accidentally use a rule that doesn't apply. They are great at the idea but bad at the execution.
- The "Strict Auditor" (Formal Prover): This assistant is a robot that speaks a very rigid, computer-readable language (Lean 4). They are perfect. If they say a proof is correct, it is 100% correct. But they are also very narrow-minded. If you give them a problem that is too complex or doesn't look exactly like a textbook example, they get stuck and give up immediately.
For a long time, researchers had to choose between the Big Thinker (who gets the right idea but fails the test) and the Strict Auditor (who passes the test but can't even start the hard problems).
Enter HILBERT.
HILBERT is a new system that acts like a Master Project Manager who knows how to get the best out of both assistants. It bridges the gap between "thinking" and "proving."
How HILBERT Works: The "Lego Tower" Analogy
Imagine you need to build a massive, 100-story Lego tower (the final proof).
- The Old Way: You ask the Strict Auditor to build the whole tower at once. They look at the 100 stories, get overwhelmed, and say, "I can't do this."
- The HILBERT Way: HILBERT breaks the problem down into tiny, manageable pieces.
Here is the step-by-step process HILBERT uses:
1. The Strategy Session (The Big Thinker)
HILBERT first asks the Big Thinker to look at the 100-story tower. The Big Thinker says, "Okay, we can't build this all at once. Let's break it down! We need a foundation, then a 10-story section, then another 10-story section..."
The Big Thinker writes a blueprint (a "proof sketch") in plain English, dividing the huge problem into smaller sub-problems.
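To make the idea of a blueprint concrete, here is a minimal sketch of what a proof sketch could look like once it is written down in Lean 4. This is an illustration only: the toy theorem and its name are hypothetical and not taken from the paper, and the snippet assumes Mathlib is installed. Each `sorry` marks a sub-problem that has been stated but not yet proved, which is the kind of piece that would be handed to the Strict Auditor.

```lean
import Mathlib

-- Hypothetical toy theorem, chosen only to show the shape of a sketch.
theorem toy_sum_sq_nonneg (a b : ℝ) : 0 ≤ a ^ 2 + b ^ 2 := by
  -- Sub-problem 1: stated, but left open for the prover.
  have h1 : 0 ≤ a ^ 2 := by sorry
  -- Sub-problem 2: stated, but left open for the prover.
  have h2 : 0 ≤ b ^ 2 := by sorry
  -- Assembly: once both pieces are proved, they combine into the goal.
  exact add_nonneg h1 h2
```

In this toy case each `sorry` could be closed with Mathlib's `sq_nonneg`. The point of the sketch is the structure: the hard statement is reduced to smaller statements plus a short step that glues them together.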
2. The Retrieval (The Librarian)
Before trying to build, HILBERT sends a Librarian to the library (Mathlib, a giant database of known math facts). The Librarian finds specific rules and theorems that are perfect for building the foundation or the 10-story sections. This ensures the team isn't reinventing the wheel.
3. The Construction (The Strict Auditor)
Now, HILBERT takes one small section of the blueprint (say, "Build the first 10-story section") and hands it to the Strict Auditor.
- If the Auditor succeeds: Great! That piece is done.
- If the Auditor fails: The Auditor says, "I can't build this specific 10-story section."
4. The Recursive Loop (The "Divide and Conquer" Magic)
This is where HILBERT gets really smart. Instead of giving up, it goes back to the Big Thinker.
- "Hey, the Auditor couldn't build this 10-story section. Can you break that down into even smaller pieces?"
- The Big Thinker breaks the 10-story section into five 2-story sections.
- HILBERT hands these tiny 2-story sections back to the Strict Auditor.
- The Auditor, now dealing with tiny, simple tasks, can easily build them.
This process is recursive. If a 2-story section is still too hard, HILBERT breaks it down again into single bricks. It keeps digging deeper until the problem is so small that the Strict Auditor can solve it instantly.
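The loop described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not HILBERT's actual code: `prove`, `toy_prover`, `toy_decompose`, and `max_depth` are hypothetical names, a "goal" is just a number of Lego stories, the stand-in Auditor can only build sections of at most 2 stories, and the stand-in Big Thinker simply splits a section in half.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ProofNode:
    """One verified piece: built directly (no children) or assembled from children."""
    goal: int
    children: List["ProofNode"] = field(default_factory=list)


def prove(
    goal: int,
    try_prover: Callable[[int], bool],
    decompose: Callable[[int], List[int]],
    depth: int = 0,
    max_depth: int = 8,
) -> Optional[ProofNode]:
    """Return a tree of verified pieces for `goal`, or None if it cannot be built."""
    # First, let the Strict Auditor try the goal as it stands.
    if try_prover(goal):
        return ProofNode(goal)
    # Stop recursing once the depth budget is used up.
    if depth >= max_depth:
        return None
    # Otherwise ask the Big Thinker for smaller pieces and recurse on each.
    children = []
    for sub_goal in decompose(goal):
        node = prove(sub_goal, try_prover, decompose, depth + 1, max_depth)
        if node is None:
            return None  # Toy simplification: one failed piece fails the whole goal.
        children.append(node)
    return ProofNode(goal, children) if children else None


def leaves(node: ProofNode) -> List[int]:
    """Collect the goals that the Auditor built directly."""
    if not node.children:
        return [node.goal]
    return [g for child in node.children for g in leaves(child)]


def toy_prover(stories: int) -> bool:
    # Stand-in Auditor: only sections of at most 2 stories succeed.
    return stories <= 2


def toy_decompose(stories: int) -> List[int]:
    # Stand-in Big Thinker: split a section into two halves.
    half = stories // 2
    return [half, stories - half]


tower = prove(100, toy_prover, toy_decompose)
```

Here `tower` is a tree whose leaves are the small sections the Auditor built directly, and `leaves(tower)` lists them. Lowering `max_depth` far enough makes `prove` return `None`, which mirrors the case where the problem cannot be broken down within the allowed budget.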
5. The Assembly
Once all the tiny pieces (bricks, 2-story sections, 10-story sections) are built and verified as perfect, HILBERT snaps them all together. Because every single piece was checked by the Strict Auditor, the final 100-story tower is guaranteed to be mathematically perfect.
Why This is a Big Deal
- It solves the "Hard Problem" gap: Before HILBERT, the best computer programs could solve about 13% of the hardest math competition problems (drawn from the Putnam competition). HILBERT solved 70%. It even beat some of the most expensive proprietary AI systems used by big tech companies.
- It's efficient: By breaking problems down, it doesn't waste energy trying to force the Strict Auditor to do impossible tasks. It uses the Big Thinker to simplify the task first.
- It's self-correcting: If the Big Thinker makes a mistake in the blueprint, the Strict Auditor catches it immediately, and HILBERT asks the Big Thinker to fix the plan before moving on.
The Bottom Line
HILBERT is like a construction crew where the Architect (Big Thinker) designs the plan and breaks it down, and the Mason (Strict Auditor) lays every single brick with perfect precision. By working together in a loop, they can build structures that neither could build alone.
The result? A system that can generate mathematically perfect proofs for problems that were previously considered too difficult for computers to solve on their own.