Achieving Olympiad-Level Geometry Large Language Model Agent via Complexity Boosting Reinforcement Learning

The paper introduces InternGeometry, an LLM agent enhanced by Complexity-Boosting Reinforcement Learning and a dynamic memory mechanism. The agent iteratively proposes and verifies auxiliary constructions, achieving medalist-level performance on IMO geometry problems with significantly less training data than previous expert models.

Haiteng Zhao, Junhao Shen, Yiming Zhang, Songyang Gao, Kuikun Liu, Tianyou Ma, Fan Zheng, Dahua Lin, Wenwei Zhang, Kai Chen

Published 2026-03-06

Imagine you are trying to solve an incredibly difficult puzzle, like the ones found in the world's hardest math competition, the International Mathematical Olympiad (IMO). For years, computers have been terrible at one specific type of puzzle: geometry.

Why? Because geometry isn't just about crunching numbers. It's about creativity. To solve a hard geometry problem, you often have to draw an invisible line, add a hidden point, or create a new shape that isn't in the original picture. This is called an "auxiliary construction."

Old computer programs were like rigid robots: they could follow rules perfectly, but they couldn't "guess" what to draw next. They needed millions of examples to learn, and even then, they often got stuck.

Enter InternGeometry, a new AI agent that acts more like a brilliant human student than a robot. Here is how it works, explained through simple analogies:

1. The "Thinker" vs. The "Calculator"

Most old AI systems were like calculators: they tried to brute-force every possible answer until they got lucky.

InternGeometry is like a detective.

  • The Detective's Process: Instead of just guessing, the detective (the AI) looks at the crime scene (the geometry problem) and says, "Hmm, if I draw a line here, maybe I can prove this angle is 90 degrees."
  • The Lab: It then goes to a "lab" (a symbolic engine) to test that idea.
  • The Feedback: If the lab says, "No, that line doesn't work," the detective doesn't give up. They say, "Okay, that failed. Let me try a different angle."
  • The Memory: The detective keeps a notebook (Dynamic Memory) so they don't forget which ideas failed and which ones showed promise, even after 200 tries.
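For readers who like to see the loop spelled out, here is a minimal sketch of the propose, test, remember cycle described above. Everything in it is a hypothetical stand-in: `Memory`, `solve`, `toy_propose`, and `toy_verify` are illustration names, the "lab" is a one-line fake rather than a real symbolic engine, and the 200-step cap simply mirrors the "200 tries" mentioned in the text. It is not the paper's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set, Tuple


@dataclass
class Memory:
    """The detective's notebook: every construction tried, and whether it worked."""
    attempts: List[Tuple[str, bool]] = field(default_factory=list)

    def record(self, construction: str, ok: bool) -> None:
        self.attempts.append((construction, ok))

    def tried(self) -> Set[str]:
        return {c for c, _ in self.attempts}

    def failed(self) -> Set[str]:
        return {c for c, ok in self.attempts if not ok}


def solve(
    propose: Callable[[Memory], Optional[str]],
    verify: Callable[[str], bool],
    max_steps: int = 200,  # assumption: cap taken from the "200 tries" in the text
) -> Tuple[Optional[str], Memory]:
    """Propose an idea, test it in the 'lab', write the result in the notebook."""
    memory = Memory()
    for _ in range(max_steps):
        idea = propose(memory)      # the "thinker" suggests a construction
        if idea is None:            # nothing left to try
            break
        ok = verify(idea)           # the "lab" checks whether it helps
        memory.record(idea, ok)     # remember the outcome either way
        if ok:
            return idea, memory
    return None, memory


# Toy stand-ins (hypothetical): a fixed list of ideas and a fake lab.
CANDIDATES = [
    "midpoint of AB",
    "perpendicular from C to AB",
    "circumcircle of ABC",
]


def toy_propose(memory: Memory) -> Optional[str]:
    # Skip anything already in the notebook so failed ideas are not repeated.
    for candidate in CANDIDATES:
        if candidate not in memory.tried():
            return candidate
    return None


def toy_verify(construction: str) -> bool:
    # Pretend only the circumcircle unlocks the proof.
    return construction == "circumcircle of ABC"


solution, memory = solve(toy_propose, toy_verify)
print(solution)
print(sorted(memory.failed()))
```

In this toy run the first two ideas fail, land in the notebook, and are never proposed again; the third one succeeds. In a real system the proposer would be the language model and the verifier a symbolic geometry engine.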

2. The "Video Game Level" Training (CBRL)

How do you train a detective to solve the hardest cases? You don't start them on a murder mystery if they can't even solve a missing sock.

The researchers used a method called Complexity-Boosting Reinforcement Learning (CBRL). Think of this as a video game with a smart difficulty slider:

  • Level 1: The AI is given easy puzzles. It solves them and gets a "thumbs up."
  • Level 2: Because it's getting good, the game automatically makes the next puzzle slightly harder.
  • The Sweet Spot: The system constantly adjusts the difficulty to be "just right": not so easy that it's boring, and not so hard that the AI gives up.
  • The Result: By the time the AI reaches the "Boss Level" (IMO problems), it has been trained on a perfect curriculum that slowly built its skills, rather than throwing it into the deep end immediately.
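The "smart difficulty slider" can be sketched in a few lines. This is only an illustration of the general idea, not the paper's CBRL algorithm: the function `next_complexity`, the 0.3 and 0.7 thresholds, the 1 to 10 level range, and the toy model of how skill grows are all assumptions made up for this example.

```python
def next_complexity(
    current: int,
    solve_rate: float,
    low: float = 0.3,    # assumption: below this, puzzles are too hard
    high: float = 0.7,   # assumption: above this, puzzles are too easy
    min_level: int = 1,
    max_level: int = 10,
) -> int:
    """Nudge the difficulty level so the solve rate stays in the sweet spot."""
    if solve_rate > high:
        return min(current + 1, max_level)   # too easy: level up
    if solve_rate < low:
        return max(current - 1, min_level)   # too hard: ease off
    return current                           # just right: stay here


def toy_solve_rate(skill: float, complexity: int) -> float:
    """Hypothetical learner: success drops as puzzles outpace skill."""
    return max(0.0, min(1.0, 0.5 + 0.25 * (skill - complexity)))


level, skill = 1, 1.0
history = []
for _ in range(20):
    rate = toy_solve_rate(skill, level)
    history.append(level)
    skill += 0.5 * rate                  # practice pays off more when it succeeds
    level = next_complexity(level, rate)

print(history)
```

In this toy simulation the level only rises once the learner is comfortably beating the current one, so the difficulty climbs in small steps instead of jumping straight to the "Boss Level".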

3. The "Magic Trick" of Efficiency

Here is the most surprising part:

  • The Old Way (AlphaGeometry 2): To learn, this system needed 300 million examples. It was like trying to learn to cook by reading every recipe book in the world.
  • The New Way (InternGeometry): This system learned with only 13,000 examples. That is 0.004% of the data the old system used.

It's the difference between a student who memorizes a dictionary to learn a language versus a student who actually talks to people, makes mistakes, learns from them, and improves rapidly.
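The 0.004% figure is easy to check yourself. The snippet below just redoes the arithmetic with the two numbers quoted above; the variable names are ours.

```python
old_examples = 300_000_000   # AlphaGeometry 2, as quoted above
new_examples = 13_000        # InternGeometry, as quoted above

fraction_percent = new_examples / old_examples * 100
ratio = old_examples / new_examples

print(f"{fraction_percent:.4f}% of the data")   # 0.0043% of the data
print(f"about {ratio:,.0f}x fewer examples")    # about 23,077x fewer examples
```

So "0.004%" is the rounded version of roughly 0.0043%, or about one example for every 23,000 the older system used.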

4. The Result: Beating the Gold Medalists

When the researchers tested InternGeometry on 50 IMO geometry problems from the last 25 years:

  • It solved 44 out of 50 problems.
  • The average score of a human Gold Medalist (the top 0.1% of math students) is about 40.9.
  • InternGeometry beat the average Gold Medalist.

Even cooler? In some cases, the AI came up with a solution that no human had ever thought of. It invented a new way to draw the lines that was more elegant than the human solution.

Summary

InternGeometry is a new AI that doesn't just "calculate" geometry; it reasons like a human.

  • It thinks out loud (proposing ideas).
  • It tests them (using a math engine).
  • It remembers its mistakes (dynamic memory).
  • It learns by playing a game that gets harder as it gets smarter (Complexity-Boosting RL).

It proves that with the right training method, AI doesn't need massive amounts of data to become a genius; it just needs the right way to practice.
