Imagine you have a very smart robot assistant. This robot has excellent eyes (cameras) and a very chatty brain (a vision-language model, that is, a large language model that can also look at images). If you ask it, "Is that cup to the left of the plate?" it will usually answer correctly. It's great at qualitative reasoning: understanding relationships like "left," "right," "above," or "near."
But if you ask it, "Move the cup exactly 5 centimeters to the right," the robot gets confused. It might guess, "Maybe 3 centimeters? Or maybe 10?" It lacks the internal calculator to do precise math. It's like a person who can tell you a mountain is "tall" but can't tell you it's exactly 3,452 meters high.
This is the problem TIGeR solves.
The Problem: The "Guessing Game"
Many current robots rely on AI models that are great at recognizing patterns but unreliable at precise math. They try to "guess" the answer based on what they've seen before, much as a student might guess the answer to a math problem because it looks like one from an earlier test. In the real world, where a robot needs to pick up a fragile egg or pour a drink without spilling, guessing is dangerous. These tasks need centimeter-level precision.
The Solution: TIGeR (The Robot with a Calculator)
The authors created a framework called TIGeR (Tool-Integrated Geometric Reasoning).
Think of TIGeR not as a robot trying to memorize math formulas, but as a smart manager who knows when to call an expert.
- The Manager (The AI Brain): When the robot sees a task like "Pour water from 5 cm above the plant," the AI brain doesn't try to calculate the distance itself. Instead, it says, "I know I need to do some geometry here. I need to call the calculator."
- The Experts (The Tools): The AI writes a tiny piece of computer code (like a recipe) and sends it to a specialized "calculator" tool. This tool uses real measurements from the camera (such as depth readings and calibration settings like the focal length) to do the exact math.
- The Result: The calculator returns a precise number (e.g., "The point is at coordinates X, Y, Z"). The AI then tells the robot arm to move exactly there.
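To make the "calculator" idea concrete, here is a minimal sketch of one classic computation such a tool could perform: turning a pixel plus its depth reading into a 3D point with the pinhole camera model, then offsetting it "5 cm above." The function names, the intrinsic values (`fx`, `fy`, `cx`, `cy`), and the assumption of a level camera whose "up" direction is -y are hypothetical illustrations, not TIGeR's actual tool interface.

```python
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float, float]


@dataclass
class Intrinsics:
    """Pinhole camera calibration (the values used below are made up)."""
    fx: float  # focal length in pixels along x
    fy: float  # focal length in pixels along y
    cx: float  # principal point x, in pixels
    cy: float  # principal point y, in pixels


def pixel_to_camera_point(u: float, v: float, depth_m: float, K: Intrinsics) -> Point:
    """Back-project pixel (u, v) with metric depth into the camera frame.

    Uses the common OpenCV-style convention: x right, y down, z forward.
    """
    x = (u - K.cx) * depth_m / K.fx
    y = (v - K.cy) * depth_m / K.fy
    return (x, y, depth_m)


def point_above(p: Point, height_m: float) -> Point:
    """Offset a point upward, ASSUMING a level camera so that "up" is -y."""
    x, y, z = p
    return (x, y - height_m, z)


K = Intrinsics(fx=600.0, fy=600.0, cx=320.0, cy=240.0)
plant = pixel_to_camera_point(380, 300, 0.9, K)  # plant seen at pixel (380, 300), 0.9 m away
pour_point = point_above(plant, 0.05)            # "5 cm above the plant"
print([round(c, 3) for c in pour_point])         # → [0.09, 0.04, 0.9]
```

A real system would also convert this camera-frame point into the robot's own coordinate frame, using the camera's calibrated pose, before commanding the arm.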
How They Taught the Robot
You can't just tell a robot to "be smart." You have to train it. The researchers built a massive training dataset called TIGeR-300K.
- The Textbook: Imagine a textbook with 300,000 practice problems. But these aren't just questions and answers. Every problem includes the step-by-step solution, the calculator code used, and the intermediate steps.
- The Training Method:
- Stage 1 (Supervised Learning): They showed the robot the textbook, teaching it, "When you see this type of question, write this specific code to get the answer."
- Stage 2 (Reinforcement Learning): They turned practice into a scoring game. If the robot wrote code that got the right answer, it got a gold star. If it wrote messy code or got the wrong number, it earned fewer points and learned to adjust. They also gave extra points for writing clean, logical code, so a lucky guess alone wasn't enough.
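The scoring game in Stage 2 can be sketched as a reward function. This is a minimal illustration under assumed rules: the `Rollout` fields, the point values, and the 1 cm tolerance are hypothetical choices for this example, not the reward actually used to train TIGeR. It shows how rewarding the process (well-formed output, a tool call whose code runs) alongside the final answer separates tool-grounded answers from lucky guesses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rollout:
    """One hypothetical model attempt at a geometry question."""
    format_ok: bool            # output followed the expected reasoning/code/answer layout
    tool_called: bool          # the model invoked the geometry tool
    code_ran: bool             # the generated code executed without error
    answer_m: Optional[float]  # final numeric answer in meters (None if missing)


def reward(r: Rollout, truth_m: float, tol_m: float = 0.01) -> int:
    """Hypothetical points-based reward: the answer dominates, process adds smaller bonuses."""
    points = 0
    if r.format_ok:
        points += 1   # small bonus for a well-formed response
    if r.tool_called and r.code_ran:
        points += 2   # bonus for grounding the answer in working tool code
    if r.answer_m is not None and abs(r.answer_m - truth_m) <= tol_m:
        points += 10  # main reward: answer within tolerance of the ground truth
    return points


truth = 0.05  # ground truth: 5 cm
print(reward(Rollout(True, True, True, 0.052), truth))    # → 13 (correct, tool-grounded)
print(reward(Rollout(True, False, False, 0.05), truth))   # → 11 (correct, but a guess)
print(reward(Rollout(True, True, True, 0.10), truth))     # → 3 (clean code, wrong answer)
print(reward(Rollout(False, False, False, None), truth))  # → 0 (nothing usable)
```

With these made-up weights, a correct but tool-free guess scores 11 points, below the 13 earned by a correct answer backed by working tool code, so training pressure favors the grounded behavior.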
What Can It Do Now?
Because TIGeR uses a "calculator" instead of a "guess," it can do things other robots can't:
- The "Back of the Object" Trick: If you ask a typical robot to put a bag "behind" a toy, it might get stuck because it can't see the back of the toy (it's hidden). TIGeR calculates the 3D shape of the toy, figures out where the "back" is in 3D space even if it's invisible, and guides the robot there.
- The "Exact Distance" Trick: It can move an object to be exactly 10 cm away from another, not just "kind of close."
- The "Multi-View" Trick: If you show it two pictures taken from different angles, it can mathematically combine them to understand the 3D distance between objects, just like a human using two eyes to judge depth, but with mathematical precision.
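The multi-view and exact-distance tricks rest on textbook geometry. The sketch below uses midpoint triangulation (one standard method, not necessarily the one TIGeR's tools use) to recover each object's 3D position from two viewing rays, then measures the gap between the objects and computes where to move one so it sits exactly 10 cm from the other. It assumes the ray origins and directions are already expressed in a shared world frame (in practice this requires each camera's intrinsics and pose); the camera positions, ray values, and helper names are made up for illustration.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def add(a: Vec, b: Vec) -> Vec:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def scale(a: Vec, k: float) -> Vec:
    return (a[0] * k, a[1] * k, a[2] * k)


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def triangulate_midpoint(o1: Vec, d1: Vec, o2: Vec, d2: Vec) -> Vec:
    """Midpoint of the shortest segment between rays o1 + s*d1 and o2 + t*d2."""
    w0 = sub(o1, o2)
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w0), dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel; depth cannot be recovered")
    s = (b * e - c * d) / denom  # parameter of the closest point on ray 1
    t = (a * e - b * d) / denom  # parameter of the closest point on ray 2
    p1 = add(o1, scale(d1, s))
    p2 = add(o2, scale(d2, t))
    return scale(add(p1, p2), 0.5)


def place_at_distance(anchor: Vec, obj: Vec, dist_m: float) -> Vec:
    """Point on the line from anchor toward obj that lies exactly dist_m from anchor."""
    v = sub(obj, anchor)
    n = math.sqrt(dot(v, v))
    if n == 0.0:
        raise ValueError("object coincides with anchor; direction is undefined")
    return add(anchor, scale(v, dist_m / n))


# Hypothetical setup: two cameras 20 cm apart, both facing +z.
cam1, cam2 = (0.0, 0.0, 0.0), (0.2, 0.0, 0.0)
cup = triangulate_midpoint(cam1, (0.1, 0.0, 1.0), cam2, (-0.1, 0.0, 1.0))
plate = triangulate_midpoint(cam1, (0.4, 0.0, 1.0), cam2, (0.2, 0.0, 1.0))
print(round(math.dist(cup, plate), 3))  # → 0.3

new_cup = place_at_distance(plate, cup, 0.10)  # move the cup to exactly 10 cm from the plate
print(round(math.dist(plate, new_cup), 3))     # → 0.1
```

The `ValueError` branch reflects a real limitation of two-view geometry: when the two viewing rays are nearly parallel (cameras too close together relative to the object's distance), depth cannot be recovered reliably.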
The Bottom Line
Before TIGeR, robots were like artists who could draw a beautiful picture of a table but couldn't measure the table to build a chair that fits.
With TIGeR, the robot is now like an architect. It still has the artistic vision to understand the scene, but it also carries a tape measure and a calculator. It doesn't just "see" the world; it computes the world, allowing it to perform delicate, precise tasks in the real world with centimeter-level accuracy.