cadrille: Multi-modal CAD Reconstruction with Reinforcement Learning

The paper introduces cadrille, a multi-modal CAD reconstruction model that leverages vision-language models and a two-stage training pipeline of supervised fine-tuning followed by reinforcement learning to achieve state-of-the-art performance across diverse input modalities and real-world datasets.

Maksim Kolodiazhnyi, Denis Tarasov, Dmitrii Zhemchuzhnikov, Alexander Nikulin, Ilya Zisman, Anna Vorontsova, Anton Konushin, Vladislav Kurenkov, Danila Rukhovich

Published 2026-02-18

Imagine you have a physical object in your hand—a weirdly shaped coffee mug, a custom car part, or a piece of jewelry. Now, imagine you want to turn that physical object into a digital blueprint that engineers can edit, 3D print, or manufacture. This process is called CAD Reconstruction.

For a long time, doing this was like trying to translate a book written in a dead language without a dictionary. You needed expensive scanners, specialized skills, and the computer often got the details wrong.

Enter Cadrille (pronounced cad-ree-lee), a new AI model introduced in this paper that acts like a "Universal Translator" for the physical world, turning real-world objects into editable digital designs.

Here is the simple breakdown of how it works, using some everyday analogies.

1. The Problem: The "One-Trick Pony"

Before Cadrille, most AI models were like specialized chefs.

  • One chef could only cook if you gave them a pile of raw ingredients (a Point Cloud from a 3D scanner).
  • Another chef could only cook if you gave them a photo (an Image).
  • A third chef could only cook if you gave them a written recipe (a Text Description).

If you didn't have the exact ingredient they needed, they couldn't help you. Furthermore, even when they did cook, the food (the digital model) often came out broken or inedible (invalid code).

2. The Solution: The "Master Chef" (Cadrille)

Cadrille is different. It's a multimodal Master Chef.

  • It doesn't care if you hand it a 3D scan, a photo, or a text description like "a red cylinder with a hole in the middle."
  • It understands all three languages at once.
  • Instead of just guessing a shape, it writes Python code (specifically using a library called CadQuery). Think of this as the chef not just serving you a dish, but handing you the exact recipe so you can tweak the salt or change the shape later.
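To make the "recipe" idea concrete, here is a rough sketch of what such a generated program could look like for the cylinder-with-a-hole example. It is not taken from the paper: the dimensions and the variable names (CAD_PROGRAM, radius, height, hole_diameter) are invented for illustration, and colour is left out because the geometry code only describes shape. The program is held as plain text, the same form a language model would produce it in.

```python
# Illustrative only: the kind of CadQuery program a model like this might emit
# for "a cylinder with a hole in the middle". All dimensions are made up.
CAD_PROGRAM = '''
import cadquery as cq

radius, height, hole_diameter = 10.0, 20.0, 5.0

result = (
    cq.Workplane("XY")
    .circle(radius)            # sketch the outer circle
    .extrude(height)           # pull it up into a solid cylinder
    .faces(">Z").workplane()   # start a new sketch on the top face
    .hole(hole_diameter)       # drill a hole straight through
)
'''

# The program is plain text, so its syntax can be checked without CadQuery installed.
compiled = compile(CAD_PROGRAM, "<cad_program>", "exec")

# To actually build the solid (requires `pip install cadquery`):
# namespace = {}
# exec(compiled, namespace)
# solid = namespace["result"]
```

Because the result is ordinary code, changing the hole size later means editing one number rather than re-scanning the object.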

3. How It Learned: The "Apprentice" and the "Coach"

The paper describes a two-step training process, which is like training a new employee:

Step 1: The Internship (Supervised Fine-Tuning)
First, the AI is fed a massive library of millions of synthetic CAD models. It's like an apprentice watching thousands of hours of master craftsmen at work. It learns the rules: "If I see a circle here, I should write a command to draw a circle there."

  • The Catch: The apprentice is great at following rules but gets confused when the real world gets messy. If the data is slightly different from the training books, the apprentice freezes or makes mistakes.

Step 2: The Coaching Session (Reinforcement Learning)
This is the paper's big innovation. Instead of just memorizing more books, the AI is put in a "gym" where it tries to build models and gets instant feedback.

  • Imagine the AI tries to build a chair.
  • If the chair falls over (the code is invalid), the "Coach" (the computer program) gives it a harsh penalty: "No points! Try again."
  • If the chair stands up perfectly (the code runs and the resulting shape matches the target object), it gets a reward.
  • Crucially, the AI learns from its mistakes in real-time. It figures out how to handle messy, real-world data (like a scan with noise or missing parts) that it never saw in the training books.
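The "Coach" can be pictured as a small scoring function. The sketch below is an assumption-laden toy, not the paper's exact reward: it represents shapes as sets of occupied voxels, scores a valid result by its overlap with the target (intersection over union), and hands out a fixed penalty when the generated code produced no shape at all. The names (reward, iou, INVALID_PENALTY) and the penalty value are hypothetical.

```python
from typing import Optional, Set, Tuple

Voxel = Tuple[int, int, int]

# Hypothetical penalty for code that fails to run; the value is an assumption.
INVALID_PENALTY = -1.0


def iou(a: Set[Voxel], b: Set[Voxel]) -> float:
    """Intersection over union of two voxel sets (0.0 if both are empty)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def reward(predicted: Optional[Set[Voxel]], target: Set[Voxel]) -> float:
    """Score one attempt: penalty if the code was invalid, overlap otherwise."""
    if predicted is None:  # the generated program did not produce a shape
        return INVALID_PENALTY
    return iou(predicted, target)


target = {(0, 0, 0), (1, 0, 0)}
print(reward(None, target))         # invalid code: -1.0
print(reward({(0, 0, 0)}, target))  # half the shape is right: 0.5
print(reward(target, target))       # perfect match: 1.0
```

A graded score like this gives the model a reason to prefer "almost right" over "completely wrong", while the fixed penalty pushes it away from code that does not run.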

4. Why This is a Big Deal

The authors tested Cadrille on 10 different benchmarks (like final exams).

  • Versatility: It beat the best "specialized chefs" in every category, whether you gave it a photo, a scan, or text.
  • Reliability: Previous models often produced "broken" code that wouldn't run. Cadrille's "Coaching" phase made it incredibly reliable, almost never producing broken code.
  • Real-World Ready: They tested it on real-world scans (which are usually messy and imperfect). Cadrille handled them like a pro, whereas other models struggled.
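One simple way to put a number on "reliability" is to count how many generated programs fail to run. The sketch below shows that idea on a few stand-in snippets; the helper names (runs_cleanly, invalidity_ratio) are hypothetical, the samples are ordinary Python rather than real CAD programs, and a real evaluation would run untrusted model output inside a sandbox rather than with a bare exec.

```python
from typing import Iterable


def runs_cleanly(program: str) -> bool:
    """Return True if the program compiles and executes without raising."""
    try:
        exec(compile(program, "<generated>", "exec"), {})
    except Exception:
        return False
    return True


def invalidity_ratio(programs: Iterable[str]) -> float:
    """Fraction of programs that fail to run (0.0 for an empty list)."""
    programs = list(programs)
    if not programs:
        return 0.0
    failures = sum(not runs_cleanly(p) for p in programs)
    return failures / len(programs)


samples = [
    "x = 1 + 1",                      # fine
    "y = (",                          # syntax error
    "z = undefined_name + 1",         # fails at run time
    "w = [i * i for i in range(3)]",  # fine
]
print(invalidity_ratio(samples))  # 2 of 4 fail: 0.5
```

The lower this ratio, the less often a user is handed a "recipe" that cannot be cooked at all.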

The Analogy Summary

  • Old AI: A student who memorized a textbook perfectly but fails the test if the question is phrased slightly differently.
  • Cadrille: A student who memorized the textbook, and then spent months taking practice tests, failing, getting corrected, and learning exactly how to handle curveballs.

The Bottom Line

Cadrille is a breakthrough because it bridges the gap between the messy, imperfect real world and the precise, mathematical world of engineering. By using a "learn from mistakes" approach (Reinforcement Learning), it creates digital blueprints that are not only accurate but also editable, making high-tech design accessible to anyone with a camera or a scanner.

In short: It turns "I have this object" into "Here is the editable code for that object," no matter how you describe or scan it.
