DreamCAD: Scaling Multi-modal CAD Generation using Differentiable Parametric Surfaces

Imagine you want to build a custom piece of furniture, like a chair. In the old days, you'd have to draw every single screw, curve, and joint by hand on a blueprint. CAD (Computer-Aided Design) is the digital version of that blueprint, used by engineers to build everything from cars to smartphones.

For a long time, teaching computers to draw these blueprints automatically has been incredibly hard. It's like trying to teach a robot to write a novel, but the robot only knows how to speak in "binary code" (0s and 1s) and gets confused if you ask it to write about something it hasn't seen before.

Here is the story of DreamCAD, a new system that changes the game, explained simply.

The Problem: The "Too Hard" Puzzle

Existing AI methods for creating CAD models face two big walls:

The "Recipe" Wall: Some AIs try to learn CAD by memorizing the "recipe" (the history of how a human drew it: "draw a circle, then pull it up"). But this only works for simple things. If you ask for a weird, organic shape, the AI gets lost because it doesn't have a recipe for it.
The "Data" Wall: There are millions of 3D shapes (like meshes) floating around the internet, but they don't have the "blueprint" labels (CAD data) that engineers need. Existing AIs can't use these millions of shapes because they are looking for a specific type of label that doesn't exist.

The Solution: DreamCAD's "Clay" Approach

Instead of trying to force the AI to learn the complex "recipe" or the strict "blueprint" immediately, the researchers invented a new way to think about shapes.

The Analogy: The Sculptor vs. The Architect

Old AIs (The Architect): Try to build a house by laying every brick in a specific order. If one brick is wrong, the whole house collapses. They need a perfect plan (CAD history) to start.
DreamCAD (The Sculptor): Starts with a big lump of clay. It doesn't care about the bricks yet. It just shapes the clay into the general form of a chair, a gear, or a phone. Once the shape looks right, then it figures out the blueprint.

How DreamCAD Works (The Magic Steps)

1. The "Smooth Clay" (Parametric Surfaces)
DreamCAD doesn't build shapes out of jagged triangles (like video game graphics). Instead, it builds them out of smooth mathematical surfaces (called Bézier patches).

Think of it like this: Instead of building a wall with rough, jagged stones, DreamCAD uses smooth, flexible sheets of rubber. These sheets can be stretched and pulled to match any shape perfectly. Because they are mathematical, the computer can "feel" the surface and adjust it smoothly.

2. Learning from the "Unlabeled" Crowd
Because DreamCAD uses these smooth sheets, it can learn from any 3D shape, even ones without CAD blueprints.

The Analogy: Imagine you want to learn how to draw a horse. You don't need a textbook on horse anatomy. You can just look at a million photos of horses (the "unlabeled" data) and learn what a horse looks like. DreamCAD does this with 3D shapes. It looks at millions of 3D models, learns the "feel" of the curves, and learns to recreate them using its smooth rubber sheets.

3. The "Dream" Part (Multimodal Generation)
DreamCAD is a "dreamer" because you can wake it up with three different types of clues:

Text: "Draw me a red chair with four legs."
Image: Show it a photo of a weird gear.
Point Cloud: Show it a cloud of dots (like a 3D scan).
It takes these clues and "dreams" up the smooth rubber-sheet shape that matches your request.

4. The "Magic Translator" (CADCap-1M)
To teach the AI how to understand text descriptions, the researchers created a massive library called CADCap-1M.

The Analogy: They took 1 million 3D shapes and hired a super-smart AI (GPT-5) to write a short, accurate story about each one. "This is a gear with 16 teeth and a hole in the middle." This gave the system a huge vocabulary to understand what humans are asking for.

Why This Matters

The result is a system that can:

Understand anything: It can create complex, weird shapes that old AIs couldn't touch.
Be precise: Even though it learns from "rough" data, the final output is a smooth mathematical surface that engineers can open and use in standard CAD software.
Be editable: You can take the result and tweak the curves, just like a sculptor adding a little more clay to a nose.

The Final Step: From "Clay" to "Blueprint"

The paper acknowledges that DreamCAD produces the geometry of the shape, but it doesn't generate the final "construction manual" (the full CAD topology) in one go.

The Analogy: DreamCAD is an amazing sculptor who can make a perfect clay statue of a car. But to actually manufacture the car, you need the factory blueprints. The researchers show that because the clay statue is so perfect, a computer can easily look at it and write the blueprints afterward.

In short: DreamCAD is like a master sculptor who can turn a vague description or a rough sketch into a perfect, smooth, mathematical 3D model, opening the door for AI to help engineers design the future faster than ever before.

1. Problem Statement

Multimodal CAD generation (creating editable CAD models from text, images, or point clouds) faces a fundamental scalability bottleneck:

Design-History Methods: Rely on small, annotated datasets (e.g., DeepCAD-160k) containing sketch-and-extrude sequences. They struggle to generalize to freeform or open-vocabulary shapes.
BRep Topology Methods: Boundary Representation (BRep) is discrete and non-differentiable, making it incompatible with gradient-based deep learning at scale. Existing methods requiring explicit BRep annotations cannot leverage the millions of unannotated 3D meshes available in datasets like ABC or Objaverse.
Data Scarcity: Large-scale CAD datasets (e.g., ABC-1M) lack textual or visual descriptions, hindering multimodal training.

The core challenge is to generate editable, parametric CAD surfaces from large-scale, unstructured 3D data without relying on expensive CAD-specific annotations (like design histories or BRep topology labels).

2. Methodology: DreamCAD

The authors propose DreamCAD, a multimodal generative framework that bridges the gap between unstructured 3D meshes and editable CAD models using a decoupled, two-stage pipeline.

A. Representation: Differentiable Parametric Surfaces

Instead of generating discrete BRep topology directly, DreamCAD represents shapes as a set of $C^0$-continuous rational Bézier patches.
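For reference, a rational Bézier patch with control points $P_{ij}$ and positive weights $w_{ij}$ evaluates to $S(u,v) = \frac{\sum_{i,j} B_i^n(u) B_j^m(v)\, w_{ij} P_{ij}}{\sum_{i,j} B_i^n(u) B_j^m(v)\, w_{ij}}$, where $B_i^n$ are Bernstein polynomials. The sketch below evaluates such a patch and samples it on a regular grid. The bicubic degree, the function names, and the flat example patch are assumptions for illustration; the summary does not state the patch degree or the authors' implementation.

```python
from math import comb

def bernstein(n, i, t):
    """Bernstein basis polynomial B_i^n(t)."""
    return comb(n, i) * (t ** i) * ((1.0 - t) ** (n - i))

def eval_rational_patch(ctrl, weights, u, v):
    """Evaluate a rational Bezier patch at (u, v) in [0, 1]^2.

    ctrl[i][j] is an (x, y, z) control point, weights[i][j] its positive weight.
    """
    n, m = len(ctrl) - 1, len(ctrl[0]) - 1
    num = [0.0, 0.0, 0.0]
    den = 0.0
    for i in range(n + 1):
        bu = bernstein(n, i, u)
        for j in range(m + 1):
            b = bu * bernstein(m, j, v) * weights[i][j]
            den += b
            for k in range(3):
                num[k] += b * ctrl[i][j][k]
    return tuple(c / den for c in num)

def tessellate(ctrl, weights, res=4):
    """Sample the patch on a (res + 1) x (res + 1) grid of surface points."""
    return [[eval_rational_patch(ctrl, weights, a / res, b / res)
             for b in range(res + 1)] for a in range(res + 1)]

# Hypothetical flat bicubic patch over the unit square with unit weights.
ctrl = [[(i / 3, j / 3, 0.0) for j in range(4)] for i in range(4)]
weights = [[1.0] * 4 for _ in range(4)]
grid = tessellate(ctrl, weights, res=4)
```

Every sampled point is a smooth rational function of the control points and weights, which is why the same computation, written in an automatic-differentiation framework, lets a loss on the sampled points send gradients back to the patch parameters.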

Differentiable Tessellation: The Bézier patches are converted into meshes via differentiable tessellation. This allows the model to be trained using standard point-cloud supervision (Chamfer Distance) on large-scale mesh data without needing CAD ground truth.
$C^0$ Continuity Enforcement: To ensure valid CAD models (no gaps or overlaps), the method uses a structural approach rather than geometric optimization. It starts with a sparse voxel grid, removes internal quads via a flood-fill algorithm, and maps surface quads to parametric patches. Adjacent patches share boundary control points, guaranteeing continuity.
Output: The final output is a set of control points and weights defining rational Bézier surfaces, which can be exported as STEP files and edited in standard CAD software.
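The structural idea behind the continuity step can be sketched as follows: flood-fill the empty space around the occupied voxels, keep only the voxel faces that touch that exterior, and index the face corners into one shared vertex table. This is a simplified illustration under stated assumptions, not the authors' code: the function names are hypothetical, only quad corners are shared here (a full patch would also share the control points along each boundary edge), and face orientation is not normalized.

```python
from collections import deque

NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def exterior_cells(occupied):
    """Flood-fill empty cells reachable from outside a padded bounding box."""
    lo = [min(v[a] for v in occupied) - 1 for a in range(3)]
    hi = [max(v[a] for v in occupied) + 1 for a in range(3)]
    start = tuple(lo)
    seen, queue = {start}, deque([start])
    while queue:
        x, y, z = queue.popleft()
        for dx, dy, dz in NEIGHBORS:
            c = (x + dx, y + dy, z + dz)
            inside_box = all(lo[a] <= c[a] <= hi[a] for a in range(3))
            if inside_box and c not in occupied and c not in seen:
                seen.add(c)
                queue.append(c)
    return seen

def face_corners(voxel, d):
    """Four lattice corners of the face of `voxel` that points in direction d."""
    axis = next(a for a in range(3) if d[a] != 0)
    base = list(voxel)
    if d[axis] > 0:
        base[axis] += 1
    u, w = [a for a in range(3) if a != axis]
    corners = []
    for du, dw in [(0, 0), (1, 0), (1, 1), (0, 1)]:
        c = list(base)
        c[u] += du
        c[w] += dw
        corners.append(tuple(c))
    return corners

def surface_quads(occupied):
    """Return (vertices, quads); quads index into one shared vertex table."""
    outside = exterior_cells(occupied)
    index, quads = {}, []
    for v in sorted(occupied):
        for d in NEIGHBORS:
            nb = (v[0] + d[0], v[1] + d[1], v[2] + d[2])
            if nb in outside:  # faces toward solid or sealed cavities are skipped
                quads.append([index.setdefault(c, len(index))
                              for c in face_corners(v, d)])
    vertices = sorted(index, key=index.get)
    return vertices, quads

# Hypothetical input: a 3x3x3 block with a sealed hollow center.
hollow = {(x, y, z) for x in range(3) for y in range(3) for z in range(3)} - {(1, 1, 1)}
vertices, quads = surface_quads(hollow)
```

Because neighboring quads refer to the same vertex indices rather than to separate copies, moving a shared vertex moves both quads together, so no gap can open between them. Sharing boundary control points between adjacent patches gives $C^0$ continuity by the same argument.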

B. Architecture

The framework consists of three main components:

Sparse Voxel VAE:
- Encoder: Takes a 3D mesh, voxelizes it, and enriches each active voxel with local visual features (DINOv2 embeddings from 150 multi-view renders), normals, and SDF values. It encodes these into structured latents.
- Decoder: Reconstructs the shape by predicting local deformations and weight updates for an initial set of Bézier patches derived from the voxel grid. It employs regularizers ($G^1$ continuity, Laplacian smoothing) to ensure smooth, spike-free surfaces.
Conditional Generation (Coarse-to-Fine):
- Uses a Flow Matching framework.
- Stage 1 (Coarse): Generates a low-resolution sparse voxel grid from the input condition (text, image, or point cloud).
- Stage 2 (Fine): Refines the voxel grid into structured latents, which are decoded into the final high-fidelity parametric Bézier surfaces.
- Text-to-CAD Strategy: To overcome the lack of spatial cues in text, the system chains two steps: Text $\to$ Image (using a fine-tuned Stable Diffusion model), then Image $\to$ CAD.
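As a rough illustration of what a Flow Matching objective and sampler look like, the sketch below uses one common formulation: a straight path from noise at $t=0$ to data at $t=1$, a network trained to regress the path's constant velocity, and Euler integration at sampling time. The summary does not say which variant DreamCAD uses, so the path, the function names, and the `oracle` stand-in for a trained network are all assumptions.

```python
import random

def interpolate(x0, x1, t):
    """Point on the straight path from noise x0 to data x1 at time t."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]

def target_velocity(x0, x1):
    """Regression target for the velocity network along that path."""
    return [b - a for a, b in zip(x0, x1)]

def flow_matching_loss(model, x0, x1, t, cond):
    """Squared error between predicted and target velocity for one sample."""
    pred = model(interpolate(x0, x1, t), t, cond)
    return sum((p - v) ** 2 for p, v in zip(pred, target_velocity(x0, x1)))

def sample(model, x0, cond, steps=10):
    """Integrate dx/dt = model(x, t, cond) from t=0 to t=1 with Euler steps."""
    x, dt = list(x0), 1.0 / steps
    for k in range(steps):
        v = model(x, k * dt, cond)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

# Hypothetical stand-in for a trained network: the exact velocity field
# when every sample should flow to the single target stored in `cond`.
def oracle(x, t, cond):
    return [(c - xi) / (1.0 - t) for xi, c in zip(x, cond)]

rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(4)]
target = [1.0, 0.0, 1.0, 1.0]  # e.g. a tiny flattened occupancy vector
result = sample(oracle, noise, target, steps=10)
```

In a coarse-to-fine setup like the one described, the same loop would run twice: once over voxel occupancy and once over the per-voxel latents, each conditioned on the input text, image, or point cloud.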

C. Dataset: CADCap-1M

To enable text-to-CAD training at scale, the authors introduce CADCap-1M, the largest CAD captioning dataset to date.

Scale: Contains over 1 million high-quality text descriptions for CAD models from 10 public datasets (ABC, Automate, Fusion360, etc.).
Generation: Descriptions are generated using GPT-5.
Metadata Augmentation: Prompts are augmented with metadata extracted from original CAD files (e.g., hole counts, part names, aspect ratios) to reduce hallucinations and improve geometric accuracy.
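A minimal sketch of metadata-augmented prompting is shown below: facts read from the CAD file are placed in the prompt so the captioning model does not have to guess them. The field names (`part_name`, `hole_count`, `bbox`) and the prompt wording are hypothetical; the actual prompts used for CADCap-1M are not given in this summary.

```python
def build_caption_prompt(metadata):
    """Assemble a captioning prompt that states known facts up front."""
    facts = []
    if metadata.get("part_name"):
        facts.append(f"Part name: {metadata['part_name']}")
    if metadata.get("hole_count") is not None:
        facts.append(f"Number of holes: {metadata['hole_count']}")
    if metadata.get("bbox"):
        dx, dy, dz = metadata["bbox"]
        longest = max(dx, dy, dz)
        # Normalize the bounding-box extents by the longest side.
        ratio = ":".join(f"{d / longest:.2f}" for d in (dx, dy, dz))
        facts.append(f"Aspect ratio (x:y:z): {ratio}")
    lines = ["Describe the CAD model."]
    if facts:
        lines.append("Known facts from the CAD file (do not contradict them):")
        lines.extend(f"- {fact}" for fact in facts)
    return "\n".join(lines)

# Hypothetical metadata record for one model.
prompt = build_caption_prompt(
    {"part_name": "spur gear", "hole_count": 1, "bbox": (40.0, 40.0, 10.0)}
)
```

Stating the hole count and proportions explicitly turns them from something the captioner must infer into something it only has to repeat, which is the mechanism by which such metadata can reduce hallucinated details.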

3. Key Contributions

DreamCAD Framework: A multimodal generative model trained only on point-level supervision (meshes) without any CAD-specific annotations (design history or BRep topology). It produces editable parametric surfaces.
CADCap-1M Dataset: Release of a 1M+ sample dataset with GPT-5 generated captions, significantly advancing text-to-CAD research by providing the necessary scale and diversity.
Differentiable Parametric Surfaces: A novel method to enforce $C^0$ continuity structurally, enabling the training of Bézier patch generators on massive unstructured datasets.
Topology Recovery Pathway: Demonstrates that the high-fidelity geometric foundation generated by DreamCAD can serve as a strong prior for recovering full CAD topology (converting patches to NURBS/BRep) in a subsequent step.

4. Experimental Results

DreamCAD was evaluated on ABC and Objaverse datasets across three modalities: Point-to-CAD, Image-to-CAD, and Text-to-CAD.

Geometric Accuracy:
- Point-to-CAD: Outperforms baselines (DeepCAD, CAD-Recode, Cadrille) by reducing Chamfer Distance (CD) by up to 75% and achieving a 0% invalidity ratio (all outputs are valid CAD models).
- Image-to-CAD: Achieves 77% user preference and 76% GPT-5 preference, significantly outperforming BRepDiff and Cadrille.
- Text-to-CAD: Achieves 85% user preference and 85% GPT-5 preference. It successfully reconstructs complex shapes and numerically constrained features (e.g., specific hole counts) that design-history models fail to capture.
Generalization: The model shows strong out-of-distribution (OOD) performance on Objaverse, handling free-form and organic shapes better than previous methods.
Topology Recovery: A follow-up experiment converting DreamCAD's patch outputs to NURBS representations using a fine-tuned LLM (Qwen3) achieved 99.2% valid CAD models, suggesting that the generated geometry is a sound basis for topology recovery.
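For readers unfamiliar with the two geometric metrics quoted above, here is a brute-force sketch. Chamfer Distance conventions vary (squared versus unsquared distances, sum versus mean); this version sums the mean squared nearest-neighbor distance in both directions, which may differ from the paper's exact definition. The `is_valid` callback is a hypothetical placeholder for whatever validity check is applied to a generated model.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def chamfer_distance(a, b):
    """Symmetric Chamfer Distance between point sets a and b.

    Mean squared nearest-neighbor distance from a to b, plus the same from
    b to a. Brute force, O(len(a) * len(b)); fine for small examples only.
    """
    a_to_b = sum(min(sq_dist(p, q) for q in b) for p in a) / len(a)
    b_to_a = sum(min(sq_dist(q, p) for p in a) for q in b) / len(b)
    return a_to_b + b_to_a

def invalidity_ratio(outputs, is_valid):
    """Fraction of generated outputs that fail the validity check."""
    return sum(1 for o in outputs if not is_valid(o)) / len(outputs)

# Two tiny hypothetical point sets sampled from a predicted and a target shape.
pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)]
cd = chamfer_distance(pred, gt)
```

A lower Chamfer Distance means the sampled surfaces lie closer together, and an invalidity ratio of 0 means every generated output passed the check.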

5. Significance

Scalability: DreamCAD breaks the dependency on small, annotated CAD datasets, unlocking the potential of millions of unannotated 3D meshes for generative AI.
Editability: Unlike standard text-to-3D methods that produce meshes, DreamCAD outputs STEP files with control points and weights, making the results directly usable in industrial engineering workflows.
Paradigm Shift: It proposes a shift from "joint geometry and topology generation" (which is hard to scale) to a "decoupled pipeline" (scalable geometry first, topology recovery second).
Industrial Impact: By enabling the generation of manufacturable, editable CAD models from natural language or single images, DreamCAD accelerates design workflows, rapid prototyping, and reverse engineering.

In summary, DreamCAD represents a major step forward in making AI-generated CAD models both scalable (via large unstructured data) and practical (via editable parametric outputs).