Seek-CAD: A Self-refined Generative Modeling for 3D Parametric CAD Using Local Inference via DeepSeek

Imagine you want to build a complex piece of furniture, like a custom bookshelf with curved edges and hidden compartments. In the old days, you'd have to hire a master carpenter (a human designer) to draw every single cut and screw location by hand. This is slow and expensive.

Recently, we tried teaching computers to do this by feeding them millions of blueprints and asking them to "learn" the rules. But this is like trying to teach a dog to do calculus by making it memorize a textbook; it's hard, expensive, and the dog might get confused.

Seek-CAD is a new, smarter way to do this. Instead of forcing the computer to memorize a textbook, we give it a super-smart, reasoning AI (called DeepSeek-R1) and let it "think" its way through the design, checking its own work as it goes.

Here is how it works, broken down into simple analogies:

1. The "Architect" and the "Builder"

Think of the AI (DeepSeek-R1) as a brilliant Architect.

The Problem: Usually, if you ask an AI to design a 3D object, it might hallucinate (make things up) or give you a blueprint that looks good on paper but falls apart when you try to build it.
The Solution: Seek-CAD doesn't just ask the Architect to "draw a picture." It asks the Architect to write a step-by-step construction script (like a recipe).
The Trick: The Architect is given a "Cheat Sheet" (a local database of real-world parts) and strict rules (Knowledge Constraints) so it doesn't invent impossible physics.

2. The "Stop-Motion" Check (The Secret Sauce)

This is the most creative part of the paper.

The Old Way: You ask the AI to build a chair. It gives you the code. You run the code, and boom—the chair has legs sticking out of the seat. You have to start over.
The Seek-CAD Way: The AI builds the chair one step at a time, and after every single step, it takes a photo.
- Step 1: "I'm drawing the outline of the seat." (Photo taken).
- Step 2: "I'm extruding the legs." (Photo taken).
- Step 3: "I'm adding the backrest." (Photo taken).

Then, a second AI (a Vision Expert, called Gemini-2.0) looks at these photos. It doesn't just look at the final chair; it looks at the process. It compares the photos against the Architect's thought process (what the Architect said it was going to do).

The Analogy: Imagine you are teaching a child to bake a cake.
- Old Way: You let them mix everything, put it in the oven, and then say, "Oh no, you forgot the eggs!" The cake is ruined.
- Seek-CAD Way: You watch them crack the eggs. You say, "Good!" You watch them mix. You say, "Good!" You watch them pour the batter. If they forget the sugar, you stop them before it goes in the oven and say, "Wait, you didn't add sugar in step 2!"

3. The "SSR" Language (The New Blueprint)

To make this work, the researchers invented a new way to describe 3D objects called SSR (Sketch, Sketch-based feature, Refinements).

The Old Language (SE): Most existing CAD-generation datasets and models only know two words: "Draw a shape" and "Pull it up." This is like only knowing how to build with Lego bricks. You can't make smooth curves or rounded edges easily.
The New Language (SSR): This is like giving the AI a full toolbox. It knows how to "Draw a shape," "Pull it up," AND "Round the corners," "Add a groove," or "Hollow it out."
The "CapType" Reference: Sometimes, you need to round a corner that was created by two shapes joining together. The AI needs a way to point to that specific corner without getting lost. The paper introduces a "Name Tag" system (CapType) that says, "Hey, that specific edge right there? That's the one we need to round."

4. The Result: A Self-Correcting Loop

The system works in a loop:

Draft: The Architect writes the code.
Visualize: The computer renders the steps into images.
Review: The Vision Expert checks the images against the Architect's thoughts.
Feedback: If the Vision Expert sees a mistake (e.g., "You said you were making a hole, but the photo shows a bump"), it tells the Architect.
Refine: The Architect rewrites the code to fix the mistake.

Why is this a big deal?

No Training Required: Most AI models need to be "trained" for months on supercomputers to learn a specific job. Seek-CAD uses a pre-trained, reasoning AI and just gives it a new job description. It's like hiring a genius who already knows how to think, rather than training a dog from scratch.
Industrial Ready: Because it uses the "SSR" language, it can describe the rounded edges, chamfers, and hollowed-out shells found in real-world industrial parts, which models limited to "draw and pull up" cannot express.
It Thinks: By using "Chain of Thought" (CoT), the AI explains why it is doing each step, making it much easier to catch errors before they happen.

In summary: Seek-CAD is like giving a computer a brilliant architect, a camera crew to film the construction process, and a strict inspector to check the work at every stage. The result is a computer that can design complex 3D objects on its own, fixing its own mistakes before they become disasters.

1. Problem Statement

The field of Computer-Aided Design (CAD) generative modeling aims to automate the creation of 3D parametric models, which are crucial for industrial manufacturing. While Large Language Models (LLMs) and Vision-Language Models (VLMs) show promise, existing approaches face significant limitations:

Training Dependency: Most state-of-the-art methods require fine-tuning on specific datasets, which is computationally expensive and reduces flexibility.
Lack of Reasoning: Training-free approaches often lack a mechanism to harness Chain-of-Thought (CoT) reasoning, limiting their ability to handle complex design logic.
Paradigm Constraints: Existing datasets and models predominantly rely on the Sketch-Extrude (SE) paradigm, which supports only basic operations. This fails to capture the complexity of real-world industrial designs that require features like fillets, chamfers, shells, and boolean operations.
Feedback Limitations: Current self-refinement methods often evaluate only the final rendered image against a text description, ignoring intermediate construction steps and the logical reasoning behind the design.

2. Methodology: The Seek-CAD Framework

Seek-CAD is a training-free generative framework that leverages a locally deployed, open-source reasoning LLM (DeepSeek-R1-32B-Q4) combined with a VLM (Gemini-2.0) for visual feedback. The framework operates through two main stages (A and B below) and relies on a new design paradigm (C):

A. Local Inference Pipeline

Instead of fine-tuning, Seek-CAD uses Retrieval-Augmented Generation (RAG) on a local CAD code corpus (10,000 models) to guide the LLM.

Knowledge Constraint: A system prompt defines the SSR (Sketch, Sketch-based feature, Refinements) paradigm and provides documentation/examples to prevent hallucinations and ensure adherence to the specific coding syntax.
RAG Strategy: For a given text query, the system retrieves the top-3 similar CAD code/description pairs using a hybrid search (vector + full-text) to augment the context.
Initial Generation: DeepSeek-R1 generates an initial CAD code ( $I_0$ ) and a corresponding Chain-of-Thought (CoT) explaining the design logic step-by-step.
Syntax Correction: An automated pattern template fixes common syntax errors (e.g., mismatched parentheses) before rendering.
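
The retrieval step above can be illustrated with a minimal sketch. The summary does not specify the embedding model, the full-text engine, or how the two scores are fused, so everything here is an assumption made for illustration: character-trigram cosine similarity stands in for the vector search, token overlap stands in for full-text search, and the names `hybrid_search`, `alpha`, and `CORPUS` are hypothetical.

```python
import math
from collections import Counter


def _cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def _trigrams(text: str) -> Counter:
    """Character trigram counts: a toy stand-in for a learned embedding."""
    t = text.lower()
    return Counter(t[i:i + 3] for i in range(len(t) - 2))


def _keyword_score(query: str, doc: str) -> float:
    """Fraction of query tokens that appear verbatim in the document."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0


def hybrid_search(query, corpus, top_k=3, alpha=0.5):
    """Rank (description, cad_code) pairs by a weighted blend of both scores.

    The weighted sum with `alpha` is an assumed fusion rule, not the paper's.
    """
    scored = []
    for description, cad_code in corpus:
        vec = _cosine(_trigrams(query), _trigrams(description))
        kw = _keyword_score(query, description)
        scored.append((alpha * vec + (1 - alpha) * kw, description, cad_code))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


# Hypothetical four-entry corpus; the real corpus holds 10,000 models.
CORPUS = [
    ("a rectangular plate with four corner holes", "code_A"),
    ("a hollow cylinder with a chamfered rim", "code_B"),
    ("a rectangular plate with filleted corners", "code_C"),
    ("a hexagonal nut with a central hole", "code_D"),
]

hits = hybrid_search("rectangular plate with rounded corners", CORPUS)
for score, description, cad_code in hits:
    print(f"{score:.3f}  {cad_code}  {description}")
```

The three returned pairs would then be placed in the prompt as in-context examples. A production setup would more plausibly use a vector database with a real embedding model, but the ranking-and-truncation shape stays the same.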

B. Step-wise Visual Feedback (SVF) & Self-Refinement

This is the core innovation for iterative improvement.

Step-wise Rendering: The initial code $I_0$ is rendered into a sequence of perspective images. Crucially, this includes:
- Intermediate Shapes ( $M_I$ ): Visualizing the object at each step of the construction sequence (highlighting the current entity while hiding previous ones to avoid occlusion).
- Ultimate Shape ( $M_U$ ): The final rendered object.
VLM Evaluation: The sequence of images ( $M$ ) and the CoT from DeepSeek-R1 are fed into Gemini-2.0. The VLM is prompted to judge the alignment between the design logic (CoT) and the visual evidence (images).
- Why CoT? Using the CoT helps the VLM understand the intent and construction process, not just the final shape.
Refinement Loop: If the VLM detects a misalignment (e.g., "The hole is missing"), it provides specific feedback. DeepSeek-R1 uses this feedback to produce refined code ( $I_k$ ). The process repeats for at most two refinement rounds ( $k \le 2$ ), stopping early once the feedback is positive.
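
The control flow of this stage can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the LLM, the renderer, and the VLM are passed in as plain callables, and the names `self_refine`, `Verdict`, `fake_generate`, `fake_render`, and `fake_judge` are hypothetical stand-ins so the example runs without any model.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Verdict:
    aligned: bool        # True when the images match the stated design logic
    feedback: str = ""   # free-text correction when they do not


def self_refine(
    query: str,
    generate: Callable[[str, Optional[str], Optional[str]], Tuple[str, str]],
    render_steps: Callable[[str], List[str]],
    judge: Callable[[List[str], str], Verdict],
    max_rounds: int = 2,
) -> Tuple[str, int]:
    """Return the final CAD code and the number of refinement rounds used."""
    code, cot = generate(query, None, None)        # initial code I_0 and its CoT
    for round_idx in range(max_rounds):
        images = render_steps(code)                # intermediate shapes + final shape
        verdict = judge(images, cot)               # compare images with the CoT
        if verdict.aligned:
            return code, round_idx                 # positive feedback: stop early
        code, cot = generate(query, code, verdict.feedback)  # refined code I_k
    return code, max_rounds                        # round budget exhausted


# Toy stand-ins (assumptions): the first draft forgets the hole.
def fake_generate(query, previous_code, feedback):
    cot = "Step 1: sketch a plate. Step 2: cut a hole."
    if feedback is None:
        return "plate()", cot
    return previous_code + "; hole()", cot


def fake_render(code):
    return [f"image of {step.strip()}" for step in code.split(";")]


def fake_judge(images, cot):
    if any("hole" in image for image in images):
        return Verdict(True)
    return Verdict(False, "The hole is missing")


final_code, rounds = self_refine(
    "a plate with a hole", fake_generate, fake_render, fake_judge
)
print(final_code, rounds)
```

Passing the three components as callables keeps the loop independent of any particular model API. In this sketch, code produced in the last allowed round is returned without a further check, which is one reasonable reading of "at most two rounds".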

C. The SSR Design Paradigm

The paper introduces a new modeling paradigm to replace the limited SE approach:

SSR Triplet: Each modeling step is defined as $S = (s, f, \langle r_1, \dots, r_k \rangle)$ , where:
- $s$ : A 2D Sketch.
- $f$ : A Sketch-based feature (e.g., Extrude, Revolve).
- $\langle r \rangle$ : Optional Refinement features (e.g., Fillet, Chamfer, Shell).
CapType Reference Mechanism: To handle refinements that depend on topological primitives generated during modeling (which are not in the original sketch), the authors introduce CapType. This maps sketch primitives to resulting 3D primitives (Start, End, or Swept) allowing precise referencing for operations like filleting an edge created by an extrusion.
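
To make the triplet and the reference mechanism concrete, here is one possible in-memory representation. It is a sketch under assumptions: the summary does not give the actual SSR code syntax, so the class names (`SSRStep`, `Refinement`, `PrimitiveRef`), the field layout, and the `dangling_refs` helper are hypothetical. Only the three CapType values (Start, End, Swept) come from the text above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Tuple


class CapType(Enum):
    START = "start"   # 3D primitive lying on the starting cap of the feature
    END = "end"       # 3D primitive lying on the ending cap
    SWEPT = "swept"   # 3D primitive swept out by a sketch primitive


@dataclass(frozen=True)
class PrimitiveRef:
    """Points at a generated 3D primitive via the sketch primitive it came from."""
    sketch_primitive: str
    cap_type: CapType


@dataclass
class Refinement:
    kind: str                                   # e.g. "fillet", "chamfer", "shell"
    targets: List[PrimitiveRef]
    params: Dict[str, float] = field(default_factory=dict)


@dataclass
class SSRStep:
    sketch: List[str]                           # s: names of 2D sketch primitives
    feature: Tuple[str, Dict[str, float]]       # f: e.g. ("extrude", {...})
    refinements: List[Refinement] = field(default_factory=list)  # <r_1, ..., r_k>


def dangling_refs(step: SSRStep) -> List[PrimitiveRef]:
    """Return references that name a primitive absent from the step's sketch."""
    known = set(step.sketch)
    return [
        target
        for refinement in step.refinements
        for target in refinement.targets
        if target.sketch_primitive not in known
    ]


# A rectangle extruded by 10 units, then a fillet on the edge that
# sketch line "line_1" produces on the end cap.
step = SSRStep(
    sketch=["line_1", "line_2", "line_3", "line_4"],
    feature=("extrude", {"distance": 10.0}),
    refinements=[
        Refinement(
            kind="fillet",
            targets=[PrimitiveRef("line_1", CapType.END)],
            params={"radius": 1.5},
        )
    ],
)
print(dangling_refs(step))
```

The point of the indirection is that the filleted edge does not exist until the extrusion runs, so it cannot be named directly in the sketch. Naming it as a (sketch primitive, CapType) pair gives a reference that is stable before the geometry is built.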

3. Key Contributions

Seek-CAD Framework: A novel, training-free generative framework using locally deployed DeepSeek-R1. It integrates a self-refinement mechanism driven by sequential visual feedback and Chain-of-Thought reasoning.
SSR Paradigm & CapType: The proposal of the SSR (Sketch, Sketch-based feature, Refinements) design paradigm, which supports complex industrial features. The accompanying CapType mechanism solves the problem of referencing intermediate topological primitives for refinement operations.
New Dataset: The creation and release of a 40k-sample CAD dataset based on the SSR paradigm, covering diverse commands (fillet, chamfer, shell) and paired with GPT-4o generated descriptions.
Step-wise Visual Feedback: A strategy that evaluates intermediate construction steps alongside the final shape, significantly improving the VLM's ability to provide accurate logical corrections.

4. Experimental Results

The authors evaluated Seek-CAD on a test set of 500 CAD models (distinct from the local retrieval corpus) and compared it against fine-tuned models (CAD-Llama) and other training-free methods (3D-PreMise, CADCodeVerify).

Geometric Accuracy: Seek-CAD achieved superior performance in Chamfer Distance (CD), Hausdorff Distance (HD), and Intersection over Ground Truth (IoGT) compared to both fine-tuned and other training-free baselines.
- Example: IoGT score of 0.7226 (Seek-CAD) vs. 0.7023 (CAD-Llama) and 0.6315 (3D-PreMise).
Text-Image Alignment: It achieved the highest G-Score (3.5185), indicating better semantic alignment between the text description and the generated model.
Novelty: While slightly lower than the fine-tuned CAD-Llama, Seek-CAD maintained a high novelty score (64.04%), suggesting that it does not merely copy examples from the retrieval corpus.
Refinement Impact: Ablation studies showed that removing the step-wise images or the CoT significantly degraded performance, validating the necessity of the SVF strategy.
Robustness: The framework performed well even when tested on the simpler DeepCAD dataset (SE paradigm), demonstrating generalizability.
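
For readers unfamiliar with the two distance metrics named above, the following sketch computes them on small point sets sampled from two shapes. It uses one common convention (mean squared nearest-neighbour distance, summed over both directions, for Chamfer Distance; the symmetric maximum for Hausdorff Distance). The summary does not state which normalization or sampling density the paper uses, so the function names and the exact formulas are assumptions. IoGT is left out because its precise definition is not given here. Lower values are better for both metrics.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def _nearest(p: Point, cloud: List[Point]) -> float:
    """Euclidean distance from p to its nearest neighbour in cloud."""
    return min(math.dist(p, q) for q in cloud)


def chamfer_distance(a: List[Point], b: List[Point]) -> float:
    """Mean squared nearest-neighbour distance, summed over both directions."""
    a_to_b = sum(_nearest(p, b) ** 2 for p in a) / len(a)
    b_to_a = sum(_nearest(q, a) ** 2 for q in b) / len(b)
    return a_to_b + b_to_a


def hausdorff_distance(a: List[Point], b: List[Point]) -> float:
    """Largest nearest-neighbour distance in either direction."""
    return max(
        max(_nearest(p, b) for p in a),
        max(_nearest(q, a) for q in b),
    )


# Two tiny point sets: `generated` has one stray point 2 units away.
ground_truth = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
generated = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 2.0, 0.0)]

cd = chamfer_distance(ground_truth, generated)
hd = hausdorff_distance(ground_truth, generated)
print(cd, hd)
```

The example shows why the two metrics are reported together: the single stray point is averaged down in the Chamfer Distance but fully determines the Hausdorff Distance.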

5. Significance

Efficiency & Accessibility: By eliminating the need for fine-tuning and utilizing local inference, Seek-CAD offers a highly efficient and accessible solution for generating industrial-grade CAD models, reducing computational barriers.
Bridging Logic and Geometry: The integration of CoT with visual feedback creates a robust loop where the model "thinks" about the design logic before and during the visual verification, leading to higher geometric fidelity.
Industrial Relevance: The shift from the simple SE paradigm to the complex SSR paradigm, supported by the CapType mechanism, brings generative AI closer to real-world industrial design requirements, enabling the creation of parts with complex features like fillets and shells.
Open Science: The release of the SSR-based dataset and the open-source nature of the approach (using DeepSeek-R1) encourages further research in training-free CAD generation.

In conclusion, Seek-CAD demonstrates that with the right reasoning capabilities (CoT), visual grounding (SVF), and a structured design paradigm (SSR), a training-free LLM can outperform fine-tuned models on the reported geometric-accuracy and text-alignment metrics when generating complex, parametric 3D CAD models.