PD²GS: Part-Level Decoupling and Continuous Deformation of Articulated Objects via Gaussian Splatting

The paper introduces PD²GS, a self-supervised framework that leverages Gaussian Splatting to achieve accurate part-level decoupling and continuous deformation modeling of articulated objects by learning a shared canonical field, while also releasing the RS-Art dataset for real-world evaluation.

Haowen Wang, Xiaoping Yuan, Zhao Jin, Zhen Zhao, Zhengping Che, Yousong Xue, Jin Tian, Yakun Huang, Jian Tang

Published 2026-03-03

Imagine you have a magical, invisible clay model of a complex object, like a desk lamp with a swiveling head, a drawer that slides out, and a laptop stand that folds. Now, imagine you want to teach a computer to understand exactly how every single piece of that object moves, without you ever having to tell it "this is the lamp head" or "this is the drawer."

That's exactly what the paper PD²GS is trying to solve.

Here is the breakdown of their solution using simple analogies:

The Problem: The "Frozen Frame" Confusion

Previous methods for teaching computers about moving objects were like taking two photos of a door: one closed and one open. The computer tries to guess how the door moved between those two photos.

  • The Flaw: If the object is complex (like a filing cabinet with three drawers), the computer gets confused. It might think the whole cabinet is one giant blob that stretches and squishes, rather than three separate drawers sliding out. It creates a "drift," where the computer's mental model gets messy and blurry over time.

The Solution: The "Master Mold" and the "Magic Remote"

The authors created a new system called PD²GS. Think of it like this:

  1. The Master Mold (The Canonical Field):
    Instead of trying to build a new model for every position, the computer first builds one perfect, "standard" 3D model of the object in its resting state. Imagine this as a Master Mold made of millions of tiny, glowing, fuzzy balls (called Gaussians). These balls hold the shape, color, and texture of the object.

  2. The Magic Remote (Latent Codes):
    The system learns a special "remote control" for each part of the object.

    • If you press the "Drawer" button on the remote, the computer knows to slide only the fuzzy balls that make up the drawer.
    • If you press the "Lamp Head" button, it rotates only those specific balls.
    • Crucially, the computer figures out which balls belong to which part all by itself, without you telling it. It's like the computer watching the object move and realizing, "Ah, these 500 balls move together in a straight line, so they must be the drawer!"

  3. The "Smart Cut" (Part-Level Decoupling):
    Sometimes, the computer's guess about which balls belong to which part is a little fuzzy at the edges (like a blurry line between a drawer and the cabinet). The paper introduces a "Smart Cut" tool.

    • It uses a famous AI tool called SAM (Segment Anything Model) as a super-precise pair of scissors.
    • It looks at the object from different angles, finds the exact edge where the drawer stops and the cabinet begins, and "splits" the fuzzy balls right down the middle. This ensures the drawer doesn't accidentally stick to the cabinet when it moves.
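For readers who like to see the "remote control" idea in code: the sketch below is a toy illustration of the core mechanic, not the authors' implementation. All names (`deform`, `part_ids`, `transforms`) are made up for this example. It keeps one canonical set of Gaussian centers, assigns each Gaussian to a part, and moves each part with its own rigid transform, the way a learned latent code would drive a drawer or a lamp head.

```python
import numpy as np

def deform(centers, part_ids, transforms):
    """Move each canonical Gaussian center by its part's rigid transform.

    centers: (N, 3) canonical positions; part_ids: (N,) part labels;
    transforms: {part_id: (R, t)} with R a 3x3 rotation, t a translation.
    """
    out = np.empty_like(centers)
    for pid, (R, t) in transforms.items():
        mask = part_ids == pid          # which fuzzy balls belong to this part
        out[mask] = centers[mask] @ R.T + t
    return out

# Canonical "master mold": 4 Gaussians; ids 1 mark the drawer part.
centers = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.], [1., 1., 0.]])
part_ids = np.array([0, 0, 1, 1])

I = np.eye(3)
transforms = {0: (I, np.zeros(3)),            # cabinet body stays put
              1: (I, np.array([0.5, 0., 0.]))}  # drawer slides out 0.5 units

moved = deform(centers, part_ids, transforms)
```

The key point the toy captures: the canonical centers are never edited. Every articulation state is just a different set of per-part transforms applied to the same shared field, which is what keeps the model from "drifting" into a stretchy blob.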

Why This is a Big Deal

  • Smooth Motion: Because the computer understands the "Master Mold" and how to deform it, you can ask it to show the drawer halfway open, or the lamp tilted at a weird angle it has never seen before. It doesn't just guess; it smoothly morphs the model.
  • No Manual Labeling: You don't need to draw boxes around the parts or tell the computer how many parts there are. It figures it out by watching how things move.
  • Real-World Ready: The authors didn't just test this on perfect computer simulations. They built a new dataset called RS-Art (Real-to-Sim Articulated) where they took real photos of real objects (like floppy disk drives and woven baskets) and reverse-engineered them. Their system worked well on these messy, real-world objects.
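The "Smooth Motion" point is worth a tiny illustration. Once a part's motion is identified as, say, a hinge (a revolute joint), any in-between pose is just an intermediate joint angle; no training photo of that pose is needed. The sketch below uses made-up names and a z-axis hinge for simplicity:

```python
import numpy as np

def revolute(points, pivot, theta):
    """Rotate points by angle theta about a z-axis hinge through `pivot`."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s, 0.], [s, c, 0.], [0., 0., 1.]])
    return (points - pivot) @ R.T + pivot

lamp_head = np.array([[1., 0., 0.]])  # one Gaussian center on the lamp head
pivot = np.zeros(3)

# Suppose only two states were ever observed: theta = 0 and theta = pi/2.
# A never-seen "halfway tilted" pose is simply theta = pi/4.
halfway = revolute(lamp_head, pivot, np.pi / 4)
```

Because the pose is a continuous parameter rather than a lookup between two photos, the model can render the lamp at any tilt along the arc, which is exactly the "smoothly morphs" behavior described above.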

The Analogy Summary

Imagine a puppet show.

  • Old Methods: The puppeteer tries to move the whole puppet at once, and the limbs get tangled.
  • PD²GS: The puppeteer has a Master Puppet (the 3D Gaussian field). They have a Control Board (the latent codes) where they can pull individual strings. They also have a Tailor (the SAM splitting) who goes in and sews the seams perfectly so the arm doesn't get stuck to the body.

The Result

This technology allows robots and VR systems to create perfect "Digital Twins" of real-world objects. If a robot needs to open a specific drawer in a messy kitchen, PD2GS helps it understand exactly how that drawer moves, where the handle is, and how to grab it, all without needing a human to teach it the rules first.