Geometry Distributions

This paper proposes a novel geometric data representation that models 3D shapes as probability distributions using diffusion models. The approach overcomes the limitations of traditional coordinate-based networks, achieves high-fidelity reconstruction of complex geometries, and enables applications such as compression, dynamic modeling, and rendering.

Biao Zhang, Jing Ren, Peter Wonka

Published 2026-02-24

Imagine you want to teach a computer to understand the shape of a complex object, like a delicate jellyfish or a statue with thin, wispy wings.

Traditionally, computers have tried to do this in two main ways:

  1. The Mesh Method: Like building a sculpture out of Lego bricks. It's great if the object is solid, but if you try to make a thin wire or a floating piece of paper, the "bricks" fall apart or look blocky.
  2. The SDF Method (Signed Distance Function): Like filling a room with invisible fog. The computer calculates how far every point in the room is from the surface. If the object is a solid ball, the fog works perfectly. But if the object is a thin wire or has a hole in it, the fog gets confused and the shape breaks.

The Problem: Both methods struggle with "messy" 3D shapes—things that aren't perfectly closed, have very thin parts, or have complex holes.

The New Idea: "Geometry Distributions" (GEOMDIST)

The authors of this paper propose a completely different way to think about 3D shapes. Instead of trying to build the shape out of bricks or fill it with fog, they treat the shape as a cloud of probability.

Here is the analogy:

1. The Gaussian Cloud (The "White Noise")

Imagine you have a giant, invisible cloud of static noise (like the "snow" on an old TV screen) floating in 3D space. This is what mathematicians call a Gaussian distribution. Right now, it's just random chaos.

2. The Magic Filter (The Diffusion Model)

The authors trained a special AI "filter" (a diffusion model). Think of this filter as a magical sieve or a sculptor's hand.

  • The Training: They showed the AI millions of points from real 3D objects (like a lamp, a lion, or a jellyfish). The AI learned a specific set of rules: "If I see a point in the noise cloud here, I need to move it there to match the shape of the object."
  • The Result: The AI learned a "map" or a "trajectory." It knows exactly how to take a random speck of noise and guide it to land perfectly on the surface of the object.
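The training described above can be sketched in code. This is a minimal, hypothetical illustration in the flow-matching style (one common way to train such noise-to-surface maps; the paper's exact loss and parameterization may differ), with the unit sphere standing in for a real shape:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_surface(n):
    """Toy stand-in for sampling points from a real 3D shape:
    uniform points on the unit sphere."""
    v = rng.normal(size=(n, 3))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def training_batch(n):
    """Build one training batch. A network taking (x_t, t) as input
    would be regressed onto `target` with a simple squared error."""
    x0 = rng.normal(size=(n, 3))        # a random speck of noise
    x1 = sample_surface(n)              # a point on the object's surface
    t = rng.uniform(size=(n, 1))        # random time along the trajectory
    x_t = (1.0 - t) * x0 + t * x1      # point partway along the path
    target = x1 - x0                    # direction the point should move
    return x_t, t, target
```

The learned "map" is exactly this regression target: at every point and time, the network memorizes which way a speck of noise should drift to land on the surface.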

3. Generating the Shape

Once the AI is trained, you don't need to store the whole object. You just need the "recipe" (the AI model).

  • To see the object, you start with a fresh cloud of random noise.
  • You run it through the AI's "filter."
  • Poof! The random noise transforms into millions of points that perfectly outline the object.
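The three steps above amount to integrating a small ODE. In this sketch the trained network is replaced by a hand-written field whose flow carries any point onto the unit sphere (an invented stand-in, not the paper's model), so the whole "filter" becomes a short Euler loop:

```python
import numpy as np

def velocity(x, t):
    """Stand-in for the trained network: an analytic field whose flow
    moves every point onto the unit sphere by time t = 1.
    (The real model is a learned neural network, not this formula.)"""
    x1 = x / np.linalg.norm(x, axis=1, keepdims=True)
    return (x1 - x) / (1.0 - t)

def generate(n_points, n_steps=32, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n_points, 3))   # fresh cloud of Gaussian noise
    dt = 1.0 / n_steps
    t = 0.0
    for _ in range(n_steps):             # run the noise through the "filter"
        x = x + dt * velocity(x, t)      # one Euler step along the flow
        t += dt
    return x                             # points now outline the shape

pts = generate(1000)
```

Note that `n_points` is free: the same model answers `generate(10)` or `generate(10_000_000)`, which is what makes the representation resolution-independent.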

Why is this a big deal?

1. It handles the "Impossible" shapes
Because this method just moves points around, it doesn't care if the object is a solid ball, a hollow shell, a thin wire, or a shape with holes in it. It treats all surfaces the same way: just a collection of points. It's like drawing a picture with a pen; you can draw a solid circle or a single line, and the pen doesn't care about the difference.

2. Infinite Resolution
With Lego bricks (meshes), if you want more detail, you have to add more bricks, which takes up a lot of memory. With this new method, you can ask for 10 points or 10 million points from the same model. The model simply generates as many points as you need, on demand. It's like having a recipe for a cake that can feed 2 people or 2,000 people without changing the ingredients.

3. It's reversible
The process works both ways.

  • Forward: Noise → Shape (generating the object).
  • Backward: Shape → Noise (compressing the object).

You can take a complex 3D model, run it through the "backward" filter, and turn it into a tiny bit of random noise data. This is huge for compression. You could send a tiny file of "noise" to a friend, and their computer could "decode" it back into the full 3D model.
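The reason both directions work is that a well-behaved ODE can be integrated forwards or backwards in time. Here is a toy demonstration with an invented, smooth velocity field (again a stand-in for the learned network): run the flow forward to turn noise into (near-)surface points, then run the same steps in reverse and recover the original noise up to small integration error:

```python
import numpy as np

def velocity(x, t):
    """Smooth stand-in for the learned field: pulls points toward the
    unit sphere. Because the ODE is well-behaved away from the origin,
    its flow can be run in either direction."""
    r = np.linalg.norm(x, axis=1, keepdims=True)
    return x / r - x

def integrate(x, n_steps=200, reverse=False):
    dt = -1.0 / n_steps if reverse else 1.0 / n_steps
    t = 1.0 if reverse else 0.0
    for _ in range(n_steps):
        x = x + dt * velocity(x, t)      # Euler step, forward or backward
        t += dt
    return x

rng = np.random.default_rng(1)
noise = rng.normal(size=(1000, 3))
shape = integrate(noise)                    # forward:  noise -> shape
recovered = integrate(shape, reverse=True)  # backward: shape -> noise
# `recovered` matches `noise` up to small discretization error
```

In the compression scenario, it is `recovered`-style noise (plus the shared model weights) that you would transmit, not the point cloud itself.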

Real-World Applications Mentioned in the Paper

  • Textured Objects: You can teach the AI to not just move the points to the right spot, but also to carry color information. So, the generated points aren't just white dots; they are colored dots that form a textured 3D model.
  • Animation: By adding "time" to the equation, the AI can learn how a shape moves. It can generate a jellyfish swimming by moving the points smoothly over time.
  • Rendering: The points generated by this method can be used for "Gaussian Splatting," a new way to create photorealistic images that look like real photos but are generated from 3D data.
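One way to read the texture extension is that each point simply grows from 3 dimensions to 6: position plus RGB. The sketch below builds such 6-D training samples from an invented colored sphere (not data from the paper); everything else about the training would stay the same, with the noise cloud living in six dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_textured_surface(n):
    """Toy textured shape: a unit sphere whose color is derived from
    position (a stand-in for sampling a real textured mesh)."""
    p = rng.normal(size=(n, 3))
    p /= np.linalg.norm(p, axis=1, keepdims=True)
    color = 0.5 * (p + 1.0)              # map [-1, 1] coords to [0, 1] RGB
    return np.concatenate([p, color], axis=1)   # (x, y, z, r, g, b)

# Training proceeds as in the 3-D case, except the Gaussian noise lives
# in R^6 and the learned flow lands on (position, color) pairs.
samples = sample_textured_surface(256)
```

The animation case is analogous: condition the model on a time value, and the same flow produces the point cloud at each moment of the motion.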

Summary

Think of traditional 3D modeling as trying to build a house with specific bricks (Mesh) or filling a mold with concrete (SDF). If the house has a weird shape, the bricks don't fit, or the concrete cracks.

GEOMDIST is like having a magical wind that can blow random dust particles into the exact shape of a house, a tree, or a jellyfish, no matter how complex. It's flexible, infinitely detailed, and can be compressed into a tiny "wind recipe" that anyone can use to recreate the shape.
