Geometry-Aware Probabilistic Circuits via Voronoi Tessellations

Imagine you are running a large, highly organized library (this is a "Probabilistic Circuit"). Your job is to answer questions about books, like "What is the chance a reader will pick a mystery novel?" or "If someone likes sci-fi, what else might they like?"

To do this efficiently, your library is built with a strict set of rules:

The Librarians (Sum Nodes): These are the decision-makers. They decide which section of the library to send a reader to.
The Rules: In traditional libraries, the Librarians use a fixed map. No matter who walks in, they always send "people who like red shirts" to the History section and "people who like blue shirts" to the Sci-Fi section. This map is simple and fast to use (tractable), but it's rigid. It can't adapt if a person wearing a red shirt actually loves Sci-Fi.

The Problem: The "Rigid Map"

The authors of this paper say: "Our library is too rigid. Real life is messy. Sometimes a red-shirted person loves Sci-Fi, and sometimes a blue-shirted person loves History. We need the Librarians to look at the shape of the person and the location they are standing in to decide where to send them."

They want to introduce Voronoi Tessellations.

The Analogy: Imagine the library floor is covered in sticky notes, each with a "Centroid" (a favorite spot) written on it. When a reader walks in, they are automatically sent to the section closest to their favorite spot.
The Result: This creates a dynamic, geometric map. If you are standing near the "Sci-Fi" centroid, you go there, even if you're wearing red. This captures the local geometry of the data much better.

The Conflict: "The Math Breaks"

Here is the catch.

The Old Way: The Librarians used a simple, straight-line map. You could calculate the answer instantly because the math was easy (like adding up numbers in a straight line).
The New Way: The Voronoi map creates slanted, jagged, irregular shapes (polygons) on the floor.
The Disaster: When you try to do the math with these slanted shapes, the calculation becomes impossible to solve quickly. It's like trying to count the exact number of grains of sand in a pile of sand that keeps changing shape. In computer science terms, the "tractability" (the ability to calculate answers quickly) is broken.

The Solution: Two New Strategies

The authors didn't give up. They came up with two clever ways to fix this:

Strategy 1: The "Safe Guess" (Certified Approximate Inference)

Since we can't calculate the exact answer with the slanted shapes, let's calculate a safe range.

The Analogy: Imagine you need to know how much water is in a weirdly shaped bucket. You can't measure the bucket directly. Instead, you put a smaller box inside the bucket (you know for sure the water is at least this much) and a larger box around the bucket (you know for sure the water is no more than this much).
The Magic: The authors developed a way to shrink that gap between the small box and the large box. They can give you an answer like: "The probability is definitely between 40% and 42%."
Why it's cool: Even though it's an estimate, they can prove the real answer is inside that range. It's a "guaranteed" guess.

Strategy 2: The "Lego Block" (Hierarchical Factorized Voronoi)

This strategy changes the rules of the game so the math works again.

The Analogy: Instead of trying to build one giant, complex, slanted shape, we build the map using Lego blocks that align perfectly with the library's existing shelves.
The Trick: We force the "slanted" Voronoi shapes to be made of simple, straight-edged pieces that match the library's structure.
The Result: We get the benefits of the geometric map (it adapts to the reader's location), but because we built it with "Lego blocks," the math stays simple and fast. We get the exact answer without breaking the speed.

The Learning Process: "Soft" to "Hard"

There was one last problem: Computers learn by making small adjustments (gradients). But a Voronoi map is "hard": you are either in one zone or another. You can't slide smoothly from one to the other.

The Fix: The authors introduced a "Temperature" knob.
- High Temperature (Soft): At the start of training, the map is fuzzy. The boundaries are blurry, like a foggy glass. The computer can easily slide the "Centroids" around to find the best spots.
- Low Temperature (Hard): As training finishes, they turn the temperature down. The fog clears, the boundaries become sharp and crisp, and the map becomes the rigid, geometric Voronoi shape.
The Outcome: The computer learns the best layout while the map is fuzzy, then snaps it into a sharp, perfect shape for the final result.

Summary

The paper teaches us how to build smarter, more flexible AI models that understand the shape of the data instead of using one fixed map for every input.

Old way: Rigid, fast, but dumb.
New way (Voronoi): Flexible and smart, but mathematically broken.
The Fix:
- Either give a guaranteed safe range (Strategy 1).
- Or build the map using aligned Lego blocks to keep it fast and exact (Strategy 2).

This allows AI to understand complex, real-world patterns (like the swirl of a galaxy or the knot of a rope) while still being able to answer questions instantly and reliably.

1. Problem Statement

Probabilistic Circuits (PCs) are a class of generative models that support exact, tractable inference (e.g., computing likelihoods, marginals, and conditionals in time linear in the circuit size) by enforcing structural constraints such as decomposability and smoothness. However, standard PCs have a critical limitation: the mixture weights at sum nodes are data-independent (global constants), so the model cannot adapt its routing decisions to the local geometric structure of the data manifold.
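To make "tractable" concrete, the sketch below evaluates a tiny circuit: one sum node with constant weights over product nodes of univariate Gaussian leaves. The structure, the names (`leaf`, `pc_density`), and all parameters are illustrative assumptions, not taken from the paper. Marginalizing a variable only requires replacing its leaf with 1, so a marginal costs the same single bottom-up pass as a likelihood.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of a univariate Gaussian at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def leaf(x, mu, sigma):
    """Leaf value; x=None marginalizes the variable (a pdf integrates to 1)."""
    return 1.0 if x is None else gaussian_pdf(x, mu, sigma)

def pc_density(x, weights, components):
    """Sum node with constant weights over product nodes of Gaussian leaves."""
    return sum(
        w * math.prod(leaf(xi, mu, s) for xi, (mu, s) in zip(x, comp))
        for w, comp in zip(weights, components)
    )

# Hypothetical two-component circuit over (x1, x2); each entry is (mu, sigma).
weights = [0.3, 0.7]
components = [[(-1.0, 0.5), (0.0, 1.0)], [(2.0, 1.0), (1.0, 0.5)]]

joint = pc_density((0.5, 0.2), weights, components)      # p(x1, x2)
marginal = pc_density((0.5, None), weights, components)  # p(x1), x2 integrated out
total = pc_density((None, None), weights, components)    # integrates to 1
```

Note that the weights `[0.3, 0.7]` are the same for every input, which is exactly the rigidity the paper sets out to remove.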

Real-world distributions often exhibit locality and piecewise behavior, where different regions of the input space follow distinct statistical patterns. While Mixture-of-Experts (MoE) models address this by using input-dependent gating, naively introducing such geometry-aware routing (e.g., via Voronoi tessellations) into PCs breaks the structural properties required for tractable inference. Specifically, Voronoi cells are defined by oblique half-space intersections that couple multiple input dimensions, making the integration of factorized distributions over these regions #P-hard.

Core Question: Can we introduce geometry-aware, input-dependent routing into PCs to capture local data structure while maintaining tractable inference?

2. Methodology

The authors propose replacing constant sum-node weights with Voronoi-gated sum nodes, where inputs are routed to expert subcircuits based on proximity to learned centroids. To address the resulting intractability, they develop two complementary strategies:

A. Theoretical Formalization of Incompatibility

The paper first proves that even a single Voronoi-gated sum node with fully factorized experts makes inference intractable. Because Voronoi boundaries are oblique (not axis-aligned), the integration domain cannot be decomposed into a product of one-dimensional integrals, violating the decomposability property essential for efficient PC inference.
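The obstruction is visible once the cell is written out explicitly. The derivation below is a sketch in notation assumed here for illustration ($V_k$ for the cell of centroid $c_k$, $p_{k,i}$ for the univariate factors of expert $k$); the paper's formal statement may differ.

Expanding the squared distances turns each nearest-centroid condition into a linear inequality:

$$V_k = \{x : \|x - c_k\|^2 \le \|x - c_j\|^2 \ \ \forall j \ne k\} = \{x : 2\,(c_j - c_k)^\top x \le \|c_j\|^2 - \|c_k\|^2 \ \ \forall j \ne k\}.$$

Over an axis-aligned box $B = \prod_i [a_i, b_i]$, a factorized expert integrates coordinate by coordinate:

$$\int_B \prod_i p_{k,i}(x_i)\,dx = \prod_i \int_{a_i}^{b_i} p_{k,i}(x_i)\,dx_i.$$

Over $V_k$ this step is not available. Each half-space constraint involves every coordinate in which $c_j$ and $c_k$ differ, so the indicator of $V_k$ does not split into a product of per-coordinate indicators, and the integral $\int_{V_k} \prod_i p_{k,i}(x_i)\,dx$ remains genuinely multi-dimensional.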

B. Solution 1: Certified Approximate Inference (VT-PCs)

For general Voronoi tessellations, where exact inference is intractable, the authors propose a framework that computes provable lower and upper bounds on partition functions and marginals.

Box Approximation: They replace intractable polyhedral Voronoi cells with tractable, axis-aligned bounding boxes (inner and outer boxes).
Bound Propagation: These local bounds are propagated through the circuit using standard sum-product rules.
Anytime Refinement: An adaptive algorithm recursively bisects "boundary boxes" (those intersecting Voronoi cell edges) to tighten the bounds. The gap between upper and lower bounds converges to zero as the partition becomes finer, providing a certificate of reliability.
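The three steps above can be sketched for a single cell and a single factorized Gaussian expert. This is a minimal illustration under stated assumptions, not the authors' algorithm: the function names, the bounded domain $[-4, 4]^2$, the bisect-the-longest-side rule, and the depth limit are all hypothetical choices. A box entirely inside the cell counts toward both bounds, a box entirely outside counts toward neither, and an unresolved boundary box counts only toward the upper bound.

```python
import math

def normal_cdf(x, mu, sigma):
    """CDF of a univariate Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def box_mass(box, mus, sigmas):
    """Exact mass of a factorized Gaussian on an axis-aligned box."""
    return math.prod(
        normal_cdf(hi, m, s) - normal_cdf(lo, m, s)
        for (lo, hi), m, s in zip(box, mus, sigmas)
    )

def classify(box, k, centroids):
    """Label a box 'inside', 'outside', or 'boundary' relative to Voronoi cell k."""
    ck = centroids[k]
    inside = True
    for j, cj in enumerate(centroids):
        if j == k:
            continue
        # Cell k lies in the half-space g(x) = a.x - b <= 0 for each rival j.
        a = [2.0 * (q - p) for q, p in zip(cj, ck)]
        b = sum(q * q for q in cj) - sum(p * p for p in ck)
        # A linear function attains its extremes over a box at the corners.
        g_max = sum(ai * (hi if ai > 0 else lo) for ai, (lo, hi) in zip(a, box)) - b
        g_min = sum(ai * (lo if ai > 0 else hi) for ai, (lo, hi) in zip(a, box)) - b
        if g_min > 0:
            return "outside"   # every point of the box is closer to c_j
        if g_max > 0:
            inside = False     # the box straddles this face
    return "inside" if inside else "boundary"

def cell_mass_bounds(box, k, centroids, mus, sigmas, depth):
    """(lower, upper) bounds on the expert's mass in cell k, restricted to box."""
    status = classify(box, k, centroids)
    if status == "outside":
        return 0.0, 0.0
    mass = box_mass(box, mus, sigmas)
    if status == "inside":
        return mass, mass
    if depth == 0:
        return 0.0, mass       # unresolved boundary box: upper bound only
    # Bisect the longest side and recurse on both halves.
    d = max(range(len(box)), key=lambda i: box[i][1] - box[i][0])
    lo, hi = box[d]
    mid = 0.5 * (lo + hi)
    left = box[:d] + [(lo, mid)] + box[d + 1:]
    right = box[:d] + [(mid, hi)] + box[d + 1:]
    l1, u1 = cell_mass_bounds(left, k, centroids, mus, sigmas, depth - 1)
    l2, u2 = cell_mass_bounds(right, k, centroids, mus, sigmas, depth - 1)
    return l1 + l2, u1 + u2

# Hypothetical setup: three centroids, a standard Gaussian expert, domain [-4, 4]^2.
centroids = [(0.0, 0.0), (1.5, 1.0), (-1.0, 2.0)]
domain = [(-4.0, 4.0), (-4.0, 4.0)]
mus, sigmas = (0.0, 0.0), (1.0, 1.0)

for depth in (4, 8, 12):
    lower, upper = cell_mass_bounds(domain, 0, centroids, mus, sigmas, depth)
    print(f"depth={depth:2d}  lower={lower:.4f}  upper={upper:.4f}  gap={upper - lower:.4f}")
```

Raising `depth` can only tighten the interval (up to floating-point rounding), which is the "anytime" property: the computation can be stopped at any point and still returns a valid bracket. Because sum and product nodes are monotone in their non-negative inputs, per-cell intervals like these can be pushed upward through the rest of a circuit.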

C. Solution 2: Hierarchical Factorized Voronoi (HFV-PCs)

To recover exact tractable inference, the authors introduce a structural constraint called Geometric Alignment.

Factorization: The Voronoi tessellation is constrained to align with the circuit's variable decomposition (vtree). Instead of a single high-dimensional Voronoi cell, the gating mechanism is a Cartesian product of lower-dimensional Voronoi cells, one for each disjoint variable block in the circuit.
Result: This ensures that the gating regions factorize exactly like the expert distributions, restoring the ability to apply Fubini's theorem and perform exact recursive integration.
Complexity: Exact inference is maintained with a time complexity of $O(|C|K^m)$, where $|C|$ is the circuit size, $K$ is the number of centroids per factor, and $m$ is the factorization degree.
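A minimal sketch of why alignment restores exactness, under simplifying assumptions that are not from the paper: every variable block is one-dimensional, experts are factorized Gaussians, and the hypothetical `expert_params` dictionary maps each combination of per-block cell indices to one expert. In one dimension a Voronoi cell is just an interval between midpoints of neighboring centroids, so each gated region is a box and the integral is a product of CDF differences.

```python
import math
from itertools import product

def normal_cdf(x, mu, sigma):
    """CDF of a univariate Gaussian; also accepts +/- infinity."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def voronoi_intervals(centroids):
    """1D Voronoi cells: intervals cut at midpoints of the sorted centroids."""
    cs = sorted(centroids)
    cuts = [-math.inf] + [0.5 * (a + b) for a, b in zip(cs, cs[1:])] + [math.inf]
    return list(zip(cuts, cuts[1:]))

def partition_function(block_centroids, expert_params):
    """Exact normalizer of a product-of-1D-Voronoi gated mixture.

    block_centroids[i]      -- 1D centroids for variable i
    expert_params[(k1..km)] -- per-variable (mu, sigma) of the expert for that cell;
                               indices refer to centroids in sorted order
    """
    cells = [voronoi_intervals(cs) for cs in block_centroids]
    total = 0.0
    # One term per combination of cells: K^m terms for m blocks of K centroids.
    for ks in product(*(range(len(c)) for c in cells)):
        term = 1.0
        for i, k in enumerate(ks):
            lo, hi = cells[i][k]
            mu, sigma = expert_params[ks][i]
            term *= normal_cdf(hi, mu, sigma) - normal_cdf(lo, mu, sigma)
        total += term
    return total

# Hypothetical example: two variables, two centroids each, experts centered on the centroids.
block_centroids = [[-1.0, 1.0], [-1.0, 1.0]]
experts = {
    ks: [(block_centroids[i][k], 1.0) for i, k in enumerate(ks)]
    for ks in product(range(2), repeat=2)
}
Z = partition_function(block_centroids, experts)
```

The outer loop visits $K^m$ cell combinations, which mirrors the $K^m$ factor in the stated complexity. With higher-dimensional blocks the per-block integral is no longer a single CDF difference, so this sketch covers only the simplest aligned case.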

D. Learning via Soft Gating

Since hard Voronoi assignments are non-differentiable, the authors introduce a soft gating mechanism for training:

Soft Voronoi Gate: Uses a temperature-scaled softmax over squared Euclidean distances to centroids: $w_k(u) \propto \exp(-\alpha \|u - c_k\|^2)$.
Annealing: During training, the inverse temperature $\alpha$ is gradually increased (annealed). Early on, the smooth gate allows gradient-based optimization of centroids and circuit parameters; as $\alpha$ grows, the gate approaches the hard assignments used at test time.
Convergence: The authors prove that as $\alpha \to \infty$, the soft gate converges exponentially fast to the hard Voronoi gate, ensuring the final model retains exact inference guarantees (for HFV) or certified bounds (for VT).
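The soft and hard gates can be compared directly. The sketch below implements the softmax gate from the formula above next to a nearest-centroid gate; the function names, the centroids, and the query point are illustrative assumptions. It also relies on an elementary bound (not necessarily the form proved in the paper): if $\Delta$ is the gap between the smallest and second-smallest squared distance, the weight on the nearest centroid is at least $1 - (K-1)e^{-\alpha\Delta}$.

```python
import math

def sq_dist(u, c):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(u, c))

def soft_voronoi_gate(u, centroids, alpha):
    """Softmax over negative squared distances, scaled by inverse temperature alpha."""
    logits = [-alpha * sq_dist(u, c) for c in centroids]
    top = max(logits)                        # subtract the max for numerical stability
    exps = [math.exp(l - top) for l in logits]
    norm = sum(exps)
    return [e / norm for e in exps]

def hard_voronoi_gate(u, centroids):
    """One-hot assignment to the nearest centroid (ties go to the lowest index)."""
    nearest = min(range(len(centroids)), key=lambda k: sq_dist(u, centroids[k]))
    return [1.0 if k == nearest else 0.0 for k in range(len(centroids))]

# Hypothetical centroids and query point.
centroids = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
u = (0.3, 0.1)
hard = hard_voronoi_gate(u, centroids)

for alpha in (0.0, 1.0, 10.0, 100.0):
    soft = soft_voronoi_gate(u, centroids, alpha)
    gap = max(abs(s - h) for s, h in zip(soft, hard))
    print(f"alpha={alpha:6.1f}  weights={[round(w, 4) for w in soft]}  gap={gap:.2e}")
```

At `alpha=0.0` the gate is uniform, and as `alpha` grows the weights concentrate on the nearest centroid. Points that lie exactly on a cell boundary have $\Delta = 0$, so the bound gives no rate there; the soft gate splits its weight between the tied centroids for every `alpha`.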

3. Key Contributions

First Geometric PC Framework: Introduces the first principled approach to training PCs with Voronoi tessellations, enabling input-dependent routing based on data geometry.
Formal Incompatibility Proof: Rigorously demonstrates why general Voronoi gating breaks tractability in decomposable PCs.
Dual Solution Strategy:
- A certified approximate inference framework that provides guaranteed bounds for general geometry-aware PCs.
- A structural condition (HFV) that recovers exact tractable inference by aligning geometric partitions with circuit factorization.
Differentiable Learning: Proposes a soft-to-hard relaxation with theoretical convergence guarantees, enabling end-to-end training of centroids and circuit parameters.
Empirical Validation: Demonstrates effectiveness on synthetic 2D and 3D datasets with complex geometric structures (e.g., spirals, knots, pinwheels).

4. Results

The authors evaluated their approach on eight synthetic density estimation tasks (four 2D, four 3D) featuring complex manifolds and disconnected supports.

Performance: VT-PCs (using certified approximate inference) consistently outperformed standard baselines (EinsumNet, HCLT) in terms of test log-likelihood. The certified lower bounds often exceeded the exact likelihoods of the baselines, indicating that the geometric expressivity captured structure missed by global mixture weights.
HFV-PCs: Achieved performance comparable to baselines. While slightly less expressive than unconstrained VT due to alignment constraints, they provided the advantage of exact tractability and explicit geometric interpretability.
Visualization: Learned Voronoi cells in VT models successfully adapted to the local "arms" of distributions (e.g., Pinwheel), assigning specific experts to specific regions, whereas standard PCs used global weights.
Learning Dynamics: The soft-gating annealing schedule resulted in stable training curves, with the certified lower bounds tightening as the model learned the data support.

5. Significance

This work bridges the gap between the expressivity of deep generative models (which capture local geometry) and the reliability of probabilistic circuits (which offer exact inference).

Interpretability: By using Voronoi cells, the model provides explicit regions of responsibility, making it easier to understand which part of the data space a specific expert handles.
Reliability: Unlike neural MoE models that offer no guarantees on inference, this approach provides either exact results (HFV) or mathematically certified bounds (VT).
Future Applications: The framework is particularly promising for tasks requiring continual learning (adapting to new regions of space), anomaly detection (identifying regions with low probability mass), and causal reasoning, where understanding local data structure is crucial.

In summary, the paper establishes that while arbitrary geometric routing breaks tractability, it can be successfully integrated into PCs either through structural alignment (HFV) for exact inference or certified approximation (VT) for guaranteed bounds, significantly enhancing the modeling capability of probabilistic circuits.