Conformal Graph Prediction with Z-Gromov Wasserstein Distances

Imagine you are a detective trying to identify a mysterious molecule based on a blurry photo of its mass spectrum (a kind of chemical fingerprint). You have a massive library of possible suspects (molecules), and your AI assistant gives you its best guess.

The Problem:
Usually, the AI just points to one suspect and says, "It's definitely this one!" But what if the AI is wrong? In chemistry, guessing wrong can be expensive or dangerous. You need the AI to say, "I'm pretty sure it's one of these five molecules," so you can check them all. This is called Uncertainty Quantification.

The Challenge:
Predicting a molecule is like predicting a complex Lego structure. Unlike predicting a number (like "the temperature is 20°C"), predicting a graph (a molecule) is hard because:

No Order: You can rotate a molecule or rename its atoms, and it's still the same molecule. Standard math tools get confused by this.
Complexity: There are billions of possible structures. You can't just list them all to check which ones fit.

The Solution: A "Safe Zone" for Graphs
The authors of this paper built a new system called Conformal Graph Prediction. Think of it as a "Safety Net" for AI predictions. Here is how it works, using simple analogies:

1. The "Shape-Shifting" Ruler (Z-Gromov-Wasserstein)

To compare two molecules, you need a ruler that doesn't care if the atoms are labeled differently or if the molecule is turned sideways.

The Analogy: Imagine trying to compare two jigsaw puzzles. If you just look at the pieces in order, they might look totally different. But if you look at how the pieces connect to each other (the pattern), you can see if they are the same puzzle.
The Tech: The authors use a mathematical tool called Z-Gromov-Wasserstein (Z-GW). It's like a "smart ruler" that measures the distance between two molecules by looking at their internal structure and connections, ignoring how they are labeled. This ensures the AI compares "apples to apples," even if the apples are named differently.

2. The "Safety Net" (Conformal Prediction)

Once the AI makes a guess, how do we know how confident it is?

The Analogy: Imagine a weather forecaster. Instead of just saying "It will rain," a conformal predictor says, "Based on past data, there is a 90% chance the rain will fall within this specific area."
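The Tech: The system looks at how far off the AI was on past data it was not trained on. It then draws a "Safety Net" around the prediction, sized so that the true molecule lands inside it at a rate you choose in advance (90% in this paper's experiments). If the true molecule is inside the net, the system is "covered." The paper proves that the net catches the true answer at least that often on average, with no assumption about the shape of the data distribution, as long as past and future examples are exchangeable (roughly, drawn from the same source).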

3. The "Smart Filter" (SCQR)

Sometimes, some guesses are easy (the AI is very confident), and some are hard (the AI is confused). A standard safety net is the same size for everyone, which is wasteful.

The Analogy: Imagine a security guard at a concert.
- Standard Method: The guard checks everyone with the same intensity, creating a huge line for everyone, even the VIPs who are clearly on the list.
- SCQR (Score Conformalized Quantile Regression): This is a "Smart Guard." If you look like a VIP (easy input), the guard lets you through quickly with a small check. If you look suspicious (hard input), the guard does a thorough, wider check.
The Tech: The authors created a method called SCQR. It adjusts the size of the "Safety Net" based on how difficult the specific input is. For easy molecules, the net is tiny (just 1 or 2 candidates). For hard ones, it gets bigger. This saves time and effort without giving up the coverage guarantee.

Real-World Results

The team tested this on two things:

Synthetic Images: Turning pictures of colored dots into graphs.
Real Chemistry: Identifying molecules from mass spectrometry data (a real-world problem in drug discovery).

The Outcome:

Coverage: The system caught the correct molecule about 90% of the time, matching the target level.
Efficiency: By using the "Smart Filter" (SCQR), they reduced the number of candidates the chemists had to check. In the chemistry test, instead of checking an average of 24 molecules, they only needed to check 15, and for the easy ones, often just 1.

The Bottom Line

This paper gives scientists a way to trust their AI when predicting complex structures like molecules. It provides a guaranteed safety net that adapts to the difficulty of the problem, ensuring that when the AI says, "It's one of these," you can be confident the answer is in that list, and the list is as small as possible.

Here is a detailed technical summary of the paper "Conformal Graph Prediction with Z-Gromov Wasserstein Distances."

1. Problem Statement

The paper addresses Supervised Graph Prediction (SGP), a regression task where the output is a structured graph (e.g., molecular structures, scene graphs) rather than a scalar or vector.

The Gap: While various models exist for predicting graphs (e.g., graph neural networks, autoencoders), they lack principled uncertainty quantification. Existing methods typically output a single "best guess" graph, offering no confidence sets. This is risky in high-stakes domains like chemistry (molecule identification) where experimental validation is costly.
The Challenge: Extending Conformal Prediction (CP) to graph-valued outputs is difficult because:
1. Permutation Invariance: Graphs are defined up to node permutation. Comparing a predicted graph to a candidate graph requires a metric that is invariant to node relabeling.
2. Non-Euclidean Space: Graphs live in a combinatorial, non-Euclidean space, making standard Euclidean CP techniques (like quantile regression on residuals) inapplicable.
3. Heteroscedasticity: Standard CP applies a single global threshold to every input. Coverage still holds on average, but the sets do not adapt: some inputs (e.g., complex spectra) are harder to predict than others and need larger sets, while easy inputs could be given smaller ones.

2. Methodology

The authors propose a framework that combines Z-Gromov-Wasserstein (Z-GW) distances with a novel Score Conformalized Quantile Regression (SCQR) method.

A. Theoretical Foundation: Z-Gromov-Wasserstein Distance

To handle the permutation invariance of graphs, the authors represent graphs as Z-networks: measure spaces (here, weighted node sets) whose pairwise relational data take values in a general metric space $Z$.

Metric: They utilize the Z-Gromov-Wasserstein (Z-GW) distance. This distance compares two graphs by finding an optimal coupling between their nodes that minimizes the discrepancy between their pairwise relational structures (and node/edge attributes).
Permutation Invariance: The Z-GW distance is naturally invariant to node permutations. It defines a metric on the quotient space of graphs (where isomorphic graphs are identified as the same point).
Nonconformity Score: The core of the conformal framework is the score function $s(x, y)$, defined as the Z-GW distance between the predicted graph $\hat{y} = f_\theta(x)$ and a candidate graph $y$:
$s(x, y) = \text{GW}_Z^p(f_\theta(x), y)$
Because the distance is permutation-invariant, the score is well-defined on the equivalence classes of graphs.
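
The permutation invariance of the score can be checked on a toy example. The sketch below does not compute Z-GW. As a labeled simplification, it uses a brute-force distance that minimizes the edge discrepancy over all node relabelings of two equal-size graphs, which restricts the couplings to permutations. The function names `matching_distance` and `relabel` are hypothetical and are not taken from the paper.

```python
from itertools import permutations


def matching_distance(A, B):
    """Toy permutation-invariant distance between two equal-size graphs.

    A and B are adjacency matrices given as lists of lists. The result is
    the smallest mean absolute entry discrepancy over all node relabelings.
    Assumption: this is a simplified stand-in, not the Z-GW distance.
    """
    n = len(A)
    if len(B) != n:
        raise ValueError("this toy distance needs graphs of equal size")
    best = float("inf")
    for perm in permutations(range(n)):
        cost = sum(
            abs(A[i][j] - B[perm[i]][perm[j]])
            for i in range(n)
            for j in range(n)
        ) / (n * n)
        best = min(best, cost)
    return best


def relabel(A, perm):
    """Return the adjacency matrix of A after renaming node i to perm[i]."""
    n = len(A)
    B = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            B[perm[i]][perm[j]] = A[i][j]
    return B


# A path graph 0-1-2-3 and a star graph centered at node 0.
path = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
star = [[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]
shuffled_path = relabel(path, [2, 0, 3, 1])

d_same = matching_distance(path, shuffled_path)        # same graph, new labels
d_diff = matching_distance(path, star)                 # different graphs
d_diff_shuffled = matching_distance(shuffled_path, star)

print(d_same, d_diff, d_diff_shuffled)
```

The relabeled path is at distance zero from the original, and the distance to the star does not depend on which labeling of the path is used. The brute-force search is factorial in the number of nodes, so it is only practical for very small graphs.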

B. Conformal Graph Prediction Framework

Exchangeability: The framework assumes the data $(X_i, Y_i)$ are exchangeable.
Validity: By computing the empirical quantile of nonconformity scores on a calibration set, the method constructs a prediction set $C_\alpha(x)$ containing all candidate graphs $y$ such that $s(x, y) \leq \hat{q}_{1-\alpha}$.
Theoretical Guarantee: The authors prove that this set provides finite-sample, distribution-free marginal coverage guarantees ($P(Y_{n+1} \in C_\alpha(X_{n+1})) \geq 1-\alpha$) even in the quotiented graph space.
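
The calibration step can be sketched in a few lines. The example below assumes the usual split-conformal convention of taking the $\lceil (n+1)(1-\alpha) \rceil$-th smallest calibration score as the threshold; the summary above only says "empirical quantile", so the exact correction is an assumption here. The scores are synthetic random numbers standing in for Z-GW distances, and `conformal_quantile` is a hypothetical helper name.

```python
import math
import random


def conformal_quantile(scores, alpha):
    """Threshold for split conformal prediction.

    Returns the k-th smallest score with k = ceil((n + 1) * (1 - alpha)).
    If k exceeds n, no finite threshold gives the guarantee, so the
    function returns infinity (the prediction set keeps every candidate).
    """
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return float("inf")
    return sorted(scores)[k - 1]


# Synthetic, exchangeable scores (assumption: stand-ins for real distances).
rng = random.Random(0)
calibration_scores = [rng.expovariate(1.0) for _ in range(500)]
test_scores = [rng.expovariate(1.0) for _ in range(5000)]

alpha = 0.1
q_hat = conformal_quantile(calibration_scores, alpha)

# The true output is covered exactly when its score is at most q_hat.
coverage = sum(s <= q_hat for s in test_scores) / len(test_scores)
print(f"threshold={q_hat:.3f}, empirical coverage={coverage:.3f}")
```

Because the guarantee is marginal, the empirical coverage fluctuates around the target level from one calibration set to the next rather than matching it exactly.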

C. Score Conformalized Quantile Regression (SCQR)

To address the limitation of global thresholds (which lead to overly conservative sets for easy inputs and under-coverage for hard ones), the authors introduce SCQR.

Concept: Instead of applying a single global cutoff to the raw scores, SCQR learns a conditional quantile function $\psi(\omega(x))$ that estimates the $(1-\alpha)$-quantile of the nonconformity score given input-dependent features $\omega(x)$ (e.g., candidate set size or spectral embeddings).
Mechanism:
1. Train a quantile regression model $\psi$ to estimate the $(1-\alpha)$-quantile of the scores conditioned on $\omega(x)$.
2. Compute adaptive residuals on the calibration set: $E_i = s(X_i, Y_i) - \psi(\omega(X_i))$.
3. Calibrate a global threshold $\hat{q}_{1-\alpha}$ as the empirical $(1-\alpha)$-quantile of these residuals (rather than of the raw scores).
4. The final prediction set includes candidates where $s(x, y) \leq \psi(\omega(x)) + \hat{q}_{1-\alpha}$.
Result: This yields locally adaptive prediction sets that are smaller for easy inputs and larger for difficult ones, while maintaining marginal coverage.
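
The four steps above can be sketched on synthetic data. This is a minimal illustration under stated assumptions, not the authors' implementation: the feature $\omega(x)$ is a single number in $[0, 1]$, the scores are simulated so that larger features give noisier scores, and the quantile regressor $\psi$ is replaced by a per-bin empirical quantile (`fit_binned_quantile` is a hypothetical helper). The regressor is fitted on a split separate from the calibration data, as is usual for conformalized quantile regression.

```python
import math
import random


def conformal_quantile(values, alpha):
    """k-th smallest value with k = ceil((n + 1) * (1 - alpha))."""
    n = len(values)
    k = math.ceil((n + 1) * (1 - alpha))
    return float("inf") if k > n else sorted(values)[k - 1]


def fit_binned_quantile(features, scores, alpha, n_bins=5):
    """Toy stand-in for psi: per-bin empirical (1 - alpha) quantile.

    Assumes features lie in [0, 1] and that every bin receives data.
    """
    bins = [[] for _ in range(n_bins)]
    for w, s in zip(features, scores):
        bins[min(int(w * n_bins), n_bins - 1)].append(s)
    levels = []
    for b in bins:
        if not b:
            raise ValueError("empty bin: use fewer bins or more data")
        b.sort()
        levels.append(b[min(int((1 - alpha) * len(b)), len(b) - 1)])

    def psi(w):
        return levels[min(int(w * n_bins), n_bins - 1)]

    return psi


def sample(rng, n):
    """Synthetic heteroscedastic scores: harder inputs have larger w."""
    feats = [rng.random() for _ in range(n)]
    scores = [(0.2 + 2.0 * w) * rng.expovariate(1.0) for w in feats]
    return feats, scores


rng = random.Random(0)
alpha = 0.1
train_w, train_s = sample(rng, 2000)
cal_w, cal_s = sample(rng, 1000)
test_w, test_s = sample(rng, 5000)

psi = fit_binned_quantile(train_w, train_s, alpha)        # step 1
residuals = [s - psi(w) for w, s in zip(cal_w, cal_s)]    # step 2
q_hat = conformal_quantile(residuals, alpha)              # step 3
thresholds = [psi(w) + q_hat for w in test_w]             # step 4

coverage = sum(s <= t for s, t in zip(test_s, thresholds)) / len(test_s)
easy = [t for w, t in zip(test_w, thresholds) if w < 0.2]
hard = [t for w, t in zip(test_w, thresholds) if w >= 0.8]
print(f"coverage={coverage:.3f}")
print(f"mean threshold, easy={sum(easy) / len(easy):.2f}, "
      f"hard={sum(hard) / len(hard):.2f}")
```

In this simulation the threshold is tighter for low-feature (easy) inputs and looser for high-feature (hard) ones, while the overall coverage stays near the target. How much the sets shrink in practice depends on how well $\omega(x)$ predicts the difficulty of an input.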

D. Practical Implementation

Since the space of all possible graphs is infinite, the method intersects the implicit conformal set with a finite candidate library $L(x)$ (e.g., a database of molecules matching a mass spectrum). The final set is $C_L(x) = \{y \in L(x) : s(x, y) \leq \text{threshold}\}$.
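
Restricting the conformal set to a candidate library amounts to a filter over precomputed scores. In the sketch below, the candidate identifiers, distances, and threshold are made up for illustration, and `prediction_set` is a hypothetical helper; in a real pipeline each score would be the Z-GW distance between the predicted graph and the candidate.

```python
def prediction_set(candidates, score_fn, threshold):
    """Keep the candidates whose nonconformity score is within the threshold."""
    return [y for y in candidates if score_fn(y) <= threshold]


# Hypothetical library L(x): candidate id -> distance to the predicted graph.
library_scores = {
    "candidate_A": 0.12,
    "candidate_B": 0.57,
    "candidate_C": 0.31,
    "candidate_D": 0.90,
}

threshold = 0.40  # assumed to come from a calibration step
kept = prediction_set(library_scores, library_scores.get, threshold)

# Rank the retained candidates from most to least similar.
ranked = sorted(kept, key=library_scores.get)
print(ranked)
```

The same function covers both variants described above: pass the global threshold for standard conformal prediction, or the input-dependent threshold for SCQR.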

3. Key Contributions

Conformal Framework for Graphs: A novel framework for graph-valued prediction using Z-GW distances, providing the first distribution-free coverage guarantees for structured graph outputs in quotient spaces.
SCQR: The introduction of Score Conformalized Quantile Regression, extending CQR to complex, non-Euclidean output spaces to achieve adaptive uncertainty quantification.
Theoretical Proofs: Rigorous proofs establishing that exchangeability is preserved under quotient maps and that the resulting conformal sets satisfy validity guarantees.
Empirical Validation: Demonstration on both synthetic (image-to-graph) and real-world (metabolite identification) tasks.

4. Experimental Results

The authors evaluated the method on two tasks:

Synthetic Coloring Task: Predicting graph structures from images.
Metabolite Identification (MassSpecGym): Predicting molecular graphs from MS/MS spectra.

Key Findings:

Coverage Validity: Both standard CP and SCQR achieved empirical coverage close to the nominal 90% level, confirming theoretical validity.
Efficiency (Set Size Reduction):
- Standard CP: Produced valid sets but often large, especially for difficult inputs.
- SCQR: Significantly improved efficiency. In the metabolite task, conditioning on spectral embeddings (DREAMS) reduced the mean conformal set size from 24 to 15 (a ~37% reduction) while maintaining coverage.
- Adaptivity: SCQR successfully reduced the "heavy tail" of set sizes, producing smaller sets for easy inputs without sacrificing coverage on hard inputs.
Distance Variants: The Fused Gromov-Wasserstein (FGW) distance (incorporating both structure and node features) consistently outperformed pure Gromov-Wasserstein (structure only) by reducing uncertainty and set sizes.

5. Significance

Trustworthy AI in Science: This work supports the use of graph prediction models in high-stakes fields (like drug discovery) by providing prediction sets with a finite-sample coverage guarantee. Instead of a single guess, researchers get a list of candidate molecular structures for a given spectrum that contains the true one with a chosen probability.
Bridging Geometry and Statistics: It successfully bridges the gap between optimal transport geometry (Z-GW) and statistical learning theory (Conformal Prediction), offering a generalizable approach for any structured output space representable as Z-networks (e.g., meshes, point clouds).
Adaptive Uncertainty: The SCQR method demonstrates that uncertainty in structured prediction is not uniform; adapting the prediction set size to input complexity is crucial for practical utility.

In summary, the paper provides a robust, theoretically grounded, and practically effective solution for quantifying uncertainty in graph prediction, moving beyond point estimates to reliable, adaptive confidence sets.