Axiomatic On-Manifold Shapley via Optimal Generative Flows

This paper proposes a novel Axiomatic On-Manifold Shapley framework that utilizes optimal generative flows and Wasserstein-2 geodesics to eliminate off-manifold artifacts, ensuring geometric efficiency, reparameterization invariance, and superior semantic alignment in model attribution.

Cenwei Zhang, Lin Zhu, Manxi Lin, Lei You

Published 2026-03-06
📖 5 min read · 🧠 Deep dive

The Big Problem: "The Hallucinating Detective"

Imagine you have a super-smart AI detective that can tell if a photo is of a Cat or a Dog. You show it a picture of a fluffy Golden Retriever, and it says, "That's a dog!"

Now, you want to know why. You ask the detective, "Which part of the image made you decide it's a dog? Was it the ears? The nose? The fur?"

This is what Explainable AI (XAI) tries to do. It assigns a "score" to every pixel to show how much it contributed to the decision.

The Problem:
Most current methods try to figure this out by asking, "What if we removed this part?"

  • Old Method: They take the dog picture and replace the dog's nose with a black square (a "baseline").
  • The Flaw: A black square doesn't exist in the real world. It's an "off-manifold" artifact. The AI gets confused because it's never seen a dog with a black square nose. It might start hallucinating, saying, "Oh, that black square looks like a hole in space, so the ears must be the most important thing!"

The explanation becomes unstable and misleading because the AI is being tested on things that don't make sense in its "universe" of training data.
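The "black square" trick described above is easy to reproduce. Below is a minimal sketch (a toy numpy image and a hypothetical `mask_with_baseline` helper, not from the paper) of the classic perturbation: replace a patch with a constant baseline, producing a flat region that no natural photo contains.

```python
import numpy as np

def mask_with_baseline(image, top, left, size, baseline_value=0.0):
    """Replace a square patch with a constant baseline value.

    This is the classic 'remove a feature' perturbation: the patch
    becomes a flat black square, an off-manifold input that no
    natural photo contains.
    """
    masked = image.copy()
    masked[top:top + size, left:left + size] = baseline_value
    return masked

# A toy 8x8 grayscale "image" with values in [0, 1].
rng = np.random.default_rng(0)
image = rng.uniform(0.2, 0.9, size=(8, 8))

masked = mask_with_baseline(image, top=2, left=2, size=4)

# The masked patch is exactly 0.0 everywhere, a value the model
# may never have seen in that region during training.
print(masked[2:6, 2:6].max())  # -> 0.0
```

A model probed with `masked` is being evaluated far from its training distribution, which is exactly the instability the paper sets out to avoid.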


The Solution: "The Perfect Road Trip"

This paper proposes a new way to explain the AI's decision. Instead of jumping to a fake black square, they imagine a smooth road trip from a "blank slate" to the actual dog photo.

1. The Concept: The "Coalition Formation"

Think of the AI's decision as a team building a house.

  • Old Way (Shapley Values): You try to build the house by randomly adding bricks one by one. Sometimes you add a roof before a wall. It's chaotic, and the order matters too much.
  • New Way (This Paper): You build the house in a perfectly logical, smooth order. You start with a foundation (a blank image) and slowly, smoothly, morph it into the final house (the dog photo).
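The "random bricks" picture is literally how classical Shapley values are estimated: average each feature's marginal contribution over many random orderings. A minimal sketch with a toy value function (the feature names and the interaction bonus are illustrative, not from the paper):

```python
import random

def shapley_values(features, value_fn, n_samples=2000, seed=0):
    """Monte Carlo Shapley: average each feature's marginal contribution
    over random orderings ('adding bricks one by one' in random order)."""
    rng = random.Random(seed)
    phi = {f: 0.0 for f in features}
    for _ in range(n_samples):
        order = list(features)
        rng.shuffle(order)
        coalition = set()
        prev = value_fn(coalition)
        for f in order:
            coalition.add(f)
            cur = value_fn(coalition)
            phi[f] += cur - prev
            prev = cur
    return {f: v / n_samples for f, v in phi.items()}

# Toy value function: 'ears' and 'nose' together earn an extra bonus,
# which Shapley splits fairly between them.
def value_fn(coalition):
    score = 0.0
    if "ears" in coalition:
        score += 1.0
    if "nose" in coalition:
        score += 2.0
    if {"ears", "nose"} <= coalition:
        score += 1.0  # interaction bonus
    return score

phi = shapley_values(["ears", "nose", "tail"], value_fn)
# Efficiency axiom: attributions sum to value(all) - value(none) = 4.0
print(round(sum(phi.values()), 6))  # -> 4.0
```

Note the cost: each sample re-evaluates the model once per feature, and every partial coalition is itself an artificial input, which is where the off-manifold problem sneaks in.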

2. The Innovation: "Optimal Generative Flows"

The authors realized that the "road" we take from the blank slate to the dog photo matters.

  • The Bad Road: If you take a winding, bumpy, zig-zag path through "imaginary land" (where pixels are random noise), the AI gets dizzy and gives a bad explanation.
  • The Good Road (The Paper's Idea): They use mathematics from Optimal Transport (specifically, Wasserstein-2 geodesics) to find the straightest, smoothest, most energy-efficient road, one that stays strictly on the "Highway of Real Data."

The Analogy:
Imagine you are a bird flying from a nest (the blank image) to a tree (the dog photo).

  • Old Methods: The bird flies in a chaotic spiral, sometimes going through a wall or a cloud that doesn't exist.
  • This Paper: The bird finds the perfect glide path. It stays in the sky (the "manifold" of real data) the whole time. It never touches the ground or flies through a wall. It takes the path of least resistance.
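In the simplest possible case, moving a single point to another single point, the "perfect glide path" of Wasserstein-2 optimal transport is just constant-speed straight-line (displacement) interpolation. A hedged numpy sketch of that base case (for real data distributions a raw pixel-space straight line would generally leave the manifold, which is why the paper learns a generative flow instead):

```python
import numpy as np

def w2_geodesic_point(x0, x1, t):
    """Wasserstein-2 geodesic between two point masses x0 and x1:
    constant-speed straight-line (displacement) interpolation."""
    return (1.0 - t) * x0 + t * x1

x0 = np.array([0.0, 0.0])   # the 'nest' (blank slate)
x1 = np.array([4.0, 3.0])   # the 'tree' (the real image)

# Sample the path at evenly spaced times; every step has equal length,
# i.e. the bird glides at constant speed along the shortest route.
ts = np.linspace(0.0, 1.0, 5)
path = np.stack([w2_geodesic_point(x0, x1, t) for t in ts])
steps = np.linalg.norm(np.diff(path, axis=0), axis=1)
print(np.allclose(steps, steps[0]))  # constant speed -> True
```

The paper's contribution is, roughly, to realize this same "no detours, constant effort" property for whole images while never leaving the data manifold.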

3. Why "Least Resistance" Matters

The authors proved a cool mathematical theorem: if you take the path that uses the least "kinetic energy" to travel from the blank image to the real image, the resulting explanation is unique. It is the only attribution consistent with the axioms the framework demands (efficiency and reparameterization invariance).

It's like saying: "If you want to know how much effort it took to walk from your house to the store, you shouldn't walk in circles or jump over fences. You should walk the most direct, natural path. Only then is the measurement fair."
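The "least kinetic energy" idea echoes the standard dynamic (Benamou-Brenier) formulation of Wasserstein-2 optimal transport, sketched below. Whether the paper uses exactly this formulation is an assumption on my part, but it is the usual way "straightest, most energy-efficient path between distributions" is made precise:

```latex
% Benamou--Brenier dynamic formulation of Wasserstein-2 optimal transport:
% among all flows (\rho_t, v_t) carrying the baseline distribution \rho_0
% to the data distribution \rho_1, pick the one with least kinetic energy.
W_2^2(\rho_0, \rho_1) \;=\; \min_{(\rho_t,\, v_t)}
  \int_0^1 \!\! \int \|v_t(x)\|^2 \, \rho_t(x)\, dx \, dt
\quad \text{s.t.} \quad \partial_t \rho_t + \nabla \cdot (\rho_t v_t) = 0 .
```

The minimizer is the Wasserstein-2 geodesic, which is why "least effort" and "only correct explanation" end up being the same statement.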

What Did They Actually Do?

  1. They built a "Time Machine" for images: They trained a model that knows exactly how to morph a random noise image into a real dog image without ever creating a "fake" or "impossible" intermediate image.
  2. They measured the journey: As the image morphs from noise to dog, they tracked how the AI's confidence changed at every tiny step.
  3. They summed it up: By adding up all those tiny changes along this perfect path, they got a score for every pixel.
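Steps 2 and 3 amount to a path integral of the model's gradient: a Riemann sum in the style of Integrated Gradients, but taken along the generative path rather than a straight line to a black baseline. A minimal sketch with a toy differentiable "classifier" and a straight-line stand-in path (in the paper, the learned flow would supply the real on-manifold path):

```python
import numpy as np

def path_attributions(f, grad_f, path):
    """Attribute f(path[-1]) - f(path[0]) to input dimensions by summing
    gradient * step along a discretized path (a Riemann sum of the path
    integral; with a straight-line path this reduces to Integrated
    Gradients)."""
    attr = np.zeros_like(path[0])
    for x_prev, x_next in zip(path[:-1], path[1:]):
        midpoint = 0.5 * (x_prev + x_next)
        attr += grad_f(midpoint) * (x_next - x_prev)
    return attr

# Toy 'classifier' score and its analytic gradient (stand-ins for the
# real model; nothing here is from the paper).
def f(x):
    return x[0] * 2.0 + x[1] ** 2

def grad_f(x):
    return np.array([2.0, 2.0 * x[1]])

x0 = np.zeros(2)           # 'blank slate' start of the path
x1 = np.array([1.0, 3.0])  # the real input

ts = np.linspace(0.0, 1.0, 101)
path = np.stack([(1 - t) * x0 + t * x1 for t in ts])  # stand-in path

attr = path_attributions(f, grad_f, path)
# Completeness: the per-pixel scores sum to f(x1) - f(x0).
print(np.isclose(attr.sum(), f(x1) - f(x0)))  # -> True
```

Swapping the straight-line `path` for samples from a learned generative flow is, at this level of abstraction, the paper's key move: the sum then only ever evaluates the model on realistic intermediate images.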

The Results: "Crystal Clear Vision"

When they tested this on images (like birds, cars, and faces):

  • Old Methods (like Integrated Gradients): Produced "ghosting" effects. The explanation looked like a blurry mess with noise everywhere, because the AI was confused by the weird paths it was forced to take.
  • This New Method: Produced sharp, clean maps. If the AI decided it was a dog, the explanation highlighted the ears, nose, and tail perfectly. It didn't highlight random noise.

The Takeaway

This paper is like upgrading from a crumpled paper map to a GPS with real-time traffic.

  • Old XAI: "Here is a guess of what the AI saw, but it might be wrong because we asked it about things that don't exist."
  • New XAI: "We asked the AI to explain itself while walking the most natural, logical path through reality. The result is a trustworthy, stable, and mathematically proven explanation."

In short: To understand a black box, don't poke it with a stick (random baselines). Gently guide it along the smoothest, most natural path to the answer, and it will tell you the truth.