Imagine you have a very smart, but slightly stubborn, AI that looks at pictures and guesses what they are. Let's say you show it a picture of a cat, and it confidently says, "That's a cat!"
Now, you want to ask the AI a "What if?" question: "What if I changed just a tiny bit of this picture so you would think it's a dog?"
This is called a Counterfactual Explanation. It's like asking the AI to show you the shortest, most logical path to change its mind.
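To make the idea concrete, here is a minimal NumPy sketch of a counterfactual search on a toy linear classifier. Everything here (the weights, the two-feature inputs, the cat/dog labels) is made up for illustration; it is the generic counterfactual idea, not the paper's method:

```python
import numpy as np

# Toy linear classifier: score > 0 -> "dog", otherwise "cat".
# Weights and inputs are illustrative, not from the paper.
w = np.array([1.0, -2.0])
b = 0.0

def predict(x):
    return "dog" if x @ w + b > 0 else "cat"

def counterfactual(x, steps=100, lr=0.1):
    """Nudge x along the score gradient until the predicted label flips."""
    x = x.copy()
    for _ in range(steps):
        if predict(x) == "dog":
            break
        x += lr * w  # gradient of the linear score w.r.t. x is just w
    return x

x_cat = np.array([-1.0, 0.5])   # classified "cat"
x_cf = counterfactual(x_cat)    # minimally nudged until classified "dog"
```

The answer to the "What if?" question is the difference `x_cf - x_cat`: the small change that flips the model's mind.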
The Problem: The "Straight Line" vs. The "Swamp"
The paper argues that most current methods for finding this path are broken. They treat the world of images like a flat, empty room (a "flat Euclidean space").
The Old Way: Imagine you are trying to walk from the "Cat" corner of a room to the "Dog" corner. The old methods just tell you to walk in a straight line. But in the world of AI images, a straight line often takes you off the floor and into the ceiling, or through a swamp of nonsense.
- The Result: The AI might change the picture into a dog, but the dog looks like a melted blob, has three eyes, or is floating in mid-air. These are called Adversarial Examples. They trick the AI, but they don't make sense to a human. They are "off-manifold"—meaning they don't belong in the real world of valid pictures.
- The Trap: Even if the AI stays on the "floor" (the valid world of pictures), it might take a path that looks like a dog but is actually a "fake dog" that only the AI recognizes. It's like a perfect disguise that looks real to a machine but obvious to a human.
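A tiny geometric sketch of the "off the floor" problem, using a unit circle as a stand-in manifold (purely illustrative; real image manifolds are high-dimensional and learned, and the cat/dog points are made up):

```python
import numpy as np

# Toy "manifold": suppose all valid images live on the unit circle in 2-D.
cat = np.array([1.0, 0.0])
dog = np.array([0.0, 1.0])

def on_manifold(p, tol=1e-9):
    """A point counts as a 'valid picture' only if it sits on the circle."""
    return abs(np.linalg.norm(p) - 1.0) < tol

# The straight-line (Euclidean) midpoint between cat and dog falls off
# the circle: an "off-manifold" nonsense point, the melted-blob dog.
midpoint = 0.5 * (cat + dog)
```

The midpoint has length about 0.707, so it is nowhere near the circle: the straight line left the floor.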
The Solution: PCG (Perceptual Counterfactual Geodesics)
The authors introduce a new method called PCG. Think of PCG as a GPS for the "Real World" of images.
Here is how it works, using a simple analogy:
1. The Terrain Map (The Manifold)
Imagine the world of all possible pictures of cats and dogs isn't a flat room, but a curved, hilly landscape.
- The "Cat" area is a lush green valley.
- The "Dog" area is a sunny beach.
- The "nonsense" areas (three-eyed blobs) are deep swamps or high cliffs.
Old methods try to walk in a straight line through the air or the swamp. PCG knows you must stay on the ground (the "manifold").
2. The Compass (Robust Perception)
How does PCG know which way is "real" and which way is "fake"?
- Old Compass: Uses a standard map that says "distance is just how many pixels differ." This is like measuring distance by how much the paint color changes. It's easily fooled by tiny, invisible scratches that look like nothing to us but confuse the AI.
- PCG's Compass: Uses a Super-Compass built from "Robust" models. These are AIs that have been trained to ignore tiny scratches and focus on what humans actually see (ears, fur texture, snout shape).
- This compass tells PCG: "Don't go that way; that path leads to a fake dog that looks weird to humans. Go this way, where the changes feel natural."
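As a rough sketch of why a feature-space compass behaves differently from a pixel-space one, here is a toy comparison. The average-pooling "feature map" below is only a hypothetical stand-in for a robust model's features, chosen because pooling discards pixel-level noise; it is not the paper's actual network:

```python
import numpy as np

rng = np.random.default_rng(0)

def features(img):
    """Stand-in for a robust feature map: average-pool 2x2 patches,
    which washes out pixel-level 'scratches'."""
    return img.reshape(4, 2, 4, 2).mean(axis=(1, 3))

img = rng.random((8, 8))
# High-frequency "invisible scratches": flip every pixel up or down by 0.2.
noisy = img + 0.2 * rng.choice([-1.0, 1.0], size=(8, 8))

pixel_dist = np.linalg.norm(img - noisy)                      # large
feat_dist = np.linalg.norm(features(img) - features(noisy))   # much smaller
```

In the pixel compass the scratches look like a big change; in the feature compass they nearly vanish, because the scratches cancel out inside each patch. That is the sense in which a robust compass measures what humans see rather than raw paint.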
3. The Journey (The Geodesic)
In math, the shortest path between two points on a curved surface is called a Geodesic.
- PCG doesn't just jump from Cat to Dog. It traces a smooth, winding path along the hills of the landscape.
- It ensures that every single step along the way looks like a real animal. It slowly morphs the cat's ears, then its fur, then its snout, keeping the "vibe" of a living creature intact the whole time.
The Two-Step Dance
The paper describes a clever two-step process to find this perfect path:
- Phase 1: The Blueprint. PCG first draws a smooth, curved line from the Cat to a random Dog. It ignores the specific "Dog" label for a moment and just focuses on making sure the path is smooth and stays on the "real world" ground.
- Phase 2: The Refinement. Now, it pulls the end of the line back toward the original Cat picture, but only as long as the path stays smooth and the AI still thinks the endpoint is a Dog. It's like pulling a rubber band tight: it finds the shortest, most natural path without snapping the band (breaking the realism).
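The two phases can be sketched schematically on the same kind of toy circle. The classifier (`is_dog`), the arc parameterization, and the step-back search below are all invented for illustration; nothing here is the paper's actual optimizer:

```python
import numpy as np

# Toy manifold: the unit circle. Made-up classifier: "dog" iff y > x,
# i.e. the point's angle is above 45 degrees.
def is_dog(p):
    return p[1] > p[0]

cat = np.array([1.0, 0.0])
random_dog = np.array([0.0, 1.0])   # Phase 1 target: *any* dog

# Phase 1 (blueprint): the on-manifold path from cat to the random dog
# is the arc, parameterized by t in [0, 1] with angle t * pi/2.
def point_at(t):
    return np.array([np.cos(t * np.pi / 2), np.sin(t * np.pi / 2)])

# Phase 2 (refinement): pull the endpoint back toward the cat along the
# arc, stopping at the last point the classifier still calls "dog".
t_end = 1.0
while is_dog(point_at(t_end - 0.01)):
    t_end -= 0.01

endpoint = point_at(t_end)  # closest on-manifold point still labeled "dog"
```

The search never leaves the circle, so realism is preserved the whole time; it only stops shrinking when one more step would flip the label back to "cat" (snap the rubber band).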
Why This Matters
If you use the old methods, you might get an explanation that looks like this:
"To turn this cat into a dog, you need to add 500 tiny red pixels to its nose."
(This is technically correct for the AI, but useless and scary for a human.)
With PCG, you get an explanation that looks like this:
"To turn this cat into a dog, gently round the ears, lengthen the snout, and change the fur texture."
(This is a change a human can understand and trust.)
Summary
The paper is essentially saying: "Stop walking in straight lines through the void. Use a map that respects the shape of reality and a compass that understands human perception."
By doing this, PCG generates explanations that are not just mathematically correct, but semantically meaningful—they tell a story that makes sense to us, not just to the machine.