Original authors: Maciej Satkiewicz, Roberto Corizzo, Marcin Pietroń

Published 2026-05-08. Author reviewed.

Original paper licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). This is an AI-generated explanation of the paper below. It is not written by the authors. For technical accuracy, refer to the original paper.

Imagine you have a very intelligent, complex machine (a deep neural network) that looks at an image and decides: "That is a cat!" Yet when you ask the machine, "Why did you think that?", it usually only points to a chaotic, noise-filled jumble of pixels. It is as if you asked a chef why a soup tastes good, and he simply threw a handful of random spices at you without explaining the recipe.

This work introduces a new way to ask this question, called Semantic Pullbacks (SP). Here is how it works, using simple analogies:

The Problem: The "Brittle" Map

In simple mathematical models, you can read the "weights" (the knobs) directly to see what the model likes. In deep networks there is no single weight vector to read, so the standard substitute is to use gradients.

The Analogy: Imagine trying to find the path uphill by looking at a map drawn by a trembling hand. The lines are jagged, noisy, and sometimes point in the wrong direction. This is what current methods do: they create "Saliency Maps" that are often just visual noise or resemble adversarial perturbations (strange patterns that make no sense to humans).

The New Idea: The "Adjoint" Pullback

The authors argue that instead of looking at the trembling gradients, we should examine the pullback.

The Analogy: Think of the neural network as a series of funhouse mirrors and sliding doors. When a signal (the "cat" decision) comes out the back, the standard method tries to trace it back by reversing every single twist and turn exactly as it happened.
The Innovation: The authors propose a different approach. They treat the network as a set of affine operators (mathematical machines that stretch and shift things). Instead of reversing the exact chaotic twists precisely, they use a "soft" backward path.
- Softening the Gating: Many layers in a network act like strict bouncers (e.g., "If the number is negative, close the door completely"). The standard method respects this strictly and cuts off any signal that is even slightly negative. The new method uses a "soft bouncer" (a soft adjoint). It says: "If the number is almost negative, let a little bit of the signal through." This restores parts of the image that the strict bouncer would have discarded, revealing a clearer picture of what the neuron is actually attending to.

The Process: "Pullback Ascent"

Once they have this "softened" backward signal, they do not simply stop there. They take a few small steps forward in the direction the signal suggests.

The Analogy: Imagine you are in a foggy forest trying to find a hidden path.
- Old Way: You take a step based on a trembling compass (gradient). You might step off a cliff.
- New Way: You use a "soft compass" (soft pullback) that accounts for the fog. Then you take a few small, cautious steps in that direction (Pullback Ascent). This helps you find the actual, coherent path (the semantic feature) rather than just stumbling around.

What They Found

The authors tested this on famous image recognition models (such as ResNet50 and PVT) using thousands of images.

Better Maps: The new maps look like real objects (cats, dogs, cars) and not like static noise. They align much better with what humans see.
More Reliable: If you slightly alter the image, the explanation remains stable. Old methods often fluctuate wildly with tiny changes.
Faster: Unlike other methods that require running the model hundreds of times to get an average (like taking 100 photos to get a single clear one), this method accomplishes it in a single pass with a few additional steps. It is computationally efficient.
No Retraining: You can apply this to any pre-trained model you already have. You do not need to rebuild the machine or teach it new things.

The Big Picture

The work claims that deep networks are better understood as input-conditioned affine operators. In plain language: the network does not just calculate; it dynamically changes how it processes information based on the input. By using this "pullback" method, the authors can trace the "preferred direction" of a neuron back to the original image without the noise and brittleness of traditional gradient methods.

In short: They replaced a trembling, noisy flashlight with a smooth, stable beam that reveals the true shape of the object the AI is looking at, without needing to rebuild the AI itself.

Technical Summary: Semantic Pullbacks (SP)

Problem Statement

Despite advances in deep learning, interpreting the internal computations of modern neural networks remains challenging. The prevailing paradigm for post-hoc explainability relies on visualizing the gradient of an output with respect to the input. However, in modern architectures (e.g., those with ReLU, LayerNorm, or Self-Attention), these gradients are often noisy and unstable, and they fail standard validation tests. They can be brittle, resemble adversarial perturbations, or fail to capture semantically meaningful features.

Existing attempts to mitigate this, such as smoothing (e.g., SmoothGrad) or feature accentuation, often rely on costly stochastic sampling, strong regularization, or arbitrary modifications lacking a unified theoretical justification. Furthermore, methods like B-cos networks suggest that the problem may not be the optimization itself, but rather the direction being optimized: gradients may not be the correct generalization of explanations via weight vectors for deep networks.

Methodology

The article proposes Semantic Pullbacks (SP), a framework that reinterprets deep networks as input-conditioned affine operators. Instead of reading a neuron's preferred input direction from the gradient, the authors argue for using the adjoint action of the network's effective dynamic linear operator.

Core Concept: Pullback vs. Gradient

In a linear model, the weight vector naturally reveals the preferred input direction. In deep networks, the forward path can be modeled as a dynamic affine mapping $f(x) = W(x)x$, where $W(x)$ depends on the forward state (gating, routing, normalization).

Gradient: Differentiates through all input dependencies, including how $W(x)$ changes with $x$. This leads to noise from gating and normalization statistics.
Pullback: Defined as the adjoint of the dynamic linear component, $\nu_u(x) = W(x)^\top u$. It transports a vector $u$ in the output space back to the input space without differentiating through the state-dependent parameters of $W(x)$. For linear layers, pullback and gradient coincide; for nonlinear/routing layers (ReLU, MaxPool, Attention), they diverge.
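To make the distinction concrete, here is a minimal sketch in plain Python on a hypothetical gated unit $f(x) = \sigma(a^\top x)\, w^\top x$, so that $W(x) = \sigma(a^\top x)\, w^\top$. The unit, the vectors `a` and `w`, and all function names are illustrative assumptions and are not taken from the paper. By the product rule, the gradient equals the pullback plus an extra term that comes from differentiating the gate, and the code checks this against finite differences.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def dot(p, q):
    return sum(pi * qi for pi, qi in zip(p, q))

# Hypothetical toy unit (an assumption for illustration):
# f(x) = sigmoid(a.x) * (w.x), i.e. W(x) = sigmoid(a.x) * w^T.
a = [1.0, -2.0]   # gate direction
w = [0.5, 1.5]    # linear weights

def f(x):
    return sigmoid(dot(a, x)) * dot(w, x)

def pullback(x, u=1.0):
    # Adjoint of the frozen operator W(x): the gate value is held fixed.
    gate = sigmoid(dot(a, x))
    return [u * gate * wi for wi in w]

def gradient(x, u=1.0):
    # Product rule: pullback term plus the term from differentiating the gate.
    gate = sigmoid(dot(a, x))
    gate_slope = gate * (1.0 - gate)
    return [u * (gate * wi + dot(w, x) * gate_slope * ai)
            for wi, ai in zip(w, a)]

def finite_difference(x, eps=1e-6):
    # Central differences as an independent check of gradient().
    result = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += eps
        x_minus[i] -= eps
        result.append((f(x_plus) - f(x_minus)) / (2.0 * eps))
    return result

x = [0.3, 0.2]
print("pullback:         ", pullback(x))
print("gradient:         ", gradient(x))
print("finite difference:", finite_difference(x))
```

In this toy case the pullback stays parallel to `w`, while the gradient is tilted along the gate direction `a` by the extra term.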

The Semantic Pullback Framework

The authors refine the standard pullback through two main mechanisms to restore coherent local structures:

Soft Adjoint (Soft Pullback - SfP):
Standard pullbacks can still be noisy because hard gating (e.g., ReLU masks) abruptly suppresses weak but semantically relevant components. The authors introduce soft adjoints, which replace hard backward gating with a softer version controlled by a temperature parameter $\tau$.
- Mechanism: For layers like ReLU, SiLU, or MaxPool, the hard gate (e.g., $1\{z>0\}$) is replaced, in the backward pass only, by a soft function (e.g., the Normal CDF $\Phi(z/\tau)$ or a temperature-scaled sigmoid).
- Goal: This approximates the expected local pullback over the data distribution and restores weak but consistent feature components without altering the forward path or requiring stochastic sampling.
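The soft adjoint for a ReLU layer can be sketched as follows. This is a minimal illustration in plain Python, assuming the two gate choices named above; the function names, the example pre-activations, and the value $\tau = 0.5$ are assumptions chosen for readability, not values from the paper. The forward pass is untouched: only the factor applied to the backward signal changes.

```python
import math

def hard_gate(z):
    # Standard ReLU backward gate: the indicator 1{z > 0}.
    return 1.0 if z > 0 else 0.0

def soft_gate_normal_cdf(z, tau):
    # Phi(z / tau), where Phi is the standard normal CDF.
    return 0.5 * (1.0 + math.erf(z / (tau * math.sqrt(2.0))))

def soft_gate_sigmoid(z, tau):
    # Temperature-scaled sigmoid, written to avoid overflow for large |z / tau|.
    t = z / tau
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def relu_backward(upstream, pre_activations, gate):
    # Scale each upstream component by the gate of its pre-activation.
    return [u * gate(z) for u, z in zip(upstream, pre_activations)]

tau = 0.5                       # assumed temperature
z = [-2.0, -0.1, 0.1, 2.0]      # example pre-activations
u = [1.0, 1.0, 1.0, 1.0]        # example upstream signal

hard = relu_backward(u, z, hard_gate)
soft = relu_backward(u, z, lambda t: soft_gate_normal_cdf(t, tau))
print("hard:", hard)
print("soft:", [round(s, 4) for s in soft])
```

The unit at $z = -0.1$ is cut off completely by the hard gate but passes a substantial fraction of the signal through the soft gate, while the strongly negative unit at $z = -2$ stays almost fully suppressed. As $\tau$ shrinks toward zero, the soft gate approaches the hard one.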
Pullback Ascent (PA):
To further improve coherent structures, especially in architectures with strong intra-layer dependencies (such as Self-Attention), the method employs an iterative refinement procedure.
- Mechanism: Starting from input $x$, the algorithm iteratively ascends along the soft pullback vector field: $x^{(t+1)} = x^{(t)} + \alpha \cdot \text{Norm}(\tilde{\nu}_u(x^{(t)}))$.
- Goal: This generates localized, class-conditioned perturbations that accentuate the features encoded by the target neuron. It acts as a lightweight local ascent procedure requiring only a few steps ($K \approx 5$) and needing no heavy frequency-domain regularization.
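The update rule above can be sketched on a toy one-layer ReLU network. Everything specific here is an assumption for illustration: the weights `W1`, the output-space vector `u`, the step size `alpha`, the temperature `tau`, and the reading of $\text{Norm}$ as L2 normalization. The paper's actual normalization and hyperparameters may differ.

```python
import math

# Hypothetical toy network: hidden = relu(W1 x), score = u . hidden.
W1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # 3 hidden units, 2 inputs
u = [1.0, 1.0, 1.0]                          # direction in the output space

def pre_activations(x):
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in W1]

def score(x):
    # Forward pass with the ordinary (hard) ReLU.
    return sum(ui * max(z, 0.0) for ui, z in zip(u, pre_activations(x)))

def normal_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def soft_pullback(x, tau):
    # W1^T applied to u, with each component scaled by a soft backward gate.
    z = pre_activations(x)
    gated = [ui * normal_cdf(zi / tau) for ui, zi in zip(u, z)]
    return [sum(W1[i][j] * gated[i] for i in range(len(W1)))
            for j in range(len(x))]

def l2_normalize(v, eps=1e-12):
    # Assumed interpretation of Norm(.) in the update rule.
    norm = math.sqrt(sum(vi * vi for vi in v))
    return [vi / (norm + eps) for vi in v]

def pullback_ascent(x0, alpha=0.1, tau=0.5, num_steps=5):
    # x^(t+1) = x^(t) + alpha * Norm(soft_pullback(x^(t)))
    x = list(x0)
    trajectory = [list(x0)]
    for _ in range(num_steps):
        direction = l2_normalize(soft_pullback(x, tau))
        x = [xi + alpha * di for xi, di in zip(x, direction)]
        trajectory.append(x)
    return trajectory

x0 = [0.2, -0.3]
trajectory = pullback_ascent(x0)
perturbation = [xf - xi for xf, xi in zip(trajectory[-1], x0)]
print("score before:", score(x0))
print("score after: ", score(trajectory[-1]))
print("perturbation:", perturbation)
```

The difference between the final and initial input plays the role of the counterfactual perturbation. Because each step has fixed length `alpha`, the total displacement after `num_steps` steps is at most `num_steps * alpha`.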

Semantic Pullbacks (SP) is the umbrella term for explanations generated through these layer-specific adjoint refinements. The method operates directly on standard pre-trained models (CNNs and Transformers) without architecture changes, retraining, or fine-tuning.

Main Contributions

Framework for Semantic Pullbacks: A principled method for post-hoc explanation based on softened adjoint transport. It unifies concepts from gradient smoothing, B-cos alignment, and feature accentuation under the view that neurons represent features in expectation over local data distributions.
Efficient Implementation: A layer-wise, closed-form implementation that works on standard pre-trained CNNs (ResNet, VGG) and Transformers (PVT). It requires no architecture changes or stochastic sampling, making it computationally efficient.
Pullback Ascent: A lightweight procedure for generating coherent, class-conditioned counterfactual perturbations in few steps, avoiding the noise and adversarial artifacts typical of standard gradient ascent.
Empirical Validation: Comprehensive evaluation over 1,000 ImageNet validation images using six metrics covering fidelity, robustness, and target specificity, on ResNet50, VGG, and PVT.

Results

The authors evaluated SP against established baselines (Gradient, SmoothGrad, Integrated Gradients, DeepLift, GuidedGrad-CAM, etc.) using the Quantus toolkit.

Fidelity: SP significantly improves Infidelity (a metric measuring how well an explanation predicts score changes upon perturbation) across all architectures. For example, Pullback Ascent achieved an Infidelity of 1.63 on PVT compared to 8.91 for standard gradients.
Stability & Target Sensitivity: SP methods show competitive or superior performance in Max Sensitivity (robustness) and Random Logit (target specificity). Unlike GuidedGrad-CAM, which produces similar maps for different classes (high Random Logit), SP generates distinct, target-specific explanations.
Perceptual Alignment: Qualitative results show that SP heatmaps and counterfactual perturbations are visually coherent and highlight semantically meaningful object regions, without the noisy, adversarial patterns often seen in gradient-based methods.
Efficiency: SP is computationally efficient. A single Soft Pullback essentially requires one backward pass. Pullback Ascent scales linearly with the small number of steps $K$ and remains significantly faster than sampling-based methods like SmoothGrad or path-integration methods like Integrated Gradients.
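For readers unfamiliar with the Infidelity metric mentioned above, here is a minimal sketch assuming its commonly cited form: the mean squared gap between the score change an attribution predicts for a perturbation and the change the model actually shows. The linear toy model, the Gaussian perturbations, and the function names are assumptions for illustration. The exact perturbation scheme and normalization used in the paper's Quantus setup are not given here and may differ.

```python
import random

def infidelity(f, x, attribution, perturbations):
    # Mean over perturbations d of (d . attribution - (f(x) - f(x - d)))^2.
    total = 0.0
    for delta in perturbations:
        x_perturbed = [xi - di for xi, di in zip(x, delta)]
        predicted_change = sum(di * ai for di, ai in zip(delta, attribution))
        actual_change = f(x) - f(x_perturbed)
        total += (predicted_change - actual_change) ** 2
    return total / len(perturbations)

# Hypothetical linear model: its weight vector is an exact explanation.
w = [2.0, -1.0, 0.5]

def linear_model(x):
    return sum(wi * xi for wi, xi in zip(w, x))

rng = random.Random(0)
x = [1.0, 2.0, 3.0]
perturbations = [[rng.gauss(0.0, 0.1) for _ in x] for _ in range(50)]

exact = infidelity(linear_model, x, w, perturbations)
wrong = infidelity(linear_model, x, [0.0, 0.0, 0.0], perturbations)
print("infidelity of the true weights:     ", exact)
print("infidelity of an all-zero attribution:", wrong)
```

Lower is better: the true weight vector of a linear model predicts every score change exactly, so its Infidelity is zero up to floating-point error, while an uninformative attribution scores strictly worse.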

Significance and Claims

The article claims that adjoint transport should be treated as a "first-class primitive" alongside gradients in deep learning. The authors argue that:

Gradients are not always the correct generalization: In dynamic affine networks, the gradient contains terms from differentiation through gates and statistics that may not reflect the neuron's true "action" or preferred direction.
Neural features are locally expected: Meaningful features are often expressed as partially active, local expectations rather than fully realized pointwise directions. SP approximates this expectation through soft adjoints.
No retraining required: Unlike B-cos networks, which require model transformation and fine-tuning, SP can be applied directly to existing pre-trained networks to deliver more faithful and perceptually aligned explanations.
Unified perspective: The approach suggests a path-based view of neural computation, where softening the pullback smooths the gating component and effectively highlights the "strong paths" the network uses for decision-making.

The authors conclude that Semantic Pullbacks offer a practical, theoretically grounded mechanism for generating explanations that are faithful to the model's predictive behavior, stable, and perceptually aligned, without the computational overhead of sampling or the need for model retraining.

Pulling Back the Curtain on Deep Networks