Original authors: Nicholas S. DiBrita, Jason Han, Younghyun Cho, Hengrui Luo, Tirthak Patel

Published 2026-05-26


Original paper licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). This is an AI-generated explanation of the paper below. It is not written or endorsed by the authors. For technical accuracy, refer to the original paper.

Imagine you have a magical, super-complex black box that can look at a picture of a cat and tell you, "That's a cat!" This box is a Quantum Machine Learning (QML) model. It's incredibly powerful, but it works using the strange laws of quantum physics.

The problem? It's a black box. Even the people who built it can't easily explain why it decided it was a cat. Did it look at the ears? The whiskers? Or did it just get lucky? In the classical world, we have tools to peek inside and see which parts of the input mattered most. But in the quantum world, if you try to peek, the magic disappears (the quantum state "collapses"), and the answer changes.

This paper introduces HATTRIQ, a new tool designed to solve this mystery without breaking the magic.

The Core Problem: The "Unseeable" Box

Think of a quantum computer like a chef cooking a dish in a completely sealed, soundproof kitchen. You give them ingredients (the data), and they serve you a finished meal (the prediction).

Classical AI: You can ask the chef, "Did you use more salt or more pepper?" and they can check their recipe.
Quantum AI: The chef is working with ingredients that exist in two places at once (superposition). If you open the door to ask about the salt, the ingredients instantly turn into something else, and the recipe is ruined.

Because of this, we couldn't previously tell which "ingredient" (pixel in an image, or data point) was most important for the final decision.

The Solution: HATTRIQ (The "Magic Mirror")

The authors created HATTRIQ (Hadamard test-based input attribution score scheme for quantum models).

Instead of trying to peek inside the kitchen and ruin the dish, HATTRIQ uses a clever mirror trick (called a Hadamard test).

The Analogy: Imagine you want to know how much a specific ingredient contributed to the taste, but you can't taste the soup directly. Instead, you run a parallel, "ghost" version of the cooking process alongside the real one. By comparing how the real soup and the ghost soup interact, you can estimate how much that specific ingredient mattered, without ever opening the pot.

HATTRIQ is designed to do this on the quantum hardware itself. It runs a special circuit that effectively asks the quantum computer: "If I nudge this specific part of the input, how does the final answer change?" The answer comes from repeatedly measuring a helper qubit and recording how often it lands in a particular state; that probability reveals the importance of the input feature.

How It Works (The "Gradient" Concept)

In simple terms, HATTRIQ calculates Integrated Gradients.

Imagine you are walking from a blank white screen (no image) to a full picture of a cat.
HATTRIQ takes tiny steps along that path. At every step, it asks, "How much did this specific pixel contribute to the change?"
It adds up all those tiny contributions to give you a final score: "This pixel was very important (High Positive)," "This pixel pushed the model away from this answer (Negative)," or "This pixel didn't matter (Zero)."
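For readers who want to see that walk in code, here is a minimal classical sketch in Python with NumPy. The "model" is a hypothetical four-pixel logistic score with a hand-written gradient, not a quantum circuit; in HATTRIQ the gradient at each step would instead come from quantum measurements. The sketch approximates the path with a midpoint sum and checks the "completeness" property of Integrated Gradients: the per-pixel scores add up to the change in output between the blank baseline and the full input.

```python
import numpy as np

def integrated_gradients(f_grad, x, baseline, steps=200):
    """Midpoint Riemann-sum approximation of Integrated Gradients.

    f_grad(z) must return the gradient dF/dz at point z.
    """
    alphas = (np.arange(steps) + 0.5) / steps            # midpoints of [0, 1]
    path = baseline + alphas[:, None] * (x - baseline)   # points on the straight path
    avg_grad = np.mean([f_grad(z) for z in path], axis=0)
    return (x - baseline) * avg_grad                      # one score per input feature

# Hypothetical toy model: logistic score of a weighted sum of four "pixels"
w = np.array([2.0, -1.0, 0.0, 0.5])

def f(z):
    return 1.0 / (1.0 + np.exp(-w @ z))

def f_grad(z):
    s = f(z)
    return s * (1.0 - s) * w

x = np.ones(4)                 # the full "picture"
baseline = np.zeros_like(x)    # the blank screen
attr = integrated_gradients(f_grad, x, baseline)

print("scores:", attr)         # pixel 0 positive, pixel 1 negative, pixel 2 zero
print("sum of scores:", attr.sum(), " F(x) - F(baseline):", f(x) - f(baseline))
```

Pixel 2 has zero weight, so its score is exactly zero; pixel 1 has a negative weight, so it pulls the prediction down and gets a negative score.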

What They Tested It On

The team tested HATTRIQ on several "black boxes" to see if it could explain their decisions:

Simple Patterns: Distinguishing between bars and stripes.
Handwritten Digits: Recognizing numbers like 0, 1, 3, 4, etc. (from MNIST and NIST datasets).
Clothing: Telling the difference between a dress and a shirt, or boots and sandals (FashionMNIST).
Quantum Physics Data: Even testing it on data that represents magnetic spins in a chain (TFIM dataset), proving it works on pure quantum data, not just pictures.

The Results: It Actually Works!

It makes sense: When HATTRIQ looked at a picture of the number "4," it highlighted the sharp angles of the 4 and ignored the background. When it looked at a "3," it highlighted the curves. It didn't just guess; it found the actual features the model was using.
It's robust: They tested it in simulations of "noisy" quantum hardware (an imperfect machine that makes small errors) and with only a small number of measurements. Even with these errors, HATTRIQ's attribution maps stayed close to the noise-free ones.
It's efficient: They showed that you can run these tests in parallel (using multiple "ghost" kitchens at once) to speed things up.

Why This Matters

Before HATTRIQ, if a quantum AI made a mistake, we had no idea why. We were flying blind.

Trust: Now, we can verify if the AI is looking at the right things (like the shape of a shoe) or the wrong things (like a random speck of dust).
Debugging: If the AI is biased or confused, HATTRIQ helps the developers see exactly where the confusion is happening so they can fix the model.

In short, HATTRIQ is the first flashlight that lets us see inside the quantum black box without turning off the lights. It translates the confusing, invisible quantum decisions into a clear map of "what mattered" for the final answer.

Technical Summary: HATTRIQ – Designing Integrated Gradients for Feature Attribution in Quantum Machine Learning

1. Problem Statement

Quantum Machine Learning (QML) algorithms show promise across hardware platforms but suffer from a lack of interpretability due to the inherent opacity of quantum state evolution and the absence of intermediate observability. While classical interpretability methods like Integrated Gradients (IG) and surrogate-based sensitivity analysis are well-established, they are not directly compatible with quantum circuits.

Measurement Collapse: Attempting to record or log hidden quantum states after each circuit layer collapses the state, destroying the computation.
Exponential Complexity: Simulating state evolution for large models requires manipulating complex amplitude vectors in exponentially large Hilbert spaces, making classical simulation resource-intensive.
Incompatibility: Traditional sensitivity methods (e.g., Sobol/Shapley scores) cannot preserve the unitarity of quantum circuits, and standard gradient rules (like parameter-shift) are difficult to apply when data is encoded via amplitude embedding, as the state preparation circuit structure often changes based on the input.

Consequently, there is a critical gap in understanding how input features influence final measurement outcomes in amplitude-encoded QML models.

2. Methodology: HATTRIQ

The authors propose HATTRIQ (Hadamard test-based input attribution score scheme for quantum models), a framework designed to compute input-attribution scores for circuit-based QML models using amplitude embedding. The methodology adapts the classical Integrated Gradients (IG) approach to the quantum setting without requiring access to internal quantum states.
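For reference, the standard classical Integrated Gradients definition for a real-valued input $x$ and baseline $x'$ (with $\theta$ held fixed and suppressed), together with the Riemann-sum approximation commonly used in practice, is shown below. This summary does not spell out how HATTRIQ combines the real and imaginary amplitude components, or how it handles normalization at intermediate path points (which are generally not unit vectors), so read this as the classical template rather than the paper's exact formula. In the quantum setting, each $\partial F / \partial x_i$ evaluation would be supplied by the circuit-based gradient described under Core Components.

$$\mathrm{IG}_i(x) = (x_i - x'_i)\int_0^1 \frac{\partial F}{\partial x_i}\Big|_{x' + \alpha (x - x')}\, d\alpha \;\approx\; \frac{x_i - x'_i}{S}\sum_{s=1}^{S} \frac{\partial F}{\partial x_i}\Big|_{x' + \frac{s}{S}(x - x')}$$

$$\sum_i \mathrm{IG}_i(x) = F(x) - F(x') \qquad \text{(completeness)}$$

With $S$ path steps and $d$ features, the approximation needs $S \cdot d$ gradient components, which is the cost the parallelization technique targets.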

Core Components

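Amplitude Embedding: The framework targets the widely used encoding scheme in which data features are stored as the amplitudes of the input state $|x\rangle = \sum_{i=0}^{2^n - 1} x_i |b_i\rangle$, where $\{|b_i\rangle\}$ is the computational basis of $n$ qubits and the features are normalized so that $\sum_i |x_i|^2 = 1$. This lets $n$ qubits hold up to $2^n$ features, exponentially many relative to the number of qubits.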
Gradient Derivation (Lemma III.1): The authors derive a closed-form expression for the gradient of the model output $F(x; \theta)$ with respect to the real ($c_k$) and imaginary ($d_k$) parts of each input amplitude $x_k = c_k + i\,d_k$. For the real part:
$\frac{\partial F}{\partial c_k} = 2 \text{Re}[\langle b_k | U^\dagger(\theta) O U(\theta) | x \rangle]$
This expression relates the gradient to the expectation value of a specific operator acting on the input state, avoiding the need for explicit state tomography.
Hadamard Test Construction: To compute the required inner products $\langle b_k | \tilde{O} | x \rangle$, where $\tilde{O} = U^\dagger(\theta) O U(\theta)$, directly on quantum hardware, HATTRIQ employs a Hadamard test circuit.
- The circuit uses an ancilla qubit (or register) and controlled operations.
- One branch prepares the basis state $|b_k\rangle$ , and the other prepares the input state $|x\rangle$ followed by the model unitary $U(\theta)$ and observable $O$ .
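- The probability $P(0)$ of measuring the ancilla in the $|0\rangle$ state satisfies $P(0) = \tfrac{1}{2}\bigl(1 + \operatorname{Re}\langle b_k | \tilde{O} | x \rangle\bigr)$, so $2P(0) - 1$ gives the real part of the inner product and hence, via Lemma III.1, the gradient component $\partial F / \partial c_k$.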
Parallelization (Theorem IV.3): To address the linear scaling of circuit evaluations with the number of input features, the authors introduce a multi-ancilla parallelization technique. By using $m$ ancilla qubits, the framework can compute $2^m - 1$ gradient components concurrently in a single circuit execution, preserving the exponential space efficiency of amplitude embedding.
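Below is a small classical statevector check of Lemma III.1 and the Hadamard-test readout, written in Python with NumPy. It assumes the model output has the form $F(x;\theta) = \langle x | U^\dagger(\theta) O U(\theta) | x \rangle$ (the form the gradient formula implies), uses a random unitary as a stand-in for a trained circuit, and picks $O$ as a Pauli-Z so that $\tilde{O}|x\rangle$ stays normalized, as the basic Hadamard test requires. Gradients are taken with respect to the raw amplitudes, ignoring renormalization, and the helper names (`amplitude_embed`, `hadamard_test_p0`) are hypothetical. The imaginary-part derivative $\partial F/\partial d_k = 2\operatorname{Im}\langle b_k|\tilde{O}|x\rangle$ follows from differentiating the same expression; on hardware it would come from a phase-shifted variant of the test. This is a numerical illustration, not the authors' circuit construction.

```python
import numpy as np

# Hypothetical helpers for illustration: a classical simulation,
# not the HATTRIQ implementation.
rng = np.random.default_rng(seed=0)

def amplitude_embed(features):
    """Zero-pad to the next power of two and normalize to a unit vector."""
    n_qubits = max(1, int(np.ceil(np.log2(len(features)))))
    padded = np.zeros(2 ** n_qubits, dtype=complex)
    padded[: len(features)] = features
    return padded / np.linalg.norm(padded), n_qubits

def random_unitary(dim):
    """Random unitary from the QR decomposition of a complex Gaussian matrix."""
    z = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))  # fix column phases

# Three hypothetical features -> 4 amplitudes on 2 qubits
x, n_qubits = amplitude_embed(np.array([0.2, 0.7, 0.4]))
dim = 2 ** n_qubits

U = random_unitary(dim)                               # stand-in for trained U(theta)
O = np.kron(np.diag([1.0, -1.0]), np.eye(dim // 2))   # Pauli-Z on the first qubit
O_tilde = U.conj().T @ O @ U                          # O~ = U^dagger O U

def model(v):
    """F(v) = <v| O~ |v> evaluated on raw amplitudes (no renormalization)."""
    return np.vdot(v, O_tilde @ v).real

k = 1                                   # amplitude index being attributed
inner = (O_tilde @ x)[k]                # <b_k| O~ |x>
grad_c = 2 * inner.real                 # Lemma III.1: dF/dc_k
grad_d = 2 * inner.imag                 # same derivation: dF/dd_k

# Central finite differences (exact up to rounding, since F is quadratic in x)
eps = 1e-6
e_k = np.zeros(dim)
e_k[k] = 1.0
fd_c = (model(x + eps * e_k) - model(x - eps * e_k)) / (2 * eps)
fd_d = (model(x + 1j * eps * e_k) - model(x - 1j * eps * e_k)) / (2 * eps)

def hadamard_test_p0(bra, ket):
    """P(ancilla = 0) for (|0>|bra> + |1>|ket>)/sqrt(2) after an H on the ancilla.

    Equals (1 + Re<bra|ket>)/2 when both vectors have unit norm.
    """
    branch0 = (bra + ket) / 2
    return np.vdot(branch0, branch0).real

p0 = hadamard_test_p0(e_k.astype(complex), O_tilde @ x)
grad_c_circuit = 2 * (2 * p0 - 1)       # Re<b_k|O~|x> = 2 P(0) - 1

shots = 100                             # finite-shot estimate of the same quantity
p0_hat = rng.binomial(shots, float(np.clip(p0, 0.0, 1.0))) / shots
grad_c_shots = 2 * (2 * p0_hat - 1)

print(f"closed form {grad_c:+.6f}  finite diff {fd_c:+.6f}  "
      f"Hadamard {grad_c_circuit:+.6f}  {shots} shots {grad_c_shots:+.3f}")
print(amplitude_embed(np.ones(784))[1])  # 28x28 image -> prints 10 (qubits)
```

The closed form, the finite difference, and the exact Hadamard-test value agree; the 100-shot estimate scatters around them, which is the shot-noise effect discussed in the results.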

3. Key Contributions

The paper outlines the following specific contributions:

Formalism for IG in QML: A formal method to compute integrated gradients for QML models utilizing amplitude embedding, a scheme previously difficult to differentiate efficiently.
Quantum-Native Circuit Construction: A Hadamard test-based circuit design that computes exact feature gradients directly on quantum hardware without requiring knowledge of the internal quantum state.
Parallelization Strategy: A multi-ancilla technique for concurrent gradient computation in which the number of components evaluated per circuit execution grows exponentially with the number of ancilla qubits ($2^m - 1$ for $m$ ancillas), making the approach viable for larger datasets on devices with sufficient qubit capacity.
Empirical Validation: Evaluation of HATTRIQ on classification tasks using Bars and Stripes, NIST, MNIST, FashionMNIST, and a synthetic Transverse-Field Ising Model (TFIM) quantum dataset.
Open Source: Release of the code and datasets to facilitate reproducibility.
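A back-of-the-envelope count helps show what the parallelization strategy buys. The sketch below assumes, hypothetically, that one attribution map needs one gradient component per (IG path step, feature) pair and that each circuit configuration with $m$ ancillas returns $2^m - 1$ of them, as Theorem IV.3 states. It ignores the repeated shots each configuration needs and any separate runs for imaginary parts; the function name and the step count of 50 are illustrative, not taken from the paper.

```python
import math

def circuit_configurations(num_features: int, ig_steps: int, ancillas: int) -> int:
    """Distinct circuit configurations for one attribution map (hypothetical cost model).

    Assumes each configuration with `ancillas` ancilla qubits yields
    2**ancillas - 1 gradient components and that every (IG step, feature)
    pair needs one component. Shot repetitions are not counted.
    """
    if ancillas < 1:
        raise ValueError("need at least one ancilla qubit")
    per_run = 2 ** ancillas - 1
    return ig_steps * math.ceil(num_features / per_run)

# 28x28 image zero-padded to 1024 amplitudes, 50 IG steps (illustrative values)
for m in (1, 4, 10):
    print(m, circuit_configurations(1024, 50, m))
# prints: 1 51200, then 4 3450, then 10 100
```

Going from one ancilla to four cuts the count roughly fifteenfold, which is why the multi-ancilla construction matters for image-sized inputs.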

4. Experimental Results

The authors evaluated HATTRIQ using simulations (assuming ideal error-corrected hardware) and noisy simulations representative of early fault-tolerant quantum computing (EFTQC).

Datasets and Accuracy: The framework was tested on binary classification tasks across multiple datasets. Models achieved high accuracy (e.g., ~95-100% on Bars/Stripes and NIST subsets; ~70-99% on FashionMNIST subsets).
Attribution Quality:
- Semantic Relevance: For image datasets (NIST, MNIST, FashionMNIST), the generated attribution maps highlighted semantically meaningful regions. For example, in digit classification, attributions focused on the strokes of the digits while ignoring background noise. In FashionMNIST, the model correctly identified straps on bags and excluded sandal areas.
- Encoding Comparison: While angle and amplitude encoding achieved similar classification accuracies, HATTRIQ revealed that they produced markedly different attribution patterns, suggesting different feature utilization strategies.
- Null Model Validation: When applied to "null models" (randomly initialized parameters), HATTRIQ produced diffuse, non-concentrated attribution maps, confirming that the structured attributions in trained models are not artifacts of the method but reflect learned features.
Robustness:
- Shot Noise: Even with low measurement shot counts (10–100 shots), the attribution scores remained largely faithful to exact simulations, with deviations appearing only in weak attributions.
- Hardware Noise: Under a depolarizing noise channel ( $\gamma = 10^{-4}$ ), the attribution scores maintained their spatial distribution and relative intensity, demonstrating suitability for early fault-tolerant devices.
Quantum Data (TFIM): On the synthetic TFIM dataset, HATTRIQ correctly identified phase transitions. In the strong-field regime ( $g > 1$ ), attributions focused on $\sigma_x$ correlations, while in the weak-field regime ( $g < 1$ ), attributions shifted to $\sigma_y$ and $\sigma_z$ components, aligning with physical intuition.
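A quick variance estimate suggests why weak attributions are the first to wobble at 10–100 shots. It assumes a single-ancilla Hadamard test with unit-norm branches, so that $p_0 = P(0) = \tfrac{1}{2}\bigl(1 + \operatorname{Re}\langle b_k|\tilde{O}|x\rangle\bigr)$, and that $\partial F/\partial c_k$ is estimated from $N$ independent shots, $N_0$ of which return $|0\rangle$. This is a standard binomial argument, not an analysis from the paper.

$$\widehat{\frac{\partial F}{\partial c_k}} = 2\,(2\hat{p}_0 - 1), \qquad \hat{p}_0 = \frac{N_0}{N}, \qquad \operatorname{Var}\!\left[\widehat{\frac{\partial F}{\partial c_k}}\right] = \frac{16\, p_0 (1 - p_0)}{N}$$

The variance peaks at $p_0 = 1/2$, which is exactly where the true gradient component is zero, giving a worst-case standard deviation of $\sqrt{16 \cdot 0.25 / N} = 2/\sqrt{N}$: about 0.63 at $N = 10$ and 0.2 at $N = 100$, on a quantity that ranges over $[-2, 2]$. Strong components sit closer to $p_0 = 0$ or $p_0 = 1$, where both the absolute and the relative error are smaller. If the per-step estimates along the IG path are independent, summing over steps averages this noise down further.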

5. Significance and Claims

The paper positions HATTRIQ as the first gradient-based input attribution method specifically designed for QML models that supports amplitude encoding and operates natively on quantum hardware.

Hardware Compatibility: Unlike previous interpretability methods that rely on classical surrogates or perturbation (which break unitarity or require state collapse), HATTRIQ uses a circuit-based construction that respects quantum mechanics.
Scalability: By leveraging the Hadamard test and multi-ancilla parallelization, the method addresses the exponential complexity of amplitude encoding, offering a path to scalable interpretability for larger quantum models.
Trust and Debugging: The framework enables the identification of highly influential features and allows researchers to check if model predictions align with semantically meaningful regions. It also provides a mechanism to detect bias toward background artifacts or to compare different encoding schemes that may have similar accuracy but different internal logic.

The authors conclude that HATTRIQ provides a unified, implementation-agnostic framework for input attribution with fidelity guarantees, marking a significant step toward transparent and trustworthy QML models. They note that future work could extend the framework to parameter and layer attributions.
