Explainable AI: Context-Aware Layer-Wise Integrated Gradients for Explaining Transformer Models

This paper introduces Context-Aware Layer-wise Integrated Gradients (CA-LIG), a unified hierarchical attribution framework for interpreting Transformer models. CA-LIG computes layer-wise Integrated Gradients and fuses them with class-specific attention gradients, capturing context-sensitive relevance as it evolves across layers and yielding more faithful, semantically coherent explanations than existing methods.

Melkamu Abay Mersha, Jugal Kalita

Published 2026-02-19

Imagine you have a super-smart robot that can read a book, watch a movie, or look at a photo and tell you exactly what it thinks about it. Maybe it says, "This movie is terrible!" or "This picture is a cat!"

The problem is, this robot is a black box. It gives you the answer, but it doesn't tell you why. It's like a chef who hands you a delicious soup but refuses to tell you what ingredients went into it or how they cooked it. You just have to trust them.

In the world of Artificial Intelligence (AI), these robots are called Transformers. They are incredibly powerful, but because they have so many layers of "thinking" (like a multi-story building), it's very hard to figure out which specific words or pixels made them make their decision.

This paper introduces a new tool called CA-LIG (Context-Aware Layer-wise Integrated Gradients). Think of it as a super-powered detective that doesn't just look at the final answer; it investigates the entire building, floor by floor, to see exactly how the robot reached its conclusion.

Here is how CA-LIG works, broken down into simple concepts:

1. The Problem with Old Detectors

Previous methods of explaining AI were like looking at a building through a telescope from the street.

  • The "Final Layer" Bias: Most old tools only looked at the very top floor (the final answer). They missed all the interesting work happening on the lower floors.
  • The "Attention" Trap: Some tools just looked at where the robot was "looking" (attention). But sometimes, the robot looks at a word just to keep its balance, not because it's important. It's like a person staring at a clock while thinking about a birthday party; the clock isn't the reason for the party.
  • Missing the Context: Old tools often treated words as isolated islands. They didn't understand that the word "not" changes the meaning of the word "good" completely.

2. The CA-LIG Solution: A Layer-by-Layer Tour

CA-LIG is different because it acts like a tour guide who walks you through every single floor of the robot's brain.

Step 1: The "What If" Game (Integrated Gradients)

Imagine you have a sentence: "The movie was amazing."
CA-LIG plays a game of "What if?" It gradually fades the sentence toward a blank version (like "The movie was... uh...") and watches how the robot's confidence changes at every single step along the way.

  • If the robot's confidence drops when it removes "amazing," then "amazing" is a key ingredient.
  • It does this for every single floor of the building, not just the top one. This shows how the meaning of "amazing" evolves as it travels up the layers.
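In code, the "What if" game is Integrated Gradients: average the model's gradient along a straight path from a blank baseline to the real input, then scale by the difference. Here is a minimal NumPy sketch using a toy linear scorer in place of a real Transformer — all names and numbers are illustrative, not from the paper:

```python
import numpy as np

def model(x, w):
    # Toy "confidence" score for a class: a simple dot product.
    return float(x @ w)

def integrated_gradients(x, baseline, w, steps=50):
    # Riemann-sum approximation of the path integral of gradients
    # from baseline to x. For this linear model the gradient of
    # x @ w is just w (constant); a real model would use autograd.
    total = np.zeros_like(x)
    for k in range(1, steps + 1):
        point = baseline + (k / steps) * (x - baseline)  # faded input
        grad = w
        total += grad
    return (x - baseline) * total / steps

x = np.array([0.2, 0.9, 0.1])   # toy embeddings: "the", "amazing", "movie"
baseline = np.zeros_like(x)     # the "blank" version of the sentence
w = np.array([0.5, 2.0, 0.1])   # toy class weights

attr = integrated_gradients(x, baseline, w)
# Completeness axiom: attributions sum to f(x) - f(baseline).
assert np.isclose(attr.sum(), model(x, w) - model(baseline, w))
```

The completeness check at the end is what makes Integrated Gradients trustworthy: the per-word scores add up exactly to the change in the robot's confidence. CA-LIG repeats this per layer rather than only at the top.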

Step 2: The "Who is Talking to Whom" Map (Attention Gradients)

Transformers work by having different parts of the sentence talk to each other. CA-LIG doesn't just look at who is talking; it looks at how much that conversation matters for the final decision.

  • It combines the "What If" scores with the "Who is talking" map.
  • The Analogy: Imagine a courtroom. The "What If" game tells you which witness is important. The "Attention" map tells you which witness the judge is listening to. CA-LIG combines these to tell you exactly which testimony convinced the judge.
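The courtroom fusion can be sketched like this: weight each attention link by its (positive) class-specific gradient, total up how much decision-relevant attention each token received, and combine that with the "What if" scores. The exact fusion rule below is an illustrative assumption, not the paper's precise formula:

```python
import numpy as np

tokens = ["the", "movie", "was", "amazing"]
ig = np.array([0.02, 0.10, 0.01, 0.80])        # "what if" token scores

attn = np.array([[0.1, 0.2, 0.1, 0.6],         # who attends to whom
                 [0.1, 0.5, 0.1, 0.3],         # (rows attend to columns)
                 [0.2, 0.2, 0.3, 0.3],
                 [0.1, 0.2, 0.1, 0.6]])
attn_grad = np.array([[0.0, 0.1, 0.0, 0.5],    # d(class score)/d(attention);
                      [0.0, 0.2, 0.0, 0.4],    # a real model gets this from
                      [0.0, 0.1, 0.1, 0.3],    # backpropagation
                      [0.0, 0.1, 0.0, 0.6]])

# Keep only attention that actually pushes the decision (positive gradient),
# so "staring at the clock" attention gets zeroed out.
relevance = attn * np.maximum(attn_grad, 0)
attn_score = relevance.sum(axis=0)             # how much each token's
                                               # "testimony" mattered
fused = ig * attn_score / attn_score.sum()     # context-aware importance
```

With these toy numbers, "amazing" dominates the fused score because it is both influential in the "What if" game and the target of decision-relevant attention.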

Step 3: The Context Filter

This is the secret sauce. CA-LIG understands that words depend on their neighbors.

  • If the robot sees "The movie was not amazing," CA-LIG knows that "not" flips the meaning. It captures this relationship.
  • It creates a map where some words are Green (helping the decision) and some are Red (hurting the decision).
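The green/red context effect can be sketched with a toy scorer in which "not" flips the word it modifies; a simple leave-one-out attribution (a stand-in for the gradient-based one) then comes out positive for "amazing" on its own and negative once "not" precedes it. Everything here is illustrative, not the paper's model:

```python
def score(sent):
    # Toy context-sensitive sentiment: "not" negates what follows it.
    s, negate = 0.0, False
    for w in sent:
        if w == "not":
            negate = True
        elif w == "amazing":
            s += -1.0 if negate else 1.0
    return s

def token_attribution(sent):
    # Leave-one-out: how much does the score change if a word vanishes?
    base = score(sent)
    return {i: base - score(sent[:i] + sent[i + 1:]) for i in range(len(sent))}

plain = ["the", "movie", "was", "amazing"]
negated = ["the", "movie", "was", "not", "amazing"]

assert token_attribution(plain)[3] > 0     # "amazing" alone: green
assert token_attribution(negated)[4] < 0   # "not ... amazing": red
```

The same word gets opposite colors in the two sentences — exactly the neighbor-dependence that CA-LIG's context filter is built to capture.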

3. What Did They Find?

The researchers tested this new detective on many different tasks:

  • Reading Reviews: When a robot said a movie was bad, CA-LIG showed it wasn't just the word "bad" that decided it. It was the combination of "lame," "poor acting," and "worst" working together.
  • Detecting Hate Speech: In a low-resource language (Amharic), CA-LIG could spot hate speech by seeing how abusive words were connected, even if the sentence structure was complex.
  • Looking at Pictures: They even used it on a robot that looks at images (like cats vs. dogs). Instead of just highlighting the whole cat, CA-LIG highlighted the ears and eyes (the important parts) and ignored the background.

4. Why Does This Matter?

Think of CA-LIG as a transparent window into the robot's mind.

  • Trust: If a doctor uses an AI to diagnose a disease, they need to know why the AI said "cancer." CA-LIG shows the evidence, so the doctor can trust the result.
  • Debugging: If the AI makes a mistake, CA-LIG shows you exactly which floor of the building went wrong, helping engineers fix it.
  • Fairness: It helps us see if the AI is being biased (e.g., judging a person based on their name rather than their words).

The Bottom Line

The authors built a tool that stops treating AI like a magic black box. Instead of just guessing why the robot made a choice, CA-LIG walks us through the robot's thought process, floor by floor, showing us exactly how it connected the dots. It's a huge step toward making AI not just smart, but also honest and understandable.
