Why Does It Look There? Structured Explanations for Image Classification

The paper proposes I2X, a framework that transforms unstructured interpretability into structured, prototype-based explanations to reveal model decision-making processes and actively improve classification accuracy through targeted sample perturbation.

Jiarui Li, Zixiang Yin, Samuel J Landry, Zhengming Ding, Ramgopal R. Mettu

Published 2026-03-12

Imagine you have a brilliant but silent student taking a test. They get the right answers almost every time, but when you ask, "How did you know that was a cat and not a dog?" they just shrug. They can't explain their thought process. This is the problem with most modern AI: it's a "black box." It works great, but we don't know why it works.

Existing methods try to peek inside by highlighting the parts of a picture the AI looked at (like a highlighter pen on a photo). But this is messy. It's like seeing a student underline words in a book but not knowing whether they underlined them because they were important, or just because they liked the color. It doesn't tell you the story of how the student learned.

This paper introduces a new method called I2X (Interpretability to Explainability). Think of I2X as a detective that interviews the student at every stage of their training to build a structured story of how they learned.

Here is how it works, using simple analogies:

1. The "Lego Brick" Analogy (Prototypes)

Instead of looking at the whole picture at once, I2X breaks the AI's learning down into tiny building blocks called prototypes.

  • Imagine the AI is learning to recognize the number "7".
  • It doesn't just see a "7." It learns to recognize specific "Lego bricks" that make up a 7: a diagonal line in the middle, a dot at the top, a horizontal line at the bottom.
  • I2X identifies these specific patterns (bricks) and names them. Let's call them "Brick A," "Brick B," and "Brick C."
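The "brick matching" idea can be sketched in a few lines. This is a simplified illustration, not the paper's actual implementation: it assumes the network produces patch embeddings, and scores each learned prototype by its best cosine-similarity match over the patches (the names `patch_features` and `prototypes` are hypothetical).

```python
import numpy as np

def prototype_activations(patch_features, prototypes):
    """Score how strongly each learned prototype ("brick") fires on an image.

    patch_features: (num_patches, d) array of patch embeddings from the network.
    prototypes: (num_prototypes, d) array of learned prototype vectors.
    Returns, for each prototype, its best cosine similarity over all patches.
    """
    # Normalize rows so the dot product below is cosine similarity.
    f = patch_features / np.linalg.norm(patch_features, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    sims = f @ p.T                 # (num_patches, num_prototypes)
    return sims.max(axis=0)        # best-matching patch per prototype

# Toy example: 3 image patches and 2 prototypes ("Brick A", "Brick B") in 4-d.
rng = np.random.default_rng(0)
patches = rng.normal(size=(3, 4))
bricks = rng.normal(size=(2, 4))
scores = prototype_activations(patches, bricks)
```

A high score for "Brick A" means somewhere in the image there is a patch that looks very much like that learned pattern.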

2. The "Training Diary" (Tracking Evolution)

Most AI explanations just look at the finished product. I2X is different; it keeps a diary of the AI's entire training journey.

  • It checks the AI at different checkpoints (like checking a student's progress at the end of every week).
  • It asks: "At week 1, did you use 'Brick A' to guess '7'? Did you get it right?"
  • "At week 4, did you start using 'Brick B'?"
  • By tracking these changes, I2X builds a timeline: "First, the AI learned to spot the diagonal line. Then, it learned to ignore the top dot because that confused it with the number '1'."
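The diary idea can be sketched as a loop over saved checkpoints. This is a hedged toy version, assuming each checkpoint exposes a prototype-to-class weight matrix (the `diary` data and function name are invented for illustration):

```python
def prototype_timeline(checkpoints, target_class):
    """Record which prototype dominates a class at each training checkpoint.

    checkpoints: list of (name, weights) pairs, where weights[c][k] is the
    strength connecting prototype k to class c at that checkpoint.
    Returns a list of (checkpoint_name, index_of_strongest_prototype).
    """
    timeline = []
    for name, weights in checkpoints:
        row = weights[target_class]
        timeline.append((name, max(range(len(row)), key=row.__getitem__)))
    return timeline

# Toy diary: at week 1, class "7" (index 1) leans on prototype 0 (the diagonal
# line); by week 4 it has shifted to prototype 2 (the bottom horizontal line).
diary = [
    ("week1", [[0.1, 0.0, 0.0], [0.9, 0.2, 0.1]]),
    ("week4", [[0.1, 0.0, 0.0], [0.3, 0.2, 0.8]]),
]
story = prototype_timeline(diary, target_class=1)
# story == [("week1", 0), ("week4", 2)]
```

Reading the timeline off is exactly the "first it learned the diagonal, then it switched" narrative described above.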

3. The "Confused Student" (Finding Uncertainty)

Sometimes, the AI gets confused. Maybe it thinks a "7" looks like a "2" because they both have a curve.

  • I2X spots the specific "Brick" that is causing the confusion. Let's say it's "Brick X," which looks like a curve found in both numbers.
  • The paper shows that if the AI sees "Brick X" too often without clear context, it gets shaky in its decisions. It's like a student who keeps flipping between two answers because one clue fits both.
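One standard way to quantify that "shakiness" is the entropy of the model's class scores; a sketch under that assumption (the specific logit values are made up to illustrate the effect):

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def prediction_entropy(logits):
    """Shannon entropy of the softmax distribution: higher means more 'shaky'."""
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0)

# When "Brick X" (a curve shared by "7" and "2") fires, it adds evidence to the
# competing class too, narrowing the gap between the logits:
confused = prediction_entropy([2.0, 1.8])   # Brick X boosted the wrong class
confident = prediction_entropy([2.0, 0.1])  # only class-specific bricks fired
```

Here `confused` comes out larger than `confident`: the closer the two logits, the more the model flip-flops between the answers, matching the "one clue fits both" picture.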

4. The "Smart Tutor" (Fixing the AI)

This is the coolest part. Once I2X finds the "confusing brick," it doesn't just report it; it helps fix the AI.

  • The researchers took the AI and gave it a special "tutoring session."
  • They showed the AI examples that didn't have the confusing "Brick X," helping it learn to ignore that specific trap.
  • The Result: The AI became much better at telling the difference between the numbers (or cats and dogs in the case of the CIFAR-10 dataset). It reduced its mistakes significantly.
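The "tutoring" step can be sketched as a targeted perturbation: blank out the region where the confusing prototype fires, then fine-tune on the result. This is a minimal illustration, not the paper's exact procedure; the function name, threshold, and zero-fill choice are all assumptions.

```python
import numpy as np

def mask_confusing_region(image, activation_map, threshold=0.8):
    """Blank out the pixels where the confusing prototype ('Brick X') fires.

    image: (H, W) array of pixel values.
    activation_map: (H, W) similarity of Brick X at each location, in [0, 1].
    Pixels above `threshold` are zeroed, producing a 'tutoring' example
    that no longer contains the trap.
    """
    perturbed = image.copy()
    perturbed[activation_map > threshold] = 0.0
    return perturbed

# Toy 4x4 image where Brick X fires in the top-left corner.
img = np.ones((4, 4))
act = np.zeros((4, 4))
act[0, 0] = act[0, 1] = 0.95
clean = mask_confusing_region(img, act)
```

Fine-tuning on such perturbed samples forces the model to rely on the remaining, class-specific bricks instead of the ambiguous one.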

Why This Matters

Think of current AI as a magic trick. You see the rabbit appear, but you don't know how the magician did it.

  • Old methods say: "The magician looked at the rabbit's left ear." (This is the "saliency map" or highlighting).
  • I2X says: "The magician first practiced pulling the rabbit from the left sleeve, then realized the hat was too big, so he switched to the right sleeve, and finally learned to hide the rabbit in the cape. Here is the step-by-step manual."

The Big Takeaway

This paper gives us a way to turn the AI's "black box" into a transparent instruction manual. It shows us exactly how the AI organizes its thoughts, where it gets confused, and how we can gently nudge it to learn better. It's not just about explaining the past; it's about using that explanation to make the AI smarter and more reliable for the future.