SCAN: Visual Explanations with Self-Confidence and Analysis Networks

This paper introduces SCAN, a universal framework that leverages an AutoEncoder-based approach guided by the Information Bottleneck principle to generate high-resolution, faithful self-confidence maps, effectively overcoming the trade-off between fidelity and applicability in visual explanations for both CNN and Transformer architectures.

Gwanghee Lee, Sungyoon Jeong, Kyoungson Jhang

Published 2026-03-09

Imagine you have a brilliant but incredibly shy chef (the AI model) who can cook a perfect dish (make a prediction) but refuses to tell you why they chose those specific ingredients. You ask, "Why did you put salt in this soup?" and the chef just points vaguely at the whole kitchen.

For a long time, the tools we used to ask the chef these questions were either too vague or too specific:

  • The "Universal" Tools: These were like asking a random bystander to guess what the chef did. They work in any kitchen, but their guesses are often wrong or too fuzzy.
  • The "Specialized" Tools: These were like hiring a sous-chef who only knows how to work with one specific type of stove. They give great answers, but if you switch to a different stove (a different AI model), they are useless.

Enter SCAN (Self-Confidence and Analysis Networks).

The authors of this paper built a new, universal translator that works in any kitchen, whether it's a modern smart kitchen (Transformers) or a classic brick oven (CNNs). Here is how it works, using simple analogies:

1. The "Reconstruction" Game (The Core Idea)

Imagine you take a photo of the soup the chef made, but you crush it into a tiny, blurry puzzle piece (this is what happens inside the AI's brain).

  • Old methods just looked at the puzzle piece and guessed what the soup looked like.
  • SCAN says: "Let's try to rebuild the original photo from that puzzle piece."

They built a special machine (a decoder) that tries to reconstruct the original image from the AI's "thoughts." But here is the trick: The machine only gets good at reconstructing the parts of the image that actually matter for the decision.
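One way to picture "only the parts that matter" is a reconstruction error that is weighted by relevance. The Python below is a minimal sketch under our own assumptions, not the paper's published loss: the function name `masked_reconstruction_loss`, images stored as nested lists, and a relevance mask with values between 0 and 1 are all hypothetical. Pixels with relevance 0 add nothing to the error, so a decoder trained on it has no reason to learn them.

```python
def masked_reconstruction_loss(image, reconstruction, relevance):
    """Relevance-weighted mean squared error (hypothetical sketch, not SCAN's exact loss).

    image, reconstruction, relevance: 2D nested lists of the same shape.
    Pixels with relevance 0 do not contribute at all.
    """
    total = 0.0
    weight = 0.0
    for img_row, rec_row, rel_row in zip(image, reconstruction, relevance):
        for x, x_hat, w in zip(img_row, rec_row, rel_row):
            total += w * (x - x_hat) ** 2
            weight += w
    # Avoid dividing by zero when nothing is marked relevant.
    return total / weight if weight > 0 else 0.0


image = [[1.0, 0.0], [0.0, 1.0]]
reconstruction = [[1.0, 1.0], [0.0, 0.0]]  # right pixel of each row is rebuilt wrongly
relevance_all = [[1, 1], [1, 1]]           # every pixel matters
relevance_left = [[1, 0], [1, 0]]          # only the left column matters

print(masked_reconstruction_loss(image, reconstruction, relevance_all))   # 0.5
print(masked_reconstruction_loss(image, reconstruction, relevance_left))  # 0.0
```

With the relevance mask in place, the wrongly rebuilt right-hand pixels cost nothing, which is the intuition behind the decoder getting good only at the important regions.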

2. The "Self-Confidence Map" (The Highlighter)

As the machine tries to rebuild the image, it keeps a scorecard called the Self-Confidence Map.

  • If the machine is 100% confident it can rebuild a specific part of the image (like the chicken in the soup), it highlights that area brightly.
  • If it's confused (like the background table or the steam), it leaves that area dark.

Think of it like a detective using a flashlight in a dark room. The flashlight only shines brightly on the clues that solve the case. Everything else remains in the shadows. SCAN's flashlight is so good it ignores the dust bunnies (background noise) and shines only on the suspect (the object).
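A rough sketch of how per-pixel reconstruction errors could be turned into a confidence map and then into a flashlight-style highlight. This is an assumption for illustration only: the paper's decoder may predict its confidence directly, and the helper names `confidence_map` and `highlight`, the `exp(-error / temperature)` mapping, and the 0.5 threshold are all hypothetical choices.

```python
import math


def confidence_map(error_map, temperature=0.1):
    """Map each non-negative per-pixel error e to exp(-e / temperature).

    Hypothetical sketch: zero error gives confidence 1.0, larger errors fade toward 0.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    return [[math.exp(-e / temperature) for e in row] for row in error_map]


def highlight(conf, threshold=0.5):
    """The 'flashlight': 1 where confidence reaches the threshold, 0 elsewhere."""
    return [[1 if c >= threshold else 0 for c in row] for row in conf]


# Usage: low error on the "chicken" (left column), high error on the "table" (right column).
errors = [[0.0, 0.5], [0.01, 0.9]]
conf = confidence_map(errors)
print(highlight(conf))  # [[1, 0], [1, 0]]
```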

3. The "Information Bottleneck" (The Filter)

The paper uses a concept called the Information Bottleneck. Imagine a crowded hallway where everyone is shouting.

  • Old methods let everyone shout, so you hear a lot of noise.
  • SCAN puts a bouncer at the door. The bouncer only lets through the people who are shouting the most important words (the features that actually help the AI decide).
  • By filtering out the noise, the remaining message is crystal clear.
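For readers who want the formula behind the bouncer, this is the classic textbook form of the Information Bottleneck objective. It states the general principle and is not necessarily the exact loss SCAN optimizes. X is the input image, Z the compressed internal representation, Y the prediction target, and β > 0 sets how much useful information the bouncer must let through.

```latex
\min_{p(z \mid x)} \; I(X; Z) \;-\; \beta \, I(Z; Y)
```

The first term rewards compression (forget as much about the image as possible); the second rewards keeping whatever helps predict Y.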

4. The "Gradient Mask" (The Spotlight)

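Before the reconstruction starts, SCAN puts a filter over the AI's thoughts. As the name suggests, this mask is built from gradients, which measure how strongly each internal feature influences the prediction. It's like fitting a filter over a camera lens.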

  • It blocks out the "weak" signals (things the AI isn't sure about).
  • It only lets the "strong" signals (the top 95% of important features) pass through.
  • This ensures the machine doesn't waste time trying to reconstruct irrelevant background details.
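A minimal Python sketch of a gradient-based mask. It assumes importance is scored as |feature × gradient| (a common gradient-times-input heuristic, not confirmed as SCAN's exact rule) and reads "top 95%" literally as keeping the highest-scoring 95% of features. The name `gradient_mask` and the flat-list layout are hypothetical.

```python
def gradient_mask(features, gradients, keep_fraction=0.95):
    """Zero out features whose importance falls below the keep_fraction cutoff.

    Hypothetical sketch: importance = |feature * gradient|. Ties at the cutoff
    are all kept, so slightly more than keep_fraction may survive.
    """
    if len(features) != len(gradients):
        raise ValueError("features and gradients must have the same length")
    if not features:
        return []
    importance = [abs(f * g) for f, g in zip(features, gradients)]
    # How many features to keep: at least one, at most all of them.
    n_keep = min(len(importance), max(1, round(keep_fraction * len(importance))))
    # The importance of the n_keep-th strongest feature becomes the cutoff.
    cutoff = sorted(importance, reverse=True)[n_keep - 1]
    return [f if imp >= cutoff else 0.0 for f, imp in zip(features, importance)]


# Usage: with keep_fraction=0.5, only the two strongest of four features survive.
print(gradient_mask([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0], keep_fraction=0.5))
# -> [0.0, 0.0, 3.0, 4.0]
```

Masking before decoding means the weak features never reach the decoder, so it cannot spend effort rebuilding the background they describe.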

Why is this a Big Deal?

  • It's Universal: Whether the AI is a "CNN" (good at spotting edges) or a "Transformer" (good at understanding context), SCAN works on both. It's like a universal remote control that works on every TV brand.
  • It's Honest: The paper tested this by "breaking" the AI (randomizing its learned weights). When the AI was broken, SCAN's maps stopped making sense. This suggests SCAN isn't just guessing from the image; its maps depend on what the AI actually learned.
  • It's Clear: Other methods often produce "fuzzy blobs" that cover the whole picture. SCAN produces sharp, clean outlines of the actual object, like a high-definition silhouette.
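The "breaking the AI" test can be scored with a simple similarity measure. Here is a minimal Python sketch, assuming the two maps are compared with Pearson correlation and a hypothetical cutoff of 0.5; the paper may use a different metric. If the map from the randomized model still looks like the map from the trained model, the explanation was not really depending on what the model learned.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("need two lists of the same length (at least 2 values)")
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant map")
    return cov / math.sqrt(var_a * var_b)


def randomization_check(map_trained, map_randomized, max_similarity=0.5):
    """Return (r, passed); passed is True when the two maps are NOT strongly correlated.

    Hypothetical sketch: maps are 2D nested lists; max_similarity is an arbitrary cutoff.
    """
    flat_trained = [v for row in map_trained for v in row]
    flat_random = [v for row in map_randomized for v in row]
    r = pearson(flat_trained, flat_random)
    return r, abs(r) < max_similarity


# Usage: a map that stays the same after randomization fails the check.
print(randomization_check([[1, 0], [0, 0]], [[1, 0], [0, 0]]))  # (1.0, False)
```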

The Bottom Line

SCAN is a new tool that helps us understand why AI makes decisions. It does this by trying to rebuild the image from the AI's internal thoughts and highlighting only the parts it is "confident" about. It bridges the gap between tools that are too general and tools that are too specific, giving us a clear, reliable window into the "black box" of artificial intelligence.

In short: SCAN turns the AI's mumbled thoughts into a clear, highlighted map of exactly what it was looking at.