Beyond Attribution: Unified Concept-Level Explanations

Imagine you've built a super-smart robot chef (the AI model) that can tell you if a movie is good or bad, or if a photo contains a dog or a cat. But there's a problem: the robot is a black box. You can't see how it thinks. It just gives you an answer.

To fix this, we usually ask the robot, "Why did you say that?" and it gives us a list of reasons. But often, these reasons are confusing.

The Old Way (Feature-Level): It might say, "I said the photo shows a dog because pixels 45, 46, and 47 were bright red, and pixel 102 was slightly blue." That's like trying to understand a painting by looking at individual drops of paint. It's technically true, but it doesn't help you understand the story.
The New Way (Concept-Level): We want the robot to say, "I liked the movie because the plot was exciting and the acting was realistic." These are concepts: ideas humans actually understand.

The Problem with Current "Concept" Explainers

Scientists have tried to make robots talk in concepts before, but they hit a wall.

They only do one thing: Most existing tools can only give you a "score" for each concept (like a report card: "Plot: 8/10").
They miss the big picture: They can't answer other important questions like:
- "What is the minimum thing I need to change to get a different result?" (Counterfactuals)
- "What conditions guarantee this result?" (Sufficient Conditions)

It's like having a GPS that only tells you how much each turn contributed to your trip, but refuses to tell you, "If you had turned left instead of right, you would have arrived at the beach."

The Solution: UnCLE (Unified Concept-Level Explanations)

The authors of this paper created a tool called UnCLE. Think of UnCLE as a universal translator and upgrade kit for AI explainers.

Here is how it works, using a simple analogy:

1. The "Magic Translator" (Large Pre-trained Models)

Imagine you have a very old, rigid robot (the AI you want to explain) that only speaks "Pixel" or "Word." You want it to speak "Concept."
UnCLE uses a Large Pre-trained Model (like a super-smart AI assistant, e.g., GPT or a diffusion model) as a translator.

The Job: When we want to test a "what if" scenario on the old robot (e.g., "What if the review never mentioned a boring plot?"), we can't just delete a word. We need a whole new sentence.
The Magic: UnCLE asks the Magic Translator: "Please write a new sentence that is exactly like the original, but remove the concept of 'boring plot' and keep everything else the same."
The Magic Translator writes that sentence, creating a new, realistic sample for the old robot to analyze.

2. The "Universal Adapter"

The best part about UnCLE is that it doesn't require building a new robot from scratch. It takes existing explanation tools (like LIME, Anchors, or SHAP) and plugs them into this new system.

Before: LIME looks at an image and says, "These 50 tiny pixels made the robot think it's a 'Punching Bag'."
After UnCLE: LIME looks at the image, asks the Magic Translator to swap out the "Punching Bag" for a "Sofa," and then says, "The robot thought it was a 'Punching Bag' because of the Punching Bag concept. If we swap it for a Sofa, the robot changes its mind."

What UnCLE Can Do Now

Because of this upgrade, UnCLE can give you three different types of answers, all based on human concepts:

Attributions (The Scorecard): "The movie was rated 'Good' mainly because the Visual Effects were great (High Score), even though the slow Pacing counted against it (Low Score)."
Sufficient Conditions (The Guarantee): "As long as the movie has Good Acting and Good Sound, the robot will always rate it 'Good', no matter what else happens."
Counterfactuals (The "What If"): "If the movie had Better Pacing, the robot would have rated it 'Good' instead of 'Bad'."

Why This Matters

It's Faithful: The explanations actually match how the AI thinks, not just a guess.
It's Flexible: You can ask for the type of answer you need (a score, a rule, or a "what if").
It's Universal: It works on text (movies, news), images (cats, cars), and even mixed media.

The Bottom Line

UnCLE is like taking a complex, confusing instruction manual for a machine and rewriting it in plain English. It doesn't just tell you which parts of the machine are working; it tells you what the machine is thinking in a language you can actually use to make decisions. It bridges the gap between "AI logic" and "Human understanding" without needing to rebuild the AI from the ground up.

1. Problem Statement

The field of Explainable AI (XAI) faces a critical gap between model-agnostic explanation techniques and concept-based approaches:

Model-Agnostic Methods: Techniques like LIME, SHAP, and Anchors are versatile and work across different architectures (including black-box models) but typically operate at the feature level (e.g., superpixels in images or individual words in text). These low-level features often lack semantic meaning, making explanations difficult for end-users to interpret.
Concept-Based Methods: These methods use high-level semantic concepts (e.g., "objects," "sentiment topics") which are more faithful and understandable. However, existing concept-based methods are largely limited to attribution (importance scores) and struggle to provide other crucial explanation forms like sufficient conditions (rules that guarantee an outcome) or counterfactuals (how to change an input to alter the outcome). Furthermore, many concept-based methods are not truly model-agnostic or require complex, task-specific retraining.

The Core Challenge: How to elevate existing local model-agnostic methods from the feature level to the concept level to provide unified, faithful, and diverse explanation forms (attributions, sufficient conditions, and counterfactuals) without redesigning the core algorithms.

2. Methodology: The UnCLE Framework

The authors propose UnCLE (Unified Concept-Level Explanations), a general and lightweight framework that augments existing local model-agnostic methods. It operates in three distinct steps:

A. Concept-Level Predicate Producing

Instead of generating predicates based on raw features (e.g., "pixel at x,y is red"), UnCLE extracts high-level concepts from the input data using a Concept-Extracting Model (e.g., SAM, the Segment Anything Model, for images; LLMs for text).

It defines Concept Predicates ($P_c$): Binary functions indicating whether an input satisfies a specific concept (e.g., "The image contains a child" or "The text mentions 'exciting plot'").
This replaces the standard feature predicate set $P$ with a concept predicate set $P_c$.
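
As a rough illustration of what a concept predicate set could look like in code, the sketch below models each predicate as a named binary function over text. The `ConceptPredicate` class and the keyword matcher are hypothetical stand-ins invented for this example; per the summary, UnCLE would obtain concepts from a concept-extracting model rather than from keyword lists.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class ConceptPredicate:
    """A binary test: does the input express this concept?"""
    name: str
    holds: Callable[[str], bool]


def keyword_predicate(name: str, keywords: List[str]) -> ConceptPredicate:
    # Hypothetical stand-in for a concept-extracting model: the concept
    # "holds" if any of its keywords appears in the lower-cased text.
    lowered = [k.lower() for k in keywords]
    return ConceptPredicate(name, lambda text: any(k in text.lower() for k in lowered))


def concept_vector(text: str, predicates: List[ConceptPredicate]) -> List[int]:
    """Encode an input as one bit per concept predicate."""
    return [int(p.holds(text)) for p in predicates]


# Illustrative concept predicate set P_c for movie reviews.
P_c = [
    keyword_predicate("exciting plot", ["exciting", "gripping"]),
    keyword_predicate("realistic acting", ["realistic acting", "convincing"]),
    keyword_predicate("slow pacing", ["slow", "drags"]),
]

review = "An exciting plot with convincing performances."
vector = concept_vector(review, P_c)
print(vector)  # [1, 1, 0]
```

The resulting bit vector is the representation that the later perturbation and learning steps operate on.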

B. Concept-Level Perturbation

This is the novel core of UnCLE. Traditional methods perturb features by masking or adding noise, which often results in nonsensical data (e.g., a blurry image or gibberish text).

Mechanism: UnCLE uses Large Pre-trained Models (large language models, LLMs, or latent diffusion models, LDMs) as a Concept-Feature Mapping Model.
Process:
1. Generate a binary vector representing a desired combination of concepts (e.g., "Has concept A, does not have concept B").
2. Prompt the large pre-trained model to generate a new sample in the original input space (image or text) that strictly adheres to these concept constraints.
3. If the vector bit is 1, the model ensures the concept is present; if 0, it ensures the concept is absent.
Advantage: This ensures that perturbed samples remain semantically valid and realistic, unlike simple feature masking.
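
The three-step process above can be sketched for text as follows. This is a minimal sketch under stated assumptions: the prompt wording, the function names, and the `echo_generator` placeholder are all invented for illustration, and no real LLM or diffusion API is called. In practice `generate` would wrap whichever large pre-trained model is used.

```python
import random
from typing import Callable, List, Tuple


def sample_concept_vectors(n_concepts: int, n_samples: int, seed: int = 0) -> List[List[int]]:
    """Step 1: draw random binary vectors; bit i says whether concept i must be present."""
    rng = random.Random(seed)
    return [[rng.randint(0, 1) for _ in range(n_concepts)] for _ in range(n_samples)]


def build_prompt(original: str, concepts: List[str], bits: List[int]) -> str:
    """Steps 2-3: turn a concept vector into an instruction for a generative model."""
    keep = [c for c, b in zip(concepts, bits) if b == 1]
    drop = [c for c, b in zip(concepts, bits) if b == 0]
    return (
        "Rewrite the text below, staying as close to the original as possible.\n"
        f"Concepts that must be present: {', '.join(keep) or 'none'}\n"
        f"Concepts that must be absent: {', '.join(drop) or 'none'}\n"
        f"Text: {original}"
    )


def perturb(
    original: str,
    concepts: List[str],
    vectors: List[List[int]],
    generate: Callable[[str], str],
) -> List[Tuple[List[int], str]]:
    """Pair every concept vector with the sample the generator produced for it."""
    return [(bits, generate(build_prompt(original, concepts, bits))) for bits in vectors]


def echo_generator(prompt: str) -> str:
    # Placeholder, NOT a real model: it returns the original text unchanged.
    return prompt.splitlines()[-1][len("Text: "):]


original = "An exciting plot, but the pacing is slow."
concepts = ["exciting plot", "slow pacing", "realistic acting"]
vectors = sample_concept_vectors(len(concepts), n_samples=4)
samples = perturb(original, concepts, vectors, echo_generator)
print(build_prompt(original, concepts, [1, 0, 0]))
```

Passing the generator in as a plain callable keeps the sampling logic independent of any particular model provider.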

C. Explanation Generation

UnCLE feeds the concept-perturbed samples and their corresponding model outputs into the original learning algorithms of the base method (e.g., Linear Regression for LIME, Decision Trees for LORE, KL-LUCB for Anchors).

The algorithm learns a local surrogate model based on concepts rather than features.
Because the underlying algorithm remains unchanged, UnCLE inherits the ability to generate multiple explanation forms:
- Attributions: Importance weights for concepts.
- Sufficient Conditions: Minimal sets of concepts that guarantee the model's prediction.
- Counterfactuals: Minimal concept changes required to flip the prediction.
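
To make the attribution case concrete, here is a minimal LIME-style sketch: a proximity-weighted linear regression fitted on binary concept vectors. Everything in it is an assumption for illustration. The `toy_black_box` scores concept bits directly, whereas in UnCLE the outputs would come from running the explained model on generated samples, and the exponential kernel and its width are simply one common choice, not values taken from the paper.

```python
import itertools
import numpy as np


def lime_style_weights(Z, y, z0, kernel_width=0.75):
    """Fit a weighted linear surrogate; returns (intercept, per-concept weights)."""
    Z = np.asarray(Z, dtype=float)
    y = np.asarray(y, dtype=float)
    # Proximity kernel: samples that share more concepts with z0 count more.
    dist = np.abs(Z - np.asarray(z0, dtype=float)).mean(axis=1)  # normalized Hamming distance
    w = np.exp(-(dist ** 2) / kernel_width ** 2)
    # Weighted least squares via row scaling by sqrt(w).
    X = np.hstack([np.ones((len(Z), 1)), Z])
    sw = np.sqrt(w)[:, None]
    coef, *_ = np.linalg.lstsq(X * sw, y * sw[:, 0], rcond=None)
    return coef[0], coef[1:]


def toy_black_box(bits):
    # Hypothetical model score over (exciting plot, realistic acting, slow pacing).
    return 0.2 + 0.6 * bits[0] - 0.3 * bits[2]


concepts = ["exciting plot", "realistic acting", "slow pacing"]
Z = list(itertools.product([0, 1], repeat=len(concepts)))  # all 8 concept combinations
y = [toy_black_box(z) for z in Z]
z0 = [1, 1, 0]  # concept vector of the instance being explained

intercept, weights = lime_style_weights(Z, y, z0)
for name, weight in zip(concepts, weights):
    print(f"{name}: {weight:+.2f}")
```

Because the toy score is exactly linear in the concept bits, the surrogate recovers its coefficients; with a real model the weights are only a local approximation.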

3. Key Contributions

Unified Framework: Introduced UnCLE, a framework that elevates existing feature-level model-agnostic methods (LIME, Kernel SHAP, Anchors, LORE) to the concept level with minimal effort.
Large Model Perturbation: Proposed using large pre-trained models to perform concept-level perturbations, solving the problem of generating semantically meaningful counterfactuals and sufficient conditions.
Unified Explanation Forms: Demonstrated that a single framework can provide attributions, sufficient conditions, and counterfactuals simultaneously, satisfying diverse user needs.
State-of-the-Art Performance: Showed that UnCLE outperforms both feature-level baselines and specialized concept-based methods in terms of fidelity.
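
The other two explanation forms listed above, sufficient conditions and counterfactuals, can be illustrated on a tiny concept space. Note the swap: the summary says the base methods use KL-LUCB (Anchors) and decision trees (LORE), while this sketch uses plain exhaustive search, which is only feasible for a handful of concepts. The `toy_model` and all function names are hypothetical.

```python
from itertools import combinations, product
from typing import Callable, Optional, Tuple

Bits = Tuple[int, ...]


def toy_model(bits: Bits) -> str:
    # Hypothetical classifier over (good acting, good sound, slow pacing).
    return "Good" if bits[0] == 1 and bits[1] == 1 else "Bad"


def sufficient_condition(model: Callable[[Bits], str], z0: Bits) -> Tuple[int, ...]:
    """Smallest set of concept indices that, held at z0's values, always yields model(z0)."""
    n, target = len(z0), model(z0)
    for size in range(n + 1):
        for fixed in combinations(range(n), size):
            if all(
                model(z) == target
                for z in product([0, 1], repeat=n)
                if all(z[i] == z0[i] for i in fixed)
            ):
                return fixed
    return tuple(range(n))  # not reached: fixing every concept always suffices


def counterfactual(model: Callable[[Bits], str], z0: Bits) -> Optional[Bits]:
    """Concept vector with the fewest flipped bits whose prediction differs from model(z0)."""
    n, target = len(z0), model(z0)
    for size in range(1, n + 1):
        for flips in combinations(range(n), size):
            z = tuple(1 - b if i in flips else b for i, b in enumerate(z0))
            if model(z) != target:
                return z
    return None  # the model is constant over the concept space


print(sufficient_condition(toy_model, (1, 1, 1)))  # (0, 1)
print(counterfactual(toy_model, (1, 0, 1)))        # (1, 1, 1)
```

Here the rule "good acting and good sound" guarantees the "Good" label regardless of pacing, and adding good sound is the smallest change that flips the "Bad" prediction.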

4. Experimental Results

The authors evaluated UnCLE on text, image, and multimodal models (BERT, DeepSeek-V3, YOLOv8, ViT, ResNet-50, Qwen2.5-VL).

Perturbation Fidelity: Using Large Language Models (DeepSeek-V3) and Latent Diffusion models for perturbation, UnCLE achieved an average accuracy of 96.8% in generating samples that strictly satisfied the target concept constraints.
Fidelity Improvement:
- Compared to feature-level versions, UnCLE improved Coverage by ~11% and Precision by ~13% for rule-based methods (Anchors/LORE).
- For attribution methods (LIME/SHAP), it improved AOPC (Area Over the Perturbation Curve) significantly and reduced $\text{Accuracy}_a$ (the model's accuracy after the top-attributed content is deleted, where lower is better), indicating higher fidelity.
- On average, UnCLE improved the fidelity of existing methods by 56.8%.
Comparison with SOTA: UnCLE-augmented methods outperformed specialized concept-based methods (TBM, LACOAT for text; EAC, ConceptLIME for image) across all tasks.
Human Evaluation: In a user study involving 18 participants, UnCLE-augmented explanations (specifically sufficient conditions and counterfactuals) helped users predict model behavior on unseen data with 8.1% higher precision (for sufficient conditions) and 14.2% higher precision (for counterfactuals) compared to baseline concept-based attribution methods.
Efficiency: UnCLE adds computational overhead because of the generative model calls, but the authors report practically acceptable runtimes, and under matched computational budgets it outperforms the baselines as the budget increases.
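
For readers unfamiliar with the fidelity metrics named in these results, the sketch below shows how precision and coverage of a rule and an AOPC-style score could be computed over concept vectors. The definitions follow common usage and may differ in detail from the paper's exact formulas (AOPC in particular has several variants); the toy models and function names are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, Sequence, Tuple


def rule_precision_coverage(
    rule: Dict[int, int],
    samples: Sequence[Tuple[int, ...]],
    predictions: Sequence[str],
    target: str,
) -> Tuple[float, float]:
    """Coverage: share of samples the rule applies to. Precision: share of those predicted `target`."""
    matched = [p for z, p in zip(samples, predictions) if all(z[i] == v for i, v in rule.items())]
    coverage = len(matched) / len(samples)
    precision = sum(p == target for p in matched) / len(matched) if matched else 0.0
    return precision, coverage


def aopc(score_fn: Callable[[Tuple[int, ...]], float], z0, ranking, k_max: int) -> float:
    """Mean drop in model score as the top-ranked concepts are deleted one after another."""
    base = score_fn(tuple(z0))
    z = list(z0)
    drops = []
    for idx in ranking[:k_max]:
        z[idx] = 0  # delete the concept
        drops.append(base - score_fn(tuple(z)))
    return sum(drops) / len(drops)


# Hypothetical classifier and scorer over three concepts.
def label(z):
    return "Good" if z[0] == 1 and z[1] == 1 else "Bad"


def score(z):
    return 0.1 + 0.5 * z[0] + 0.3 * z[1]


samples = list(product([0, 1], repeat=3))
predictions = [label(z) for z in samples]

precision, coverage = rule_precision_coverage({0: 1, 1: 1}, samples, predictions, "Good")
print(precision, coverage)  # 1.0 0.25
print(round(aopc(score, (1, 1, 0), ranking=[0, 1, 2], k_max=2), 2))  # 0.65
```

A faithful attribution ranks the truly influential concepts first, so deleting them early produces larger score drops and a higher AOPC.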

5. Significance

Bridging the Gap: UnCLE successfully bridges the gap between the flexibility of model-agnostic methods and the interpretability of concept-based methods.
Democratization of XAI: It allows users to upgrade existing, well-understood explanation tools (like LIME) to concept-level without needing to design new algorithms from scratch.
Versatility: By supporting multiple explanation forms (attributions, rules, counterfactuals) within a unified framework, it addresses the limitation of current methods that often only provide one type of insight.
Practical Impact: The ability to generate semantically valid counterfactuals and sufficient conditions at the concept level provides actionable insights for decision-making in real-world applications, moving XAI beyond simple "feature importance" heatmaps.
