Imagine you have a super-smart but mysterious robot chef (the AI). This chef can cook amazing dishes, but if you ask, "Why did you add salt to this soup?" the chef just stares at you and says, "I just did." It's a "black box."
Explainable AI (XAI) is the field trying to get the chef to talk. Most current methods are like post-hoc detectives. They watch the chef cook, guess what ingredients were important, and then write a story explaining it. The problem? Sometimes the detective's story contradicts the chef's actual actions. The detective might say, "The chef added salt because the soup was bland," but the chef actually added salt because the pot was hot. The story sounds logical, but it's a lie about how the chef thinks.
This paper proposes a new way to build these detectives using a branch of math called Category Theory. Here is the simple breakdown:
1. The Problem: The "Translation" Glitch
The authors point out that AI models compute in fuzzy, continuous numbers (like 0.2, 0.8, or 0.99), while humans reason in crisp logic (Yes/No, True/False).
- The Analogy: Imagine trying to translate a poem written in a fluid, dream-like language (Fuzzy Logic) into a strict, rigid language like Morse code (Boolean Logic).
- The Mistake: If you just take a rough guess at the translation, you can end up with a sentence that sounds fine but says the wrong thing. For example, the explanation might say "If it's raining AND it's Tuesday, then I'm happy," when in reality the AI is happy if it's raining OR it's Tuesday. The "rough guess" explanation is inconsistent: it sounds like logic, but it breaks the rules of the original AI.
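The translation glitch fits in a few lines of code. The sketch below is purely illustrative (the product t-norm and the 0.5 threshold are my assumptions, not the paper's setup): naively thresholding a fuzzy AND produces a Boolean "explanation" that contradicts the model it is supposed to describe.

```python
# Illustrative sketch (not the paper's formalism): naively thresholding
# a fuzzy model can yield a Boolean explanation that breaks the model's
# own logical structure.

def fuzzy_and(a: float, b: float) -> float:
    """Product t-norm: one common choice of fuzzy AND."""
    return a * b

def to_bool(x: float) -> bool:
    """The 'rough guess' translation: threshold at 0.5."""
    return x >= 0.5

a, b = 0.7, 0.7
model_says = to_bool(fuzzy_and(a, b))          # 0.49 -> False
explanation_says = to_bool(a) and to_bool(b)   # True AND True -> True

# The detective's story contradicts the chef's actual action.
print(model_says, explanation_says)  # False True
```

Translating the parts ("a is True, b is True") and translating the whole ("the AND is False") disagree: that disagreement is exactly the inconsistency the authors are after.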
2. The Solution: The "Explaining Functor"
The authors introduce a concept called an Explaining Functor. In simple terms, a Functor is a structure-preserving translator: it maps every object and every step of one world into another world while keeping the way steps chain together intact. Think of it as a perfect translator, or a rigid mold.
- The Mold Analogy: Imagine you have a blob of clay (the AI's fuzzy reasoning). You want to press it into a cookie cutter (the human logic rule).
- Old methods: You squish the clay in with your hands. Sometimes the clay spills over, or the shape doesn't match the cutter. The resulting cookie looks like a star, but the clay inside is a mess.
- This paper's method: They design a special mold (the Functor) that guarantees that no matter how you press the clay, the shape that comes out perfectly matches the shape of the cutter.
- The Magic: This mold ensures that if you explain step-by-step (Layer 1, then Layer 2, then Layer 3), the final explanation is still true to the whole process. It prevents the "detective" from lying about the chef's process.
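As a toy contrast to the broken translation above (this is an illustration of the structure-preserving idea, not the paper's actual construction): with min/max as the fuzzy connectives, thresholding at 0.5 commutes with AND and OR, so translating the parts and translating the whole always agree.

```python
# Toy illustration of structure preservation (not the paper's Explaining
# Functor): with Goedel fuzzy connectives (min for AND, max for OR),
# thresholding at 0.5 commutes with composition -- translate-then-combine
# equals combine-then-translate, for every input.

def to_bool(x: float) -> bool:
    return x >= 0.5

vals = [i / 10 for i in range(11)]
for a in vals:
    for b in vals:
        # Fuzzy AND is min; the translation respects it.
        assert to_bool(min(a, b)) == (to_bool(a) and to_bool(b))
        # Fuzzy OR is max; same story.
        assert to_bool(max(a, b)) == (to_bool(a) or to_bool(b))

print("threshold commutes with min/max on the whole grid")
```

This is the "mold" property in miniature: explaining step-by-step and explaining the whole pipeline give the same answer, by construction rather than by luck.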
3. What is "δ-Coherence"?
The paper talks about a special class of AI functions called δ-coherent.
- The Analogy: Think of a traffic light.
- Coherent: If the light is Red, it means Stop. If it's Green, it means Go. The rule is consistent.
- Incoherent: Imagine a light that is Red 50% of the time and means "Go," and the other 50% means "Stop." This is confusing and dangerous.
- The authors prove that if an AI is "coherent" (like a good traffic light), we can build a perfect mold (Functor) to explain it. If the AI is "incoherent" (like the broken traffic light), the mold breaks.
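One plausible, simplified reading of the coherence condition (the paper's formal δ-coherence is categorical and may well differ from this): the function's output never lands within δ of the ambiguous threshold, so every "light" is clearly red or clearly green. A sketch of such a check:

```python
# Hypothetical sketch of the coherence idea -- the paper's formal
# definition of delta-coherence may differ. Here, a fuzzy function is
# "coherent" if its output always stays at least delta away from the
# ambiguous 0.5 threshold: a traffic light that is never half-red.

def is_delta_coherent(f, inputs, delta: float) -> bool:
    """True if f's output is always at least delta away from 0.5."""
    return all(abs(f(x) - 0.5) >= delta for x in inputs)

grid = [(a / 4, b / 4) for a in range(5) for b in range(5)]

sharp = lambda ab: 0.9 if ab[0] > 0.5 else 0.1   # a decisive traffic light
mushy = lambda ab: (ab[0] + ab[1]) / 2           # often hovers near 0.5

print(is_delta_coherent(sharp, grid, delta=0.2))  # True
print(is_delta_coherent(mushy, grid, delta=0.2))  # False
```

The decisive function can be translated cleanly; the mushy one spends time in the ambiguous zone, which is where the "mold" would break.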
4. Fixing the Broken Lights (The "Extension")
What if the AI isn't coherent? (Most real-world AIs aren't perfect).
The authors show how to fix the mold to handle broken lights.
- The Analogy: If the traffic light is broken, instead of guessing, we add a new sensor (an extra input feature) that tells us, "Hey, this light is acting weird right now."
- By adding this small "patch" or "extra feature," we can force the explanation to become consistent again. It's like putting a sticker on a broken machine that says, "When this sticker is on, the rule changes to X." This ensures the explanation remains honest, even if the machine is messy.
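A literal-minded sketch of the patch idea (the paper's extension is categorical; the "sticker" set below is my simplification of it): wherever the naive Boolean rule disagrees with the thresholded model, record those inputs as an extra feature and fold it into the rule, so the extended explanation agrees with the model everywhere.

```python
# Hedged sketch of the "sticker" patch, not the paper's construction:
# find the inputs where the naive Boolean explanation lies about the
# model, flag them with an extra feature, and flip the rule there.

def to_bool(x): return x >= 0.5

def fuzzy_or(a, b):
    """Probabilistic sum: a common fuzzy OR."""
    return a + b - a * b

grid = [(a / 10, b / 10) for a in range(11) for b in range(11)]

def naive_rule(a, b):
    """The unpatched Boolean explanation."""
    return to_bool(a) or to_bool(b)

# The sticker: inputs where the naive rule contradicts the model.
sticker = {(a, b) for (a, b) in grid
           if naive_rule(a, b) != to_bool(fuzzy_or(a, b))}

def patched_rule(a, b):
    """'When this sticker is on, the rule changes.'"""
    base = naive_rule(a, b)
    return (not base) if (a, b) in sticker else base

assert all(patched_rule(a, b) == to_bool(fuzzy_or(a, b))
           for (a, b) in grid)
print(f"{len(sticker)} stickers; patched rule matches the model everywhere")
```

The point of the analogy: the patch doesn't pretend the machine is clean. It names the messy spots explicitly, which is what keeps the explanation honest.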
5. The Experiment: The "XOR" Test
They tested this on two scenarios:
- The Easy Case (XOR): A logic puzzle where the answer is "True" if inputs are different. Their method worked perfectly, creating a 100% accurate explanation.
- The Hard Case (Fuzzy OR): A messy, fuzzy logic puzzle. The old methods gave explanations that were only 67% faithful (they lied about 1/3 of the time).
- The Result: When the authors applied their "patched" mold (the extended functor), faithfulness jumped to 83.8%. The patch didn't eliminate every lie, but it fixed most of them.
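The faithfulness metric itself is easy to sketch: the fraction of inputs on which the Boolean explanation agrees with the (thresholded) model. The fuzzy OR, grid, and resulting score below are from this toy setup only and do not reproduce the paper's 67% or 83.8% figures.

```python
# Toy faithfulness measurement (illustrative only; this grid and score
# are not the paper's experimental numbers).

def to_bool(x): return x >= 0.5

def fuzzy_or(a, b):
    """Probabilistic sum: a common fuzzy OR."""
    return a + b - a * b

grid = [(a / 10, b / 10) for a in range(11) for b in range(11)]

def faithfulness(rule):
    """Fraction of inputs where the explanation matches the model."""
    hits = sum(rule(a, b) == to_bool(fuzzy_or(a, b)) for (a, b) in grid)
    return hits / len(grid)

naive = lambda a, b: to_bool(a) or to_bool(b)
score = faithfulness(naive)
print(f"naive Boolean-OR explanation: {score:.1%} faithful on this grid")
```

Even on this tiny example the naive translation is less than 100% faithful: pairs like (0.3, 0.3) push the fuzzy OR above 0.5 while both inputs individually round down to False.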
The Big Takeaway
Current AI explainers are like bad translators who make up stories that sound good but are factually wrong about how the AI thinks.
This paper builds a mathematical guarantee (a rigid mold) that ensures the explanation is structurally identical to the AI's actual reasoning. It ensures that if you explain the parts, the whole makes sense, and if the AI is messy, we have a systematic way to "patch" the explanation so it doesn't lie to us.
In short: They moved XAI from "guessing what the AI did" to "mathematically proving what the AI did," ensuring the story we tell about the AI is actually true.