Towards Attributions of Input Variables in a Coalition

This paper addresses the challenge of partitioning input variables in Shapley value-based Explainable AI. It analyzes attribution conflicts caused by AND-OR interactions, proposes a new attribution metric for variable coalitions, and introduces three faithfulness evaluation metrics validated across diverse domains.

Xinhao Zheng, Huiqi Deng, Quanshi Zhang

Published 2026-02-25

Imagine you are trying to figure out why a specific team won a soccer match.

You have a list of players: Striker, Midfielder, Defender, and Goalkeeper.

  • Old Method: You ask, "How much did the Striker contribute?" Then you ask, "How much did the Midfielder contribute?" You add them up, and you get a total score.
  • The Problem: What if the Striker and Midfielder are best friends who always pass the ball to each other? If you look at them separately, you might miss the magic that happens when they work together. Or worse, if you group them as a "Forward Duo" and ask, "How much did the Duo contribute?", the math might get weird. The "Duo's" score might not equal the sum of the two individuals' scores.
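The "old method" here is the Shapley value: average each player's marginal contribution over every possible lineup. A minimal sketch, using a made-up match-value function of our own (the +2 "duo synergy" bonus is purely illustrative, not from the paper):

```python
from itertools import combinations
from math import factorial

players = ["striker", "midfielder", "defender", "goalkeeper"]

# Toy match-value function: each player adds 1 on their own, and the
# striker-midfielder duo adds a +2 synergy bonus when both play.
def v(coalition):
    score = len(coalition)
    if {"striker", "midfielder"} <= set(coalition):
        score += 2
    return score

def shapley(player):
    """Average the player's marginal contribution over all coalitions."""
    others = [p for p in players if p != player]
    n = len(players)
    total = 0.0
    for k in range(len(others) + 1):
        for sub in combinations(others, k):
            w = factorial(k) * factorial(n - k - 1) / factorial(n)
            total += w * (v(set(sub) | {player}) - v(set(sub)))
    return total

phi = {p: shapley(p) for p in players}
print(phi)  # the duo members split their +2 synergy evenly

# Efficiency: the individual scores add up to the full team's value.
assert abs(sum(phi.values()) - v(players)) < 1e-9
```

Note how the synergy is silently split 50/50 between the striker and the midfielder: the per-player scores sum correctly, but nothing in them tells you the bonus came from the duo acting as a unit.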

This paper is about fixing that math so we can understand AI models (like the ones that recognize cats in photos or write essays) without getting confused about how to group their "ingredients."

Here is the breakdown in simple terms:

1. The Core Problem: The "Grouping" Confusion

In AI, we want to know which parts of the input (like words in a sentence or pixels in an image) are responsible for the decision.

  • The Conflict: If you look at the word "raining" alone, it might seem important. If you look at "cats" alone, it might seem important. But the phrase "raining cats and dogs" is a specific idiom meaning "heavy rain."
  • The Issue: If you treat "raining," "cats," and "dogs" as separate players, you might miss the joke. But if you group them into a "Coalition" (a team), the math used to calculate their importance often breaks. The score for the group doesn't match the sum of the individuals. It's like saying the whole pie is worth $10, but the slices add up to $15. That doesn't make sense!

2. The Solution: The "Recipe" Analogy

The authors realized that AI models work like a complex recipe. They have two types of interactions:

  • The "AND" Interaction (The Secret Sauce): This happens when everything in a group must be present for a specific effect.
    • Example: To make a "Heavy Rain" prediction, the AI needs "raining" AND "cats" AND "dogs" all at once. If you remove one, the "Heavy Rain" effect disappears.
  • The "OR" Interaction (The Backup Plan): This happens if any one of a group is present.
    • Example: To predict "Negative Mood," the AI might react if it sees "boring" OR "sad" OR "terrible." You only need one of them to trigger the effect.
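The "AND" effect can be measured with the Harsanyi dividend: an alternating sum over subsets that isolates the part of the output triggered only when every member of a group is present. A small sketch, with a toy model output of our own (a +1 "heavy rain" score that fires only on the full idiom):

```python
from itertools import combinations

words = ["raining", "cats", "dogs"]

# Hypothetical model output: +1 "heavy rain" only when the full idiom appears.
def v(S):
    return 1.0 if set(words) <= set(S) else 0.0

def and_interaction(T):
    """Harsanyi dividend of T: the effect that needs ALL of T at once."""
    total = 0.0
    for k in range(len(T) + 1):
        for S in combinations(T, k):
            total += (-1) ** (len(T) - k) * v(S)
    return total

# (An OR interaction is the mirror image: it can be computed as an
# AND interaction on the variables' absences instead of their presences.)
print(and_interaction(("raining", "cats", "dogs")))  # 1.0: one AND unit
print(and_interaction(("raining", "cats")))          # 0.0: no partial effect
```

The full triple carries the entire effect, and every partial subset gets exactly zero, which is what "everything must be present" means in this algebra.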

The Big Discovery:
The authors proved that the confusion (the math conflict) happens because of the "Partial Groups."

  • Imagine a group of friends: Alice, Bob, and Charlie.
  • The AI has a rule: "Alice + Bob + Charlie = Great Party."
  • But it also has a rule: "Alice + Bob = Good Party."
  • If you try to calculate the value of the "Alice + Bob" team, the math gets messy because the AI is counting them in two different ways (once as a pair, once as part of a trio).
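You can watch the mess happen with a tiny numerical check. Below is a toy game of our own design (the +2 and +5 party bonuses are illustrative): compare the sum of Alice's and Bob's individual Shapley scores against the score of an "Alice+Bob" super-player in the regrouped game.

```python
from itertools import combinations
from math import factorial

def shapley(players, v, target):
    """Exact Shapley value of `target` in the game `v` over `players`."""
    others = [p for p in players if p != target]
    n = len(players)
    total = 0.0
    for k in range(len(others) + 1):
        for sub in combinations(others, k):
            w = factorial(k) * factorial(n - k - 1) / factorial(n)
            total += w * (v(set(sub) | {target}) - v(set(sub)))
    return total

# "Alice + Bob = Good Party" (+2) and "Alice + Bob + Charlie = Great Party" (+5).
def v(S):
    return 2 * ({"Alice", "Bob"} <= S) + 5 * ({"Alice", "Bob", "Charlie"} <= S)

# Sum of the two individual scores in the 3-player game:
indiv = sum(shapley(["Alice", "Bob", "Charlie"], v, p) for p in ["Alice", "Bob"])

# Score of an "Alice+Bob" super-player in the coarsened 2-player game:
def v_grouped(S):
    expanded = set()
    for p in S:
        expanded |= {"Alice", "Bob"} if p == "AB" else {p}
    return v(expanded)

team = shapley(["AB", "Charlie"], v_grouped, "AB")

print(indiv, team)  # the pair's score depends on how you group the players
```

The individual route gives the pair 16/3 ≈ 5.33, while the super-player route gives 4.5: the three-way "Great Party" effect is shared 2/3-to-1/3 in one game but 1/2-to-1/2 in the other. That partial overlap is exactly the source of the conflict the paper identifies.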

3. The New Tool: "Coalition Attribution"

The paper proposes a new way to calculate importance that respects these groups.

  • Old Way: Just add up individual scores. (Result: Confusion).
  • New Way: Look at the "Recipe." If a group of variables (like "raining cats and dogs") acts as a single unit in the AI's brain, the new math gives them a single, fair score. It acknowledges that sometimes, the whole is indeed different from the sum of its parts.

They created three "Trust Scores" to check if a group is a real team or just a random bunch of people standing together:

  1. Is this group a "True Team"? (Do they always work together in the AI's logic?)
  2. Is this group "Fake"? (Did we just randomly grab these words together, even though the AI doesn't see them as a unit?)
  3. Is this group "Mixed"? (Are they sometimes a team, but sometimes just individuals?)

4. Real-World Tests

The authors tested this on:

  • Language: They checked if the AI understood phrases like "raining cats and dogs" as a single unit. It did!
  • Images: They checked if the AI saw a "horse's head" as a single object made of pixels, rather than just random pixels. It did!
  • Go (The Board Game): This was the coolest part. They used their method to explain why a professional Go AI (KataGo) made a move.
    • Human players memorize "shapes" (patterns of stones).
    • The AI found patterns that humans didn't know existed! The new math helped humans understand why the AI liked a specific shape, revealing new strategies that even experts hadn't seen before.

The Takeaway

This paper is like a translator for AI.
Previously, if you asked an AI, "Why did you do that?" and tried to group the reasons, the answer was often mathematically broken.
Now, the authors have given us a new set of glasses. These glasses allow us to see groups of inputs (like phrases, image patches, or game patterns) as legitimate "teams" with their own value, without breaking the math. This helps us trust AI more and even learn new things from it (like new Go strategies).
