Here is an explanation of the paper, translated from academic jargon into a story about detectives, maps, and a very tricky game of "Guess the Rule."
The Big Problem: The "Black Box" Detective
Imagine you hire a super-smart detective (a Machine Learning Model) to solve a mystery. Your detective is amazing at predicting the outcome: "Will this customer buy the product?" or "Will this loan default?" They get it right 95% of the time.
But there's a catch: The detective is a Black Box. They give you the answer, but they won't tell you why. They just say, "I know it's true because I know it."
Business researchers are desperate to know why. They want to know: "Does age matter? Does income matter? Is it the weather?"
So, they hire a second detective, called a Post-Hoc Explainer (like SHAP or LIME). This second detective's job is to look at the first detective's work and say, "Ah, I think the first detective was looking at Income to make that decision."
The Mistake:
The paper argues that researchers are making a huge error. They are taking the second detective's guess and treating it as absolute truth about how the world works. They say, "The second detective says Income matters, therefore Income definitely causes people to buy things."
The authors say: Stop! The second detective is just guessing what the first detective was thinking. They aren't necessarily telling you the truth about the real world.
The Analogy: The "Many Paths" Mountain
To understand why this is dangerous, imagine a mountain with a peak (the correct answer).
The Rashomon Effect (The Many Paths):
Imagine there are 100 different paths up the mountain. All 100 paths lead to the exact same peak. You can't tell which path is the "real" one just by looking at the top; they all look perfect.
- Path A goes through the forest.
- Path B goes along the river.
- Path C goes over the rocks.
- The Result: All paths get you to the top (high accuracy). But the scenery (the features used) is totally different.
The Explainer's Job:
You ask a guide (the Explainer) to describe the path.
- If you ask about Path A, the guide says, "We walked through the forest!"
- If you ask about Path B, the guide says, "We walked by the river!"
- Both guides are telling the truth about the path they were on. But neither guide is telling you the only way to the top.
The Danger:
Researchers pick Path A, ask the guide, and then conclude: "The only way to the top is through the forest!" They ignore the fact that Path B and Path C also work. They mistake the path taken by one specific model for the laws of nature.
What the Authors Did (The Experiment)
The authors (Tong Wang, Ronilo Ragodos, Lu Feng, and Yu Jeffrey Hu) decided to test this. They created a "Ground Truth" simulation.
- The Setup: They built a fake world where they knew the exact rules (e.g., "Income always increases the chance of buying, but only if Age is over 30").
- The Test: They let AI models learn these rules. Then, they asked SHAP and LIME to explain what the models learned.
- The Result:
- On Average: The explainers were okay. They usually got the general idea right.
- In Reality (The Long Tail): When they looked closely, the explainers were often wrong. Sometimes they said "Age" was the most important factor when it wasn't. Sometimes they said "Income" had a positive effect when it actually had a negative one.
- The Shock: Even when the AI model was 99% accurate at predicting the outcome, the explanation extracted from it was sometimes completely flipped or wrong. High accuracy does not guarantee a truthful explanation.
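The core trick of the experiment is that the authors wrote the rules of the world themselves, so any explanation can be graded against a known answer. Here is a minimal sketch of that idea (not the authors' actual simulation; the rule, numbers, and variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# A fake world with a known rule: income raises the chance of buying,
# but ONLY when age is over 30 (an interaction, as in the example rule).
age = rng.uniform(18, 60, n)
income = rng.uniform(0, 1, n)
p_buy = np.where(age > 30, 0.2 + 0.6 * income, 0.2)
buy = (rng.random(n) < p_buy).astype(float)

# Because we wrote the rule ourselves, the "true" effect is checkable:
# income should correlate with buying for the over-30s, and not at all
# for the under-30s. Any explainer's output can be scored against this.
old, young = age > 30, age <= 30
corr_old = np.corrcoef(income[old], buy[old])[0, 1]
corr_young = np.corrcoef(income[young], buy[young])[0, 1]
print(f"income effect, age>30: {corr_old:.2f}; age<=30: {corr_young:.2f}")
```

In the real world you never get this answer key, which is exactly why the authors had to build a fake one: it is the only setting where "the explainer is wrong" is a provable statement rather than an opinion.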
Why Does This Happen?
The authors found three main reasons why the "guide" gets confused:
- The Rashomon Effect (Too many paths): As mentioned, if many different models can predict the outcome equally well, they might all use different "clues." One model might rely on "Income," another on "Education." If you pick one model at random, its explanation is just one random guess among many.
- Correlated Features (The Twin Problem): Imagine "Income" and "Education" are twins; they always go up and down together. The AI can't tell which one is actually doing the work. It might pick "Income" for one model and "Education" for another. The explainer will point to one, but it's just a coin flip.
- Complexity: If the real world is messy (non-linear, with lots of interactions), it's harder for the AI to find the one true rule, so it finds a "good enough" rule that looks right but isn't the real story.
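The "twin problem" can be seen in a few lines of toy code. Below is an illustrative sketch (plain NumPy; permutation importance stands in for a SHAP/LIME-style score, and the two hand-built models stand in for equally accurate black boxes from the Rashomon set):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# "Twin" features: education tracks income almost perfectly.
income = rng.normal(0, 1, n)
education = income + rng.normal(0, 0.05, n)
buy = (income > 0).astype(float)  # the TRUE rule uses income only

X = np.column_stack([income, education])
names = ["income", "education"]

# Two near-equally accurate models: each one leans on a different twin.
def model_a(X):
    return (X[:, 0] > 0).astype(float)  # relies on income

def model_b(X):
    return (X[:, 1] > 0).astype(float)  # relies on education

def accuracy(model, X, y):
    return float(np.mean(model(X) == y))

def permutation_importance(model, X, y):
    """Drop in accuracy when a feature is shuffled -- a simple
    stand-in for a SHAP/LIME-style importance score."""
    base = accuracy(model, X, y)
    drops = []
    for j in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, j] = rng.permutation(Xp[:, j])
        drops.append(base - accuracy(model, Xp, y))
    return np.array(drops)

acc_a, acc_b = accuracy(model_a, X, buy), accuracy(model_b, X, buy)
top_a = names[permutation_importance(model_a, X, buy).argmax()]
top_b = names[permutation_importance(model_b, X, buy).argmax()]
print(f"accuracy A={acc_a:.3f}, B={acc_b:.3f}")
print(f"A's top feature: {top_a}; B's top feature: {top_b}")
```

Both models score within a fraction of a percent of each other, yet one explanation says "income" and the other says "education." Whichever model you happened to train is the coin flip the paper warns about.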
The Solution: How to Check if You're Being Lied To
The authors don't say "Stop using AI." They say, "Stop trusting the AI's explanation as the final truth."
Instead, they propose a Reliability Check:
- The "Group Consensus" Test:
Don't just ask one detective. Ask 10 different detectives who all got the same score on the test.
- Scenario A: All 10 detectives say, "We all looked at Income." -> Good! You can trust this. The "Rashomon Agreement" is high.
- Scenario B: 5 detectives say "Income," 3 say "Education," and 2 say "Age." -> Bad! The "Rashomon Agreement" is low. This means the data is ambiguous. You cannot trust any single explanation.
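The consensus test boils down to a vote count. Here is a simplified reading of that reliability check (the function name, the vote-share measure, and any cutoff you'd apply are illustrative, not the paper's exact formulation):

```python
from collections import Counter

def rashomon_agreement(top_features):
    """Given the top-ranked feature from each of several equally
    accurate models, return the winning feature and the fraction of
    models that voted for it (the 'agreement')."""
    counts = Counter(top_features)
    feature, votes = counts.most_common(1)[0]
    return feature, votes / len(top_features)

# Scenario A: all 10 detectives point at the same clue.
feat, agree = rashomon_agreement(["income"] * 10)
print(feat, agree)   # income 1.0 -> high agreement, explanation is trustworthy

# Scenario B: the vote splits 5/3/2 -> the data is ambiguous.
feat, agree = rashomon_agreement(
    ["income"] * 5 + ["education"] * 3 + ["age"] * 2
)
print(feat, agree)   # income 0.5 -> low agreement, trust no single story
```

The point is that the agreement score is a property of the whole Rashomon set, not of any one model: a low score tells you the data itself cannot distinguish between the competing stories.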
The Bottom Line: What Should Business Researchers Do?
The paper gives a simple rule of thumb:
- Don't use Explainers for Validation: Do not use SHAP or LIME to prove a hypothesis (e.g., "We proved that X causes Y"). They are not rigorous enough for that.
- Do use Explainers for Exploration: Use them to generate ideas. "Hey, the AI thinks Income is important. That's an interesting idea! Let's go run a proper scientific experiment to see if that's actually true."
In short: Treat Post-Hoc explainers like a weather forecast, not a law of physics. A forecast might be right 90% of the time, but if you bet your life on it, you might get wet. Use them to get hints, but always double-check with real-world evidence.