Enhancing SHAP Explainability for Diagnostic and Prognostic ML Models in Alzheimer Disease

This paper proposes and validates a multi-level explainability framework demonstrating that SHAP explanations for Alzheimer's disease diagnostic and prognostic models are robust, stable, and consistent across different disease stages and prediction tasks, thereby enhancing their reliability for clinical adoption.

Pablo Guillén, Enrique Frias-Martinez

Published 2026-03-10
📖 5 min read · 🧠 Deep dive

Imagine you have a very smart, super-fast robot doctor that can look at a patient's medical records and tell you two things:

  1. Diagnosis: "Does this person have Alzheimer's right now?"
  2. Prognosis: "Will this person get worse in the next four years?"

This robot is incredibly accurate, but it has a problem: it's a "black box." It gives you the answer, but it won't tell you why it thinks that. It's like a friend who says, "I know you're going to win the lottery," but refuses to explain how they know. Doctors can't trust a tool they don't understand, especially when it comes to life-and-death decisions.

This paper is about teaching that robot doctor to speak human and proving that its reasoning is reliable, not just lucky.

Here is the breakdown of their work using simple analogies:

1. The Problem: The "Magic 8-Ball" Effect

The researchers used a tool called SHAP (short for SHapley Additive exPlanations — think of it as a magnifying glass that shows how much each clue contributed to a single decision). Usually, the robot points to things like "Memory Test Scores" or "Ability to Pay Bills" as the main reasons for its decision.

But there was a catch:

  • The "One-Off" Problem: If you asked the robot to diagnose a patient, it might say, "It's because of Memory." But if you asked it to predict the future, would it still say, "It's because of Memory"? Or would it suddenly switch to "It's because of Genetics"?
  • The "Fickle Friend" Problem: If the robot changes its mind about why it's making a prediction every time you tweak the data slightly, doctors can't trust it. They need to know the robot is consistent.
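
Under the hood, SHAP's "magnifying glass" assigns each feature a fair share of the prediction. The toy sketch below computes exact Shapley values by brute force for a made-up three-feature "risk score" — the feature names, weights, and patient values are all invented for illustration, and real pipelines (including the shap library) use far more efficient approximations:

```python
from itertools import combinations
from math import factorial

def model(x):
    # Toy linear "risk score": memory matters most, genetics least.
    memory, bills, gene = x
    return 0.6 * memory + 0.3 * bills + 0.1 * gene

def shapley_values(f, x, baseline):
    """Exact Shapley attribution: average a feature's marginal
    contribution over every possible coalition of the other features."""
    n = len(x)

    def v(coalition):
        # Features in the coalition keep the patient's values;
        # the rest fall back to the cohort baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for k in range(len(others) + 1):
            for subset in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi += weight * (v(set(subset) | {i}) - v(subset))
        phis.append(phi)
    return phis

patient = [1.0, 1.0, 1.0]   # abnormal on all three features
baseline = [0.0, 0.0, 0.0]  # cohort average
attributions = shapley_values(model, patient, baseline)
print(attributions)  # largest share goes to the memory feature
```

A useful sanity check: the attributions always sum to the gap between the patient's prediction and the baseline prediction, which is exactly why they read as "this clue contributed this much."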

2. The Solution: The "Three-Point Stability Test"

The authors created a new framework to test if the robot's explanations are sturdy. They used three creative tests:

  • Test A: The "Internal Logic" Check (Coherence)

    • Analogy: Imagine a detective who solves a crime. The detective's notebook (Feature Importance) says "The Butler did it." But when the detective explains it to the jury (SHAP), they say, "It was the Maid."
    • The Test: The researchers checked if the robot's internal "notebook" matched its "explanation." They found that for the most part, the robot was honest: what it used to learn was the same thing it used to explain.
  • Test B: The "Same Story, Different Chapter" Check (Stability)

    • Analogy: Imagine reading a mystery novel. In Chapter 1 (Early Stage), the clues point to the Butler. In Chapter 10 (Late Stage), do the clues still point to the Butler, or does the story suddenly change to a completely different suspect?
    • The Test: They checked if the robot used the same "clues" (like memory or attention) whether the patient was in the early stages of confusion or the late stages of dementia.
    • Result: The robot was very consistent! It kept pointing to the same cognitive clues (Memory, Judgment, Attention) regardless of how far the disease had progressed.
  • Test C: The "Past vs. Future" Check (Transferability)

    • Analogy: A weather forecaster who says, "It's raining now because of dark clouds." If you ask, "Will it rain tomorrow?" a good forecaster should still say, "Yes, because of those same dark clouds," not suddenly say, "No, because of the wind."
    • The Test: They compared the robot's reasons for diagnosing the disease today versus predicting the disease four years from now.
    • Result: The reasons were almost identical. The robot didn't suddenly start relying on weird, random factors when looking into the future. It stuck to the core symptoms.
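
All three checks boil down to the same question: do two importance rankings agree? One simple way to quantify that — a sketch of the idea, not necessarily the paper's exact metric — is Spearman rank correlation between, say, the diagnostic model's mean |SHAP| values and the prognostic model's, over the same features. The feature importances below are invented for illustration:

```python
def ranks(values):
    # Rank 1 = most important feature; ties are ignored (fine for a sketch).
    order = sorted(range(len(values)), key=lambda i: -values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(a, b):
    # Spearman rank correlation: 1.0 = identical ordering, -1.0 = reversed.
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ranks(a), ranks(b)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical mean |SHAP| importances for the same five features from a
# diagnostic model ("today") and a 4-year prognostic model ("the future").
diagnosis = [0.42, 0.31, 0.15, 0.08, 0.04]
prognosis = [0.39, 0.28, 0.18, 0.10, 0.05]
print(spearman(diagnosis, prognosis))  # -> 1.0 (identical ordering)
```

The same comparison works for Test A (model feature importances vs. SHAP values) and Test B (early-stage patients vs. late-stage patients): a correlation near 1.0 means the "clues" keep the same pecking order.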

3. The Big Discovery: What Actually Matters?

Through this testing, the researchers confirmed what doctors have suspected for a long time, but now with quantitative evidence:

  • The Real Heroes: The most important clues are Cognitive and Functional skills. Can the patient remember things? Can they pay their bills? Can they navigate a room?
  • The Sidekicks: Genetic markers (like DNA) and administrative details (like which language the test was taken in) played a much smaller role. They were there, but they weren't the main characters driving the decision.
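
The "heroes vs. sidekicks" split can be reproduced by grouping per-feature importances into clinical categories and summing within each group. Everything below — feature names, SHAP magnitudes, category labels — is hypothetical, purely to show the bookkeeping:

```python
from collections import defaultdict

# Hypothetical per-feature mean |SHAP| values from an Alzheimer's model.
importances = {
    "memory_score": 0.42, "pays_bills": 0.31, "judgment": 0.15,
    "apoe4_gene": 0.08, "test_language": 0.04,
}
category = {
    "memory_score": "cognitive", "judgment": "cognitive",
    "pays_bills": "functional",
    "apoe4_gene": "genetic", "test_language": "administrative",
}

totals = defaultdict(float)
for feat, imp in importances.items():
    totals[category[feat]] += imp

# Print categories from most to least important.
for cat, total in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(cat, round(total, 2))
```

With these made-up numbers, cognitive and functional categories dominate while genetic and administrative features trail far behind — the same "main characters vs. supporting cast" pattern the paper reports.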

4. Why This Matters for You

Think of this framework as a "Trust Seal" for AI in medicine.

Before this paper, a doctor might look at an AI result and think, "It says my patient has Alzheimer's, but I don't know if the AI is just guessing or if it's actually looking at the right symptoms."

Now, because the researchers proved that the AI's reasoning is stable (it doesn't flip-flop), coherent (its explanation matches its logic), and transferable (it works for both today and the future), doctors can finally say:

"I trust this AI. It's looking at the same real-world symptoms I am, and it's consistent. I can use this tool to help my patients."

In a Nutshell

The paper didn't just build a better robot; it built a better translator for the robot. It proved that the robot's "reasoning" isn't magic or luck—it's based on solid, consistent medical facts that doctors can understand and trust. This is a huge step toward getting AI into real hospitals to help fight Alzheimer's.