Application of deep learning and explainable AI-supported medical decision-making for facial phenotyping in genetic syndromes

This study found that both AI predictions and explainable AI (XAI) saliency maps improved diagnostic accuracy when the AI was correct. However, medical geneticists relied far more on the raw prediction probabilities than on the XAI explanations, which they rated less favorably and which did not significantly improve how they incorporated the AI's output into their decisions.

Sumer, O., Huber, T., Cheng, J., Duong, D., Ledgister Hanchard, S. E., Conati, C., Andre, E., Solomon, B. D., Waikel, R. L.

Published 2026-03-12

This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

Imagine you are a master detective trying to solve a very tricky case: identifying a rare genetic condition just by looking at a person's face. These conditions are like rare fingerprints; they have specific patterns (like the shape of the nose, eyes, or mouth) that only a trained expert usually notices.

Now, imagine you have a super-smart robot assistant (Artificial Intelligence) that can also look at these faces and guess the diagnosis. But here's the catch: sometimes the robot is right, and sometimes it's confidently wrong.

This paper is about a big experiment to see if giving the human detectives two different types of help from the robot makes them better at solving the case.

The Two Types of Help

The researchers tested two groups of medical experts (geneticists):

  1. The "Scoreboard" Group (AI-Only): These detectives were shown the face and the robot's guess, along with a confidence score (e.g., "I am 90% sure this is Syndrome A"). It's like the robot whispering, "I think it's this, and I'm pretty sure."
  2. The "Highlighter" Group (XAI-Supported): These detectives got the same score, plus a visual "highlighter" (called a Saliency Map). This map glows on the parts of the face the robot thinks are important. It's like the robot pointing a laser pointer at the nose and saying, "Look here! The nose shape is what made me think it's Syndrome A." They also got a simple chart summarizing which features mattered most.

The Experiment: What Happened?

The researchers showed 18 different faces to 44 experts. Half the time, the robot was right. Half the time, the robot was wrong.

1. When the Robot was Right:
Both groups got a little boost. Seeing the robot's guess made the experts more confident and slightly more likely to agree with the correct answer. It was like having a co-pilot confirm your navigation; you feel safer and stick to the right path.

2. When the Robot was Wrong:
This is where things got interesting.

  • The Scoreboard Group: When the robot said, "I'm 90% sure it's Syndrome A" (but it was actually Syndrome B), the experts often got confused and changed their correct answer to the wrong one. They trusted the robot's confidence too much.
  • The Highlighter Group: The "Highlighter" group was also misled by the wrong robot, and the glowing map didn't help them catch the mistake. If anything, the map sometimes made them more skeptical of the robot, but since they couldn't examine the patient or gather more information during the test, they had nothing else to go on.

The Big Surprise: The "Highlighter" Didn't Work

The researchers expected that seeing where the robot was looking (the highlighter) would help the experts understand the robot better and make smarter decisions. They thought it would be like a teacher showing you the steps to solve a math problem.

But it didn't work that way.

  • The experts generally found the "highlighter" maps confusing or unhelpful.
  • They didn't trust the glowing spots. Sometimes the robot highlighted the wrong part of the face, or the experts didn't know what the highlighted spot meant.
  • The experts relied much more on the confidence score (the "Scoreboard") than on the visual explanation. If the robot said "90% sure," they listened. If the robot showed a map, they mostly ignored it or found it distracting.

The Takeaway: Why This Matters

Think of it like a GPS in your car.

  • AI-Only is the GPS saying, "Turn left in 500 feet."
  • XAI (The Highlighter) is the GPS trying to explain why it wants you to turn left by showing a complex map of traffic patterns and road construction.

The study found that when the GPS is right, the simple instruction is enough. But when the GPS is wrong, showing you the complex map doesn't help you realize it's wrong; in fact, it might just make you doubt yourself or get frustrated.

The Main Lesson:
Giving doctors "explanations" (like heat maps) isn't automatically helpful. In fact, if the explanation is hard to understand or doesn't match what the doctor expects, it can actually get in the way. The doctors trusted the robot's "gut feeling" (the probability score) more than its "reasoning" (the visual map).

What's Next?
The researchers conclude that we need to build better ways to explain AI. Instead of just showing a glowing map, we might need to explain it in plain language (e.g., "The robot thinks this is Syndrome A because the eyes are wide-set, which is a key feature"). Until then, doctors should be careful not to blindly trust the robot, even when it gives a confidence score.
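As a rough illustration of the "plain language" idea, the sketch below turns a short list of feature importances into a one-sentence explanation instead of a heat map. The feature names, scores, and wording are invented for illustration; the preprint does not specify how such text explanations would be generated.

```python
# Hypothetical sketch: convert the model's most important facial features
# into a plain-language sentence. The feature names and importance scores
# below are made up for illustration, not taken from the study.
def explain_in_words(prediction: str, importances: dict[str, float], top_k: int = 2) -> str:
    # Pick the features the model weighted most heavily.
    top = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    reasons = " and ".join(name for name, _ in top)
    return f"The model suggests {prediction} mainly because of the {reasons}."

print(explain_in_words(
    "Syndrome A",
    {"wide-set eyes": 0.46, "nose shape": 0.31, "mouth width": 0.12},
))
# -> The model suggests Syndrome A mainly because of the wide-set eyes and nose shape.
```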
