Anatomical Accuracy of Generative AI for Congenital Heart Disease Illustrations: Gemini NanoBanana Versus ChatGPT Models in a Blinded Comparative Study

In a blinded comparative study of congenital heart disease illustrations, human-modified images demonstrated superior anatomical accuracy and educational suitability compared to generative AI models, with Gemini NanoBanana outperforming ChatGPT systems yet still falling significantly short of expert-designed standards.

Alhuzaimi, A., Alkanhal, A., Alruwaili, A. R. S., Alharbi, N. S., Alfares, F., Aldekhyyel, R. N., Binkheder, S., Temsah, A., Aljamaan, F., Shahzad, M., Albriek, A. Z., Alanazi, F. I., Alhindi, D. A., Al-khatib, S. M., Darweesh, A. A., Altamimi, I., Jamal, A., Saad, K., Alhasan, K., Al-Eyadhy, A., Malki, K. H., Temsah, M.-H.

Published 2026-02-23

This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

Imagine you are trying to teach a class of students how to fix a very complicated, custom-made watch. You need a picture of the watch's gears to show them where the springs go.

In the past, you would hire a master watchmaker (an expert) to draw the picture perfectly. But now, you have a new, super-fast robot artist (Generative AI) that can draw a picture of a watch in seconds for free.

The big question: Can you trust the robot's drawing to teach your students, or will it look pretty but get the gears wrong?

This study is exactly that, but instead of watches, the "gears" are the human heart, specifically hearts born with defects (Congenital Heart Disease). The researchers asked: Can AI draw accurate medical pictures of these complex hearts?

Here is the breakdown of what they found, using simple analogies:

1. The Contestants

The researchers set up a "blind taste test" (like a mystery food challenge). They gathered 20 doctors (some heart experts, some general doctors) and showed them pictures of 20 different heart conditions. The doctors didn't know who drew which picture. The pictures came from three sources:

  • The Human Expert: An image reviewed and modified by a physician to ensure anatomical accuracy (the "Gold Standard").
  • Gemini NanoBanana: A Google AI model.
  • ChatGPT (version 5 and its image-generation model): OpenAI models.

2. The Results: The "Pretty but Wrong" Problem

The Human Expert (The Gold Standard):
Think of this as a master architect's blueprint. It was the most accurate. About 48% of the time, the doctors said, "Yes, this is exactly right." It was the only one they felt comfortable using in a classroom without changing anything.

Gemini NanoBanana (The "Good Student"):
This AI was the runner-up. It was better than the others but still made mistakes.

  • Accuracy: Only about 23% of its drawings were correct.
  • The Look: Interestingly, the doctors rated these pictures as the most beautiful and visually attractive. It's like a student who draws a stunning, colorful picture of a car, but the wheels are on the roof and the engine is in the trunk. It looks cool, but it doesn't work.
  • Verdict: You could use it if you fix the mistakes first.

ChatGPT (The "Confident Hallucinator"):
This was the biggest disappointment.

  • Accuracy: Only about 3% of the drawings were correct.
  • The Problem: In 86% of cases, the doctors rated the picture as "fabricated" (made up). The AI drew hearts with extra chambers, missing valves, or blood vessels running in the wrong direction.
  • The Danger: Because the pictures looked so realistic and confident, a student might believe them and learn the wrong anatomy. It's like a GPS that confidently tells you to drive into a lake because it "thinks" that's the fastest route.

3. The "Label" Trouble

The researchers also checked the text labels inside the pictures (e.g., "Aorta," "Left Ventricle").

  • Human: Labels were correct.
  • Gemini: Labels were okay, but sometimes mixed up.
  • ChatGPT: The labels were mostly nonsense. The AI would point to a valve and give it a completely unrelated label, or put the label for the "Aorta" on the "Pulmonary Artery." It's like a museum guide who points to a painting and says, "This is a famous sculpture."

4. The Expert vs. The Generalist

The study found something interesting about who was judging the pictures:

  • Heart Specialists: They were the strictest critics. They spotted the tiny errors immediately.
  • General Doctors: They were a bit more lenient. Because the AI pictures looked "pretty" and "professional," the general doctors were more likely to think, "Oh, that looks good enough," even if the anatomy was wrong.
  • The Lesson: If you aren't an expert, you might be fooled by a pretty picture that is actually wrong.

5. The Bottom Line (The Takeaway)

The study concludes that AI is a great "draftsman," but a terrible "final artist" for medical education.

  • Don't use AI images directly in class. If you show a student a ChatGPT heart, you are teaching them lies.
  • Use AI as a starting point. You can ask AI to "draw a heart," get a rough sketch, and then have a real doctor fix the errors.
  • Gemini is better than ChatGPT for this specific job, but neither is ready to replace a human medical illustrator yet.

In short: AI is like a very fast, very confident intern who loves to draw. They will hand you a picture in seconds that looks amazing. But if you don't have a senior doctor check their work, they will accidentally teach your students that the heart has a third ear or that blood flows backward. Always have a human expert review the AI's work before showing it to anyone.
