A Blinded Comparative Evaluation of Clinical and AI-Generated Responses to Otologic Patient Queries

This comparative study found that large language models generated responses to otologic patient queries that were longer, more empathetic, and more readable than those from verified physicians, suggesting their potential to enhance patient-centered communication when appropriately implemented.

Original authors: Akinniyi, S., Jain-Poster, K., Evangelista, E., Yoshikawa, N., Rivero, A.

Published 2026-04-15

Original paper licensed under CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/). ⚕️ This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

Imagine you have a question about your ear—maybe it's ringing, hurting, or you're feeling dizzy. You turn to the internet for answers. In the past, you might have asked a doctor on a public forum like Reddit. Today, you might ask a super-smart computer program (an AI) like ChatGPT.

This study is like a blind taste test to see who gives better answers: the real doctors or the AI robots.

Here is the story of what they found, broken down simply:

The Setup: The "Blind Taste Test"

The researchers took 49 real questions people asked about their ears (like "Why does my ear hurt?" or "I can't hear well").

  • Team Doctor: They looked at the answers real, verified doctors gave on Reddit.
  • Team AI: They asked three different AI chatbots (ChatGPT, Claude, and Google Gemini) to answer the exact same questions.
  • The Judges: Five experts read all the answers without knowing who wrote them. They scored them on three things: Quality (is it right?), Empathy (does it sound kind?), and Readability (is it easy to understand?).

The Results: The Robot Wins the Popularity Contest

Surprisingly, the AI robots scored higher than the real doctors in almost every category!

  1. The "Kindness" Factor (Empathy):

    • The Doctor: Imagine a busy doctor who has seen 20 patients today. They give you a quick, practical answer: "It sounds like an infection. Go see a specialist." It's accurate, but a bit short and to the point.
    • The AI: Imagine a patient advocate who has all day to talk. The AI said, "I'm so sorry your ear is hurting. That sounds really frustrating. Here is what might be happening, and here is exactly what you should do next."
    • The Verdict: The judges felt the AI was much warmer, more caring, and sounded more like a friend who really listened.
  2. The "Clarity" Factor (Readability):

    • The Doctor: Doctors often use medical jargon or write in a way that assumes you know a little bit about medicine. It's like reading a textbook.
    • The AI: The AI was told to write at a "6th-grade reading level." It explained things simply, like a teacher explaining a concept to a middle schooler.
    • The Verdict: The AI's answers were much easier for the average person to understand.
  3. The "Length" Factor:

    • The doctors were concise (short). The AI was chatty (long).
    • Analogy: The doctor gave you a map. The AI gave you a guided tour with the map, the history of the place, and tips on where to eat. Even though the AI talked more, people actually liked the extra detail.
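
For the curious, here is one way a "6th-grade reading level" can be estimated. The summary does not say which readability measure the study used, so this is only a sketch under assumptions: it uses the well-known Flesch-Kincaid grade-level formula, and it counts syllables with a rough vowel-group shortcut. The function names (`count_syllables`, `flesch_kincaid_grade`) and the sample sentences are made up for illustration.

```python
import re


def count_syllables(word: str) -> int:
    """Rough guess (assumption): one syllable per run of vowels, at least one."""
    vowel_groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(vowel_groups))


def flesch_kincaid_grade(text: str) -> float:
    """Estimate the US school grade needed to read `text`.

    Formula: 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


# Hypothetical examples, not taken from the study.
plain = "Your ear hurts. It may be an infection. See a doctor soon."
jargon = ("Otalgia with concomitant vestibular dysfunction necessitates "
          "comprehensive otolaryngologic evaluation.")

print(f"plain:  grade {flesch_kincaid_grade(plain):.1f}")
print(f"jargon: grade {flesch_kincaid_grade(jargon):.1f}")
```

Short sentences and short words pull the grade down, which is why the plain version scores far lower than the jargon-heavy one. Real readability tools count syllables more carefully, so treat these numbers as ballpark figures.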

The Catch: The "Uncanny Valley"

Even though the AI won the scores, there was a twist.

  • Can you tell who is who? The judges correctly guessed whether an answer came from a human or a robot 89% of the time.
  • Why? The AI sometimes sounded too perfect or used a specific "robotic" style of empathy that felt a little fake. It was like an actor playing a doctor; they did a great job, but you could still tell they were acting.
  • The Safety Issue: The AI sometimes got a little too scared. If you mentioned a mild earache, the AI might say, "Go to the ER immediately!" This is called up-triaging: recommending more urgent care than the situation calls for. Real doctors are better at knowing when to say, "It's probably fine, just keep an eye on it."

The Big Picture: What Does This Mean for You?

Think of the AI not as a replacement for your doctor, but as a super-powered assistant.

  • The Problem: Doctors are often overwhelmed with messages from patients. They are tired and short on time.
  • The Solution: Imagine a future where the AI writes the first draft of the answer. It explains the condition clearly, sounds kind, and suggests next steps. Then, the real doctor quickly reads it, fixes any medical errors, and hits "send."

In short: The AI is great at being a friendly, clear, and patient teacher. But it still needs a real doctor to be the final boss who makes sure the advice is safe and accurate. The future of healthcare might be a team-up: the robot's brain for clarity and kindness, and the human doctor's heart and judgment for safety.
