A prospective clinical feasibility study of a conversational diagnostic AI in an ambulatory primary care clinic

This prospective feasibility study demonstrates that a conversational AI system (AMIE) can safely and effectively conduct clinical history-taking and generate diagnostic suggestions in a real-world urgent care setting. The system achieved high patient satisfaction and diagnostic accuracy comparable to primary care providers, with no real-time human intervention required.

Peter Brodeur, Jacob M. Koshy, Anil Palepu, Khaled Saab, Ava Homiar, Roma Ruparel, Charles Wu, Ryutaro Tanno, Joseph Xu, Amy Wang, David Stutz, Hannah M. Ferrera, David Barrett, Lindsey Crowley, Jihyeon Lee, Spencer E. Rittner, Ellery Wulczyn, Selena K. Zhang, Elahe Vedadi, Christine G. Kohn, Kavita Kulkarni, Vinay Kadiyala, Sara Mahdavi, Wendy Du, Jessica Williams, David Feinbloom, Renee Wong, Tao Tu, Petar Sirkovic, Alessio Orlandi, Christopher Semturs, Yun Liu, Juraj Gottweis, Dale R. Webster, Joëlle Barral, Katherine Chou, Pushmeet Kohli, Avinatan Hassidim, Yossi Matias, James Manyika, Rob Fields, Jonathan X. Li, Marc L. Cohen, Vivek Natarajan, Mike Schaekermann, Alan Karthikesalingam, Adam Rodman

Published Tue, 10 Ma

Imagine you have a headache, a strange rash, or just a general feeling of being unwell. In the past, you'd call your doctor's office, maybe wait on hold, and then sit in a waiting room for 45 minutes before finally seeing a doctor for a 15-minute appointment.

This paper is about a new experiment where a super-smart AI chatbot (named AMIE) tried to act like a "medical detective" before you even saw the real doctor. The goal was to see if this AI could safely talk to patients, figure out what might be wrong, and help the real doctor be better prepared when you finally meet.

Here is the story of what happened, explained simply:

1. The Setup: A "Medical Rehearsal"

Think of the clinic as a busy airport. Usually, passengers (patients) rush through security and straight to the gate (the doctor's office), often forgetting to pack the right documents.

In this study, researchers set up a rehearsal room before the flight.

  • The AI (AMIE): Imagine a very knowledgeable, patient, and polite medical student who never gets tired. This AI talked to 100 real patients via text chat on their computers (not phones) up to 5 days before their real appointment.
  • The Mission: The AI asked questions like, "When did the pain start?" "Does it hurt when you move?" and "Do you have a fever?" It built a detailed story of the patient's symptoms.
  • The Safety Net: Crucially, a real human doctor was watching the chat from behind the scenes (like a safety supervisor in a control tower). If the AI said anything dangerous or the patient got upset, the human could hit the "stop" button immediately.

2. The Results: Did the AI Pass the Test?

The Safety Score: 10/10
The most important question was: Did the AI hurt anyone or say something crazy?

  • Answer: No. The human supervisors watched all 100 conversations and never had to stop the chat. The AI was calm, didn't panic the patients, and didn't suggest dangerous things. It was like a very careful librarian who knows exactly which books to recommend.

The "Detective" Skills: Pretty Good!
After the chat, the AI made a list of possible diagnoses (a "differential diagnosis").

  • The Result: In 90% of cases, the AI's list included the actual final diagnosis that the real doctor eventually confirmed 8 weeks later.
  • The Comparison: When experts compared the AI's list of possibilities to the real doctor's list, they were equally good. The AI was just as smart at guessing what was wrong as the human doctor.

The "Action Plan": The AI Needs a Little Help
The AI also suggested what to do next (tests, treatments, etc.).

  • The Result: The AI's plans were safe and appropriate. However, the human doctors' plans were rated as more practical and cheaper.
  • Why? Think of the AI as a brilliant theorist who knows all the rules of physics, but the human doctor is the engineer who knows how to build the bridge with the materials actually available in the local hardware store. The AI sometimes suggested perfect but expensive or complicated solutions, while the doctor knew what was easiest and most affordable for that specific patient.

3. How Did People Feel?

The Patients:

  • Before the chat: Some were skeptical. "Can a robot really understand my pain?"
  • After the chat: They felt much better. They said the AI was polite, listened well, and didn't make them feel rushed. It was like having a conversation with a friend who really cares.
  • Trust: Their trust in AI went up significantly. They felt more prepared for their real doctor visit.

The Real Doctors:

  • When the patients arrived for their real appointment, the doctors had already read the AI's chat summary.
  • The Feeling: Doctors said, "Wow, this is helpful." Instead of spending the first 10 minutes just gathering basic facts, they could jump straight into the important stuff. It was like arriving at a meeting where everyone had already read the agenda beforehand.
  • The Vibe: The visits felt more like a partnership than an interrogation.

4. The Catch (The Limitations)

While the results are exciting, there are a few "buts":

  • The Tech Barrier: The study required patients to use a laptop or desktop computer. People who only had smartphones or weren't good with computers were left out. This is like testing a new car only on smooth highways and not on bumpy dirt roads.
  • No Physical Touch: The AI couldn't feel a lump, listen to a heartbeat, or look in an ear. It's a great detective, but it can't do a physical exam.
  • The "Hawthorne Effect": Because patients knew a human doctor was supervising every chat, they may have behaved differently than they would otherwise, and the safety results partly reflect that oversight. In real-world use, without a supervisor watching, both patient behavior and outcomes might differ.

The Big Picture

This study is like a successful test drive for a self-driving car.

  • It proved that the car (the AI) can drive safely on the road (the clinic) without crashing.
  • It proved that the passengers (patients) feel comfortable in the car.
  • It proved that the car can help the human driver (the doctor) get to the destination faster.

The Conclusion:
We aren't ready to replace doctors with robots yet. But this study shows that AI can be a super-powerful assistant. It can do the "homework" of listening and organizing information so that when you see your doctor, you can spend that precious time talking about what matters most, rather than just filling out forms.

It's a huge step toward a future where your medical care is faster, safer, and less stressful for everyone.