Leveraging Open-Source Large Language Models to Identify Undiagnosed Patients with Rare Genetic Aortopathies

This study demonstrates that an open-source Large Language Model pipeline, enhanced with retrieval-augmented generation on curated genetic corpora, can effectively analyze unstructured clinical notes to identify undiagnosed patients with rare genetic aortopathies who would benefit from genetic testing, achieving high accuracy and offering a promising decision-support tool for earlier disease recognition.

Singhal, P., Li, Z., Yang, Z., Nandi, T., Morse, C., Rodriguez, Z., Rodriguez, A., Kindratenko, V., Sirugo, G., Pyeritz, R., Drivas, T., Madduri, R., Verma, A.

Published 2026-04-07

This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

Imagine your body is a complex, high-tech house. Most of the time, when the lights flicker or the pipes leak, the cause is simple and common, like a burnt-out bulb or a clogged drain. But sometimes the problem is a hidden, rare flaw in the house's original blueprints: a "genetic aortopathy." This is a rare genetic condition in which the main artery (the aorta) is weak and prone to tearing or bursting. Because each of these blueprint flaws is unusual and its symptoms can mimic many common problems, doctors often miss these conditions. When they do, the house can collapse suddenly and fatally.

The Problem: The Needle in the Haystack
Currently, finding these rare blueprint flaws depends on a doctor, usually the primary care physician, having a "gut feeling" that a patient's symptoms are actually genetic. They have to spot the needle in the haystack and then refer the patient to a specialist. But across thousands of patients and countless recorded symptoms, many needles get lost in the hay. We need a better way to scan the whole haystack automatically.

The Solution: A Super-Smart Librarian
The researchers built a digital tool around an open-source Large Language Model (LLM). Think of this LLM as a super-smart, tireless librarian who has read an enormous amount of medical writing, including material on these rare conditions.

However, a smart librarian alone isn't enough; they also need to know which books to check for a specific patient. That's where RAG (Retrieval-Augmented Generation) comes in.

  • The Analogy: Imagine the librarian is taking an open-book test. Instead of relying on memory alone, they may consult a curated "cheat sheet" of relevant medical knowledge right before answering. When the tool reads a patient's notes, it pulls the most relevant facts from that cheat sheet (in this study, curated collections of genetic knowledge) and uses them to reach its decision.
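To make the cheat-sheet idea concrete, here is a toy sketch of the retrieval step. It is not the authors' pipeline: the three-sentence knowledge base, the word-overlap scoring (real RAG systems usually rank passages by embedding similarity), and the prompt wording are all hypothetical stand-ins chosen for illustration.

```python
import re
from collections import Counter

# Hypothetical mini knowledge base; the study used curated genetic corpora.
KNOWLEDGE_BASE = [
    "Marfan syndrome often combines aortic root dilation with lens dislocation and tall stature.",
    "Loeys-Dietz syndrome can present with arterial tortuosity and a split uvula.",
    "Vascular Ehlers-Danlos syndrome is associated with arterial rupture and thin, translucent skin.",
]


def tokenize(text: str) -> Counter:
    """Lower-case word counts: a crude stand-in for an embedding vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def retrieve(note: str, k: int = 2) -> list[str]:
    """Return the k passages that share the most words with the note."""
    note_words = tokenize(note)

    def overlap(passage: str) -> int:
        # Counter "&" keeps the minimum count of each shared word.
        return sum((tokenize(passage) & note_words).values())

    return sorted(KNOWLEDGE_BASE, key=overlap, reverse=True)[:k]


def build_prompt(note: str, k: int = 2) -> str:
    """Place the retrieved 'cheat sheet' in front of the question for the LLM."""
    context = "\n".join(f"- {p}" for p in retrieve(note, k))
    return (
        f"Reference facts:\n{context}\n\n"
        f"Clinical note:\n{note}\n\n"
        "Question: should this patient be referred for genetic testing? "
        "Answer YES, NO, or UNSURE."
    )


note = "Tall patient with dilation of the aortic root and a history of lens dislocation."
print(retrieve(note, 1)[0])  # the Marfan passage shares the most words
```

Swapping the word-overlap score for embedding similarity, and the three sentences for a real curated corpus, would bring this closer to a production system; the overall shape (retrieve relevant passages, then prepend them to the question) stays the same.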

How It Works in Real Life
The team fed this digital librarian 22,510 pages of real patient notes (the "narrative details" doctors write down) from 500 people.

  • The Test: They gave the tool 250 patients who did have the rare condition and 250 who didn't.
  • The Job: The tool had to read the messy, free-flowing notes and decide: "Does this person need a genetic test?"
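Whatever the model writes, its answer has to be turned into a concrete outcome. A minimal sketch follows, assuming (hypothetically) that the model is instructed to begin its reply with YES, NO, or UNSURE; the enum names and the parsing rule are illustrative, not taken from the paper. Any reply that does not clearly match is routed to a human rather than guessed.

```python
from enum import Enum


class Decision(Enum):
    REFER = "refer for genetic testing"
    NO_REFER = "no referral"
    HUMAN_REVIEW = "needs a clinician's review"


def parse_decision(reply: str) -> Decision:
    """Map a free-text model reply onto a three-way decision (hypothetical format)."""
    words = reply.strip().split()
    # Look only at the first word, ignoring trailing punctuation and case.
    first = words[0].strip(".,:;!").upper() if words else ""
    if first == "YES":
        return Decision.REFER
    if first == "NO":
        return Decision.NO_REFER
    # "UNSURE", empty, or malformed replies are escalated instead of guessed.
    return Decision.HUMAN_REVIEW


print(parse_decision("YES, the notes describe aortic root dilation.").value)
```

Treating anything unexpected as "needs review" is a deliberately cautious choice: for a screening tool, a missed referral is costlier than an extra look by a clinician.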

The Results: A Winning Scorecard
The tool performed well. Of the 499 patients it gave a firm answer for, it correctly classified 425.

  • It was right about 85% of the time (425 ÷ 499).
  • It was very good at catching the people who actually had the condition (high sensitivity) and very good at not raising false alarms about people who didn't (high specificity).
  • Only one patient's notes were so ambiguous that the tool wisely said, "I'm not sure; a human doctor needs to take a closer look." That is why the scorecard covers 499 patients rather than 500.
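The scorecard arithmetic can be checked with a few lines of Python. The 500, the single abstention, and the 425 come from the text above; the helper names are hypothetical. Sensitivity and specificity appear only as their standard formulas, because this summary does not report the underlying counts of true and false positives and negatives.

```python
def accuracy(correct: int, classified: int) -> float:
    """Share of classified patients the tool got right (abstentions excluded)."""
    return correct / classified


def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of patients WITH the condition that the tool flagged."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Share of patients WITHOUT the condition that the tool correctly cleared."""
    return true_neg / (true_neg + false_pos)


total_patients = 500
abstained = 1  # the one case handed back to a human doctor
classified = total_patients - abstained
print(classified, f"{accuracy(425, classified):.1%}")  # prints: 499 85.2%
```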

Why This Matters
This paper shows that free, open-source AI can act as a safety net for doctors. Instead of relying on a human to remember every rare symptom, the tool scans the notes, cross-references them with curated genetic knowledge, and flags the patients who might be at risk.

The Bottom Line
Think of this as a digital early-warning system. It doesn't replace the doctor; it acts like a helpful assistant who whispers, "Hey, I noticed something in this patient's notes that matches a rare genetic pattern. Maybe we should run a genetic test to be safe." By catching these rare conditions earlier, doctors could help prevent sudden, often fatal tears or ruptures of the aorta and save lives.
