📄 cardiovascular medicine

Vision Language Model for Coronary Angiogram Analysis and Report Generation: Development and Evaluation Study

This study demonstrates the feasibility of fine-tuning the InternVL2-4B vision-language model to automate coronary angiogram interpretation and report generation. The model achieved moderate performance in stenosis detection and anatomy labeling but struggled with full report generation. The goal is to help cardiologists work more efficiently, including in resource-limited settings.

Original authors: Jiang, Q., Ke, Y., Sinisterra, L. G., Elangovan, K., Li, Z., Yeo, K. K., Jonathan, Y., Ting, D. S. W.

Published 2026-04-21

Original paper licensed under CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/). ⚕️ This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

The Big Picture: Teaching a Robot to Read Heart Movies

Imagine a cardiologist (a heart doctor) sitting in a dark room, watching a complex movie of a patient's heart arteries. They have to pause, rewind, and analyze the video to find blockages (stenoses), name the specific roads (arteries), and then write a long, detailed report about what they found. This takes a lot of time and brainpower.

This paper is about teaching an Artificial Intelligence (AI) to do this job. Specifically, the researchers tried to teach a "Vision-Language Model" (VLM)—a type of AI that can both see images and speak in human language—to watch these heart movies and write the reports automatically.

Think of the AI as a very smart, but inexperienced medical student. The researchers wanted to see if they could "tutor" this student using thousands of old heart videos and reports so that it could eventually do the job as well as a real doctor.

The Toolkit: What They Used

The Student (The AI Model): They used a pre-trained AI called InternVL2-4B. Think of this as a student who has read every book in the library and knows what a "heart" looks like in general, but has never studied coronary angiograms specifically.
The Textbooks (The Data): They gathered 20,000 "keyframes" (important snapshots) from 1,987 patients. These came from four different sources, like a mix of public textbooks and a private hospital's secret archives.
The Tutoring Method (Fine-Tuning): Instead of teaching the AI from scratch, they used a technique called LoRA. Imagine this as giving the student a set of "cheat sheets" or "highlighted notes" specific to heart arteries, rather than making them re-read the whole encyclopedia. This is faster and cheaper.
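
To make the "cheat sheet" idea a little more concrete, here is a minimal sketch of the arithmetic behind LoRA. The original weights stay frozen, and only two small matrices (called B and A here) are trained; their product is added on top of the frozen weights. All sizes, values, and function names below are illustrative assumptions and are not the settings used in the paper.

```python
# Illustrative LoRA arithmetic. Sizes, values, and names are assumptions,
# not the configuration used in the paper.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_effective_weight(w, b, a, alpha):
    """Return W + (alpha / r) * (B @ A), where r is the adapter rank."""
    r = len(a)  # A has shape (r, d_in); B has shape (d_out, r)
    scale = alpha / r
    delta = matmul(b, a)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

def trainable_params(d_out, d_in, r):
    """Compare trainable parameter counts: full fine-tuning vs. LoRA."""
    return {"full": d_out * d_in, "lora": r * (d_in + d_out)}

# A frozen 2x3 weight matrix with a rank-1 adapter (toy numbers).
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
B = [[1.0], [2.0]]        # shape (d_out=2, r=1)
A = [[0.5, 0.0, -0.5]]    # shape (r=1, d_in=3)

W_eff = lora_effective_weight(W, B, A, alpha=1.0)

# For a hypothetical 4096 x 4096 layer with rank 8, the adapter is far smaller.
counts = trainable_params(d_out=4096, d_in=4096, r=8)
```

In this toy case the hypothetical 4096 x 4096 layer has about 16.8 million weights, while the rank-8 adapter trains only 65,536 numbers. That gap is why the approach is described as faster and cheaper.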

The Three Tests: How Did the Student Do?

The researchers gave the AI three specific tasks, ranging from easy to very hard.

1. Picking the Best Frames (The "Highlight Reel" Task)

The Challenge: A heart video is 30 seconds long, but 90% of it is just the dye being injected or fading away. Only a few seconds show the arteries clearly. The AI had to pick the "golden moments" to study.
The Result: Excellent! The frame selector worked like a pro film editor. It caught 93% of the useful frames (recall 0.93) and classified 86% of all frames correctly (accuracy 0.86). It reliably kept the clear frames and discarded the blurry ones.

2. Finding Blockages and Naming Roads (The "Spot the Difference" Task)

The Challenge: The AI had to look at a single snapshot and say, "There is a blockage here," and "That is the Left Anterior Descending artery."
The Result: Pretty Good.
- Finding Blockages: It found roughly 6 out of 10 real blockages (recall 0.64), and a bit more than half of the blockages it flagged were real (precision 0.56). It wasn't perfect, but it was comparable to a specialized AI tool (YOLOv8x) that only does this one thing.
- Naming Roads: It was great at identifying the "highways" (the big main arteries) but struggled with the "side streets" (tiny branches). This makes sense; it's easier to spot a big truck than a bicycle in a crowd.

3. Writing the Full Report (The "Essay" Task)

The Challenge: This was the hardest part. The AI had to look at multiple snapshots from different angles and write a cohesive paragraph summarizing the patient's condition, just like a doctor does.
The Result: Struggling.
- The AI could write in a medical-sounding format, but the content was often wrong.
- Hallucinations: Sometimes it invented problems that didn't exist (like saying there was a "collateral vessel" when there wasn't one).
- Missed Diagnoses: It often missed serious blockages in the main arteries.
- Why? The researchers realized they gave the AI a "weak" lesson plan. They showed it 5 pictures and one report, but didn't tell the AI which picture matched which sentence in the report. It was like giving a student 5 pages of a mystery novel and one page of the solution, without telling them which page solved which clue. The AI got confused and started guessing.

The "Aha!" Moments & Limitations

The paper admits that while the AI is promising, it's not ready to replace doctors yet. Here are the main hurdles they found:

The "IoU" Trap: In AI, we usually measure detection success with IoU (Intersection over Union), which scores how closely a box drawn around a blockage overlaps with the "correct" box. The researchers realized this is like grading a student on how perfectly they drew a circle around a dot. If the student drew a slightly bigger circle that still covered the dot, the grading system might mark it wrong, even though the student was clinically correct. They had to adjust their grading rules to be more forgiving and realistic.
The "Normal" Problem: The training data was mostly sick hearts. The AI didn't get enough practice looking at healthy hearts. This is like a student who only sees pictures of broken cars; when they see a working car, they might think it's broken too.
The "Grouping" Issue: For the report generation, the AI was overwhelmed because it had to connect too many dots at once. The researchers suggest that in the future, they should teach the AI to link one specific image to one specific sentence before asking it to write the whole essay.
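
The "IoU" trap described above can be shown with a few lines of arithmetic. The sketch below computes IoU for two boxes and compares a strict overlap rule with a more forgiving one. The 0.5 threshold and the "covers the center" rule are assumptions chosen for illustration; this summary does not say exactly how the researchers adjusted their grading.

```python
# Boxes are (x1, y1, x2, y2). The threshold and the lenient rule below are
# illustrative assumptions, not the paper's actual evaluation criteria.

def area(box):
    """Area of an axis-aligned box."""
    return max(0, box[2] - box[0]) * max(0, box[3] - box[1])

def iou(box_a, box_b):
    """Intersection over Union: shared area divided by combined area."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = area(box_a) + area(box_b) - inter
    return inter / union if union > 0 else 0.0

def covers_center(pred, truth):
    """Hypothetical lenient rule: the prediction contains the truth box's center."""
    cx = (truth[0] + truth[2]) / 2
    cy = (truth[1] + truth[3]) / 2
    return pred[0] <= cx <= pred[2] and pred[1] <= cy <= pred[3]

truth = (40, 40, 60, 60)   # small box around the lesion (area 400)
pred = (30, 30, 70, 70)    # a bigger box that fully contains it (area 1600)

score = iou(pred, truth)             # 400 / 1600 = 0.25
strict_hit = score >= 0.5            # fails the strict overlap rule
lenient_hit = covers_center(pred, truth)  # passes the forgiving rule
```

Here the prediction fully covers the lesion, yet its IoU is only 0.25, so a strict 0.5 cutoff would count it as a miss.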

The Bottom Line

What did they achieve?
They proved that a modern AI can be "tutored" to look at heart movies, pick the best frames, and spot big blockages and major arteries with decent accuracy. It's a solid proof-of-concept.

What's next?
The AI is currently a "junior resident" who is good at spotting things but bad at writing the final report. To make it a "senior consultant," the researchers need to:

- Give it better "lesson plans" (linking specific images to specific sentences).
- Show it more healthy hearts so it doesn't panic and see disease everywhere.
- Use bigger, more powerful computers to handle the complex task of writing the full report.

Why does this matter?
If perfected, this tool could act as a super-assistant for doctors. It could speed up the paperwork, help doctors in remote areas who don't have specialists nearby, and even double-check that doctors aren't missing anything important. It's not about replacing the doctor; it's about giving them a second pair of eyes that never gets tired.

Task	Performance Metrics	Key Observations
Keyframe Selection	Accuracy: 0.86; Precision: 0.89; Recall: 0.93; F1: 0.91	The ViT selector was highly robust, effectively filtering out non-diagnostic frames to reduce noise in the training pipeline.
Stenosis Detection	Precision: 0.56; Recall: 0.64; F1: 0.60	Performance was comparable to specialized neural networks (YOLOv8x). The model successfully localized lesions, though IoU scores were moderate due to annotation inconsistencies across datasets.
Anatomy Labelling	Weighted Precision: 0.50; Weighted Recall: 0.43; Weighted F1: 0.46	Stronger performance on major vessels (Left Main, Proximal LAD/Circumflex) with scores >0.7. Poor performance on small/distal branches (e.g., obtuse marginal) due to data scarcity and complexity.
Report Generation	Accuracy: 0.42; Specificity: 0.52; Recall: 0.23	Significant limitations. The model struggled with multi-image reasoning, exhibiting high false positives/negatives and "hallucinations" (e.g., describing collaterals that didn't exist). It failed to detect clinically significant stenosis in the Left Main artery in test cases.
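
For readers who want to sanity-check the table, F1 is the harmonic mean of precision and recall. The short sketch below recomputes F1 from the precision and recall values listed above; the function and variable names are illustrative, and the numbers are copied from the table rather than from the paper's raw data.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F1) as listed in the table above.
reported = {
    "Keyframe Selection": (0.89, 0.93, 0.91),
    "Stenosis Detection": (0.56, 0.64, 0.60),
    "Anatomy Labelling": (0.50, 0.43, 0.46),
}

# Recompute F1 for each task and round to two decimals for comparison.
recomputed = {task: round(f1_score(p, r), 2) for task, (p, r, _) in reported.items()}

# True for every task whose recomputed F1 matches the reported value.
matches = {task: recomputed[task] == reported[task][2] for task in reported}
```

All three recomputed values agree with the table to two decimals. One caveat: a weighted F1, as in the anatomy row, is averaged across classes, so it does not have to equal the harmonic mean of the weighted precision and recall. Here it happens to agree.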
