Automated echocardiographic measurements for longitudinal monitoring of ATTR cardiomyopathy: agreement and repeatability analysis

This study demonstrates that although fully automated AI-assisted echocardiographic measurements show only moderate agreement with expert cardiologists, owing to systematic biases, their repeatability is comparable to that of experienced readers, supporting their use for longitudinal monitoring of ATTR cardiomyopathy.

Walser, A., Clerc, O. F., Mork, C., Flammer, A. J., Myhre, P. L., Schwotzer, R., Graeni, C., Ruschitzka, F., Tanner, F. C., Benz, D. C.

Published 2026-04-07

This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

Imagine your heart is a house, and ATTR cardiomyopathy is a slow, creeping mold that hardens the walls and makes the rooms smaller over time. To stop this mold from taking over, doctors need to catch the changes early. The best tool they have to peek inside the house is an echocardiogram (an ultrasound of the heart), but there's a catch: it's like trying to measure a room with a ruler held by a human hand. Depending on who is holding it, how tired they are, or how they angle the ruler, the measurements can change slightly. This "human error" makes it hard to tell if the house is actually getting smaller or if the measurer just had a bad day.

This study asked a big question: Can a robot (Artificial Intelligence) measure the heart as reliably as a human doctor, so we can track the disease over time without the "human shake"?

Here is the breakdown of what they did and what they found, using some everyday analogies:

The Experiment: A "Measuring Contest"

The researchers gathered 62 patients with this heart condition. They looked at 178 different heart scans taken over a year. To test the waters, they had four different "measurers" look at the same scans:

  1. The Expert: A seasoned cardiologist (the gold standard).
  2. The Second Expert: Another experienced doctor.
  3. The Novice: A doctor-in-training (less experienced).
  4. The Robot: An AI algorithm (Us2.ai) that automatically measures the heart without human hands.

The Results: Who Measured Best?

1. The Experts vs. The Robot (Agreement)
When the robot tried to measure the thickness of the heart walls and the size of the heart's main chamber, it didn't always agree perfectly with the top expert.

  • The Analogy: Imagine two people measuring a table. The expert says it's 60 inches; the robot says it's 58 inches. They are close, but there's a consistent difference (a "bias").
  • The Reality: The robot was "moderately" good at agreeing with the expert. It consistently measured the heart walls slightly thinner and the chamber slightly smaller than the human expert did.
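The "consistent difference" in the analogy is what statisticians call bias, commonly quantified with a Bland-Altman-style analysis: the mean difference between two measurers plus the range their disagreement usually falls in. A minimal sketch of that calculation, using invented wall-thickness numbers rather than the study's data:

```python
# Hedged sketch of a Bland-Altman-style agreement analysis.
# All measurements below are made up for illustration.
import statistics

def bland_altman(reader_a, reader_b):
    """Return the mean bias and 95% limits of agreement between two raters."""
    diffs = [b - a for a, b in zip(reader_a, reader_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)

# Hypothetical wall-thickness readings in mm (expert vs. AI)
expert = [14.0, 15.2, 13.8, 16.1, 14.5]
robot = [13.4, 14.6, 13.1, 15.5, 14.0]
bias, (lo, hi) = bland_altman(expert, robot)
# A negative bias means the robot reads systematically thinner walls.
```

A consistent negative bias like this one is not necessarily disqualifying: as long as it is stable, measurements can still be compared over time.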

2. The Experts vs. Each Other (Reliability)
When the two expert doctors measured the same heart, they agreed very well.

  • The Analogy: Two master carpenters measuring the same table will get almost the exact same number.
  • The Reality: Their measurements were very consistent with each other, showing that experienced humans are a solid benchmark.
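Agreement between two human raters like this is often summarized with an intraclass correlation coefficient (ICC), where values near 1 mean near-identical readings. A minimal one-way ICC sketch, again with invented readings rather than the study's data:

```python
# Hedged sketch of a one-way intraclass correlation coefficient, ICC(1,1).
# The ratings below are made up for illustration.
import statistics

def icc_oneway(ratings):
    """ICC(1,1) for a table of ratings: one row per heart, one column per rater."""
    n = len(ratings)            # number of hearts
    k = len(ratings[0])         # number of raters
    grand = statistics.mean(x for row in ratings for x in row)
    row_means = [statistics.mean(row) for row in ratings]
    # Between-subject and within-subject mean squares (one-way ANOVA)
    msb = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    msw = sum((x - m) ** 2 for row, m in zip(ratings, row_means) for x in row) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

# Two experts measuring four hearts (mm): close agreement, ICC near 1
ratings = [[14.0, 14.1], [15.2, 15.3], [13.8, 13.7], [16.1, 16.0]]
icc = icc_oneway(ratings)
```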

3. The "Bad Day" Test (Repeatability)
This is the most important part for tracking disease. The study asked: If the same person measures the same heart twice, how much does the number jump around?

  • The Novice: Like a student learning to use a ruler, their numbers jumped around a lot (high variability). If they measured the heart on Monday and then again on Tuesday, the numbers might change significantly just because they were shaky.
  • The Experts: They were very steady. Their numbers stayed consistent.
  • The Robot: This was the big win. Even though the robot didn't agree perfectly with the expert on the exact number, it was extremely consistent with itself. Given the same scan, it returns identical numbers, and across repeat measurements it was at least as steady as the experienced doctors.
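One common way to put a number on this "jumpiness" is the coefficient of variation (CV) of repeated measurements: the spread of a measurer's repeat readings relative to their average, where a smaller CV means a steadier hand. A sketch with invented readings, not the study's data:

```python
# Hedged sketch: test-retest repeatability as a coefficient of variation.
# All readings below are made up for illustration.
import statistics

def cv_percent(readings):
    """Coefficient of variation (%) for repeated readings of the same heart."""
    return 100 * statistics.stdev(readings) / statistics.mean(readings)

novice = [14.0, 16.5, 13.2, 15.8]   # jumpy repeat measurements (mm)
expert = [14.8, 15.1, 14.9, 15.0]   # steady repeat measurements
robot = [14.3, 14.3, 14.3, 14.3]    # deterministic on the identical scan

# Expect: CV(novice) > CV(expert) > CV(robot) == 0
```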

The Bottom Line: Why This Matters

Think of tracking a disease like watching a garden grow.

  • The Problem: If you use a shaky hand to measure a plant, you might think it grew 2 inches when it only grew 1 inch, or vice versa. You can't be sure if the fertilizer is working.
  • The Solution: The AI is like a laser ruler. Even if the laser ruler is calibrated to be 1 inch shorter than your tape measure (the systematic bias), it never wobbles.

The Conclusion:
The study found that while the AI isn't a perfect copy of a human doctor's specific numbers, it is just as steady and reliable as an experienced doctor when measuring changes over time.

Because the AI is so consistent, it can act as a "steady hand" for doctors. It can detect the subtle changes in the heart's size that signal the disease is getting worse or better, without the "noise" of human error. This means doctors can use AI to personalize treatment plans with much more confidence, knowing that if the numbers change, it's the heart changing, not the person holding the ruler.
