From Days to Minutes: An Autonomous AI Agent Achieves Reliable Clinical Triage in Remote Patient Monitoring

Seunghwan Kim (AnsibleHealth Inc., San Francisco, USA), Tiffany H. Kung (AnsibleHealth Inc., San Francisco, USA, Stanford School of Medicine, Stanford, USA), Heena Verma (AnsibleHealth Inc., San Francisco, USA), Dilan Edirisinghe (AnsibleHealth Inc., San Francisco, USA), Kaveh Sedehi (AnsibleHealth Inc., San Francisco, USA), Johanna Alvarez (AnsibleHealth Inc., San Francisco, USA), Diane Shilling (AnsibleHealth Inc., San Francisco, USA), Audra Lisa Doyle (AnsibleHealth Inc., San Francisco, USA), Ajit Chary (AnsibleHealth Inc., San Francisco, USA), William Borden (AnsibleHealth Inc., San Francisco, USA, George Washington University, Washington, D.C., USA), Ming Jack Po (AnsibleHealth Inc., San Francisco, USA)

Published Wed, 11 Ma

Imagine you are the captain of a massive ship, and your job is to keep 70 million passengers safe. Every day, these passengers send you thousands of tiny signals: "My heart rate is up," "I gained two pounds," or "My blood pressure is high."

In the past, trying to manage this was like trying to drink from a firehose. The ship's crew (doctors and nurses) was drowning in a flood of data. They had to look at every single signal, often without knowing the passenger's full history. This led to two problems:

The "Boy Who Cried Wolf" effect: So many false alarms that the crew stopped listening (Alert Fatigue).
Missing the real danger: Because they were so busy, they sometimes missed the one signal that actually meant a passenger was sinking.

This paper introduces Sentinel, a new kind of "AI Captain" designed to solve this problem.

The Old Way: The Rigid Rulebook

Before Sentinel, hospitals used simple computer rules, like a rigid traffic light system.

The Rule: "If blood pressure is over 140, sound the alarm."
The Problem: This is like a traffic light that turns red for every car, even if it's just a slow-moving tractor. It ignores context. If a patient has high blood pressure every day, a reading of 141 isn't an emergency. But the old system screamed "EMERGENCY!" anyway, drowning the crew in noise.
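
The rigid rule can be sketched in a few lines of Python. This is only an illustration of the fixed-threshold idea described above: the cutoff of 140 comes from the example rule, while the function name and the sample readings are made up for the sketch.

```python
# Threshold taken from the example rule above; everything else is hypothetical.
SYSTOLIC_ALARM_THRESHOLD = 140


def rulebook_alarm(systolic: int) -> bool:
    """Return True whenever the reading is over the fixed threshold."""
    return systolic > SYSTOLIC_ALARM_THRESHOLD


# A patient who always runs slightly high trips the alarm on every reading.
usual_readings = [141, 143, 142, 141]
alarms = [rulebook_alarm(reading) for reading in usual_readings]
print(alarms)  # [True, True, True, True]
```

Because the rule sees only one number at a time, it cannot tell a patient's normal day from a genuine change.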

The New Way: The Detective AI (Sentinel)

Sentinel is different. Instead of just looking at the number on the screen, it acts like a super-detective.

When a patient sends a reading, Sentinel doesn't just check the number. It immediately opens 21 different "files" on that patient:

What medications are they taking?
Did they just get out of the hospital?
What is their normal baseline?
Is this a sudden spike or a slow trend?

The Analogy:
Imagine a patient sends a message: "My blood pressure is 180."

The Old System: "ALARM! ALARM! Call the doctor!" (Even if the patient has had 180 for years and feels fine).
Sentinel: "Wait. Let me check the files. Ah, this patient is 82, has heart failure, just got out of the hospital yesterday, and their blood pressure was 120 this morning. This is a massive, sudden jump. This isn't just a number; it's a crisis. I will call the doctor immediately."
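
The difference between the two responses can be sketched as a toy comparison in Python. This is not how Sentinel works internally; it is a hand-written stand-in that uses three hypothetical context fields (instead of the 21 "files" mentioned above) and an invented jump cutoff, purely to show why the same reading of 180 can deserve different responses.

```python
from dataclasses import dataclass


@dataclass
class PatientContext:
    # Hypothetical fields standing in for a few of the patient "files".
    baseline_systolic: int
    recently_discharged: bool
    has_heart_failure: bool


# Illustrative cutoff only; not a clinical value and not from the paper.
JUMP_CUTOFF = 40


def contextual_triage(systolic: int, ctx: PatientContext) -> str:
    """Label a reading by how far it departs from the patient's own baseline."""
    jump = systolic - ctx.baseline_systolic
    high_risk = ctx.recently_discharged or ctx.has_heart_failure
    if jump >= JUMP_CUTOFF and high_risk:
        return "urgent"
    if jump >= JUMP_CUTOFF:
        return "review"
    return "routine"


long_term_180 = PatientContext(180, recently_discharged=False, has_heart_failure=False)
sudden_jump = PatientContext(120, recently_discharged=True, has_heart_failure=True)

print(contextual_triage(180, long_term_180))  # routine
print(contextual_triage(180, sudden_jump))    # urgent
```

The reading is identical in both calls; only the context changes the outcome.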

How Did It Do?

The researchers tested Sentinel against two things: the old "Rulebook" computers and a team of six human experts (doctors and nurses).

The "Human vs. Human" Test: Even the human experts disagreed with each other about 40% of the time. Medicine is messy; sometimes two good doctors will argue about how serious a situation is.
The "AI vs. Human" Test:
- Consistency: The AI was incredibly consistent. If you asked it the same question five times, it gave the same answer 83% of the time. Humans varied much more.
- Catching the Danger: The AI was better at spotting real emergencies than any single human on the team. It caught 97.5% of the emergencies, while the best individual human caught only 80%.
- The "False Alarm" Balance: The AI did flag more things as "urgent" than the humans did (it was cautious). But here's the kicker: When independent experts reviewed the AI's "false alarms," they realized the AI was actually right! The humans had missed the danger. The AI had spotted subtle clues (like a patient gaining weight rapidly while on diuretics) that the humans, looking at a snapshot, missed.
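
For readers who want to see how numbers like these are typically computed, here is a minimal Python sketch. The data is invented for illustration and does not come from the study. Reading the consistency figure as "the share of cases where all repeated runs agree" is an assumption about what the 83% measures.

```python
def sensitivity(flagged, is_emergency):
    """Share of true emergencies that were flagged as urgent."""
    hits = [f for f, truth in zip(flagged, is_emergency) if truth]
    return sum(hits) / len(hits)


def full_agreement_rate(runs_per_case):
    """Share of cases where every repeated run gave the same label."""
    unanimous = [len(set(runs)) == 1 for runs in runs_per_case]
    return sum(unanimous) / len(unanimous)


# Toy data (invented): five cases, four of which are real emergencies.
is_emergency = [True, True, True, True, False]
flagged = [True, True, True, False, True]
toy_sensitivity = sensitivity(flagged, is_emergency)

# Toy data (invented): four cases, each triaged five times.
runs_per_case = [
    ["urgent"] * 5,
    ["routine"] * 5,
    ["urgent", "urgent", "routine", "urgent", "urgent"],
    ["routine"] * 5,
]
toy_agreement = full_agreement_rate(runs_per_case)

print(toy_sensitivity)  # 0.75
print(toy_agreement)    # 0.75
```

Sensitivity ignores false alarms entirely, which is why the "False Alarm" balance is reported separately.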

The Cost and Speed

Speed: The AI took about 1.5 minutes to do its detective work.
Cost: It cost 34 cents per check.
Human Cost: A human doctor taking the time to look up all those files, read the history, and make a decision would cost significantly more and take much longer.
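
To get a feel for what these per-check figures mean at scale, here is a small back-of-the-envelope calculation in Python. The 34 cents and 1.5 minutes come from the summary above. The daily volume of 1,000 checks is a made-up number for illustration, and the sketch assumes checks run one after another.

```python
# Figures from the summary above.
COST_PER_CHECK_CENTS = 34
SECONDS_PER_CHECK = 90  # 1.5 minutes

# Hypothetical workload, chosen only to make the arithmetic concrete.
checks_per_day = 1_000

daily_cost_dollars = COST_PER_CHECK_CENTS * checks_per_day / 100
daily_processing_hours = SECONDS_PER_CHECK * checks_per_day / 3600

print(daily_cost_dollars)      # 340.0
print(daily_processing_hours)  # 25.0
```

In this toy scenario the processing time exceeds 24 hours, so a deployment at that volume would need to run checks in parallel.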

The Big Picture: Why This Matters

The paper argues that the reason Remote Patient Monitoring (RPM) failed in the past wasn't that the technology was bad; it was that the information overload broke the system.

Sentinel fixes this by acting as a smart filter. It doesn't just pass data to humans; it understands the data first.

It turns a "flood" of data into a "stream" of useful information.
It allows the care model tested in the TIM-HF2 trial (which showed that round-the-clock remote monitoring can save lives) to finally become affordable and scalable.

In short: Sentinel is like a tireless, hyper-attentive nurse who never sleeps, never gets tired, reads every single medical file instantly, and knows exactly when to wake up the doctor and when to just keep watching. It turns the "noise" of remote monitoring into a clear, life-saving signal.
