AERO: An AI Agent for Adaptive Eligibility Refinement and Optimization of Clinical Trial Criteria in Real-World Trial Emulation

The paper introduces AERO, an AI agent framework that adapts clinical trial eligibility criteria for emulation with real-world data. It uses large language models to systematically classify and refine the criteria, with the aim of improving the generalizability and accuracy of treatment effect estimates, as demonstrated in an emulation of the WARCEF trial.

Original authors: Li, X., James, J., Pellikka, P. A., Zong, N.

Published 2026-05-01

Original paper licensed under CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/). ⚕️ This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

Imagine you are trying to recreate a famous, perfectly controlled cooking competition (a Randomized Controlled Trial, or RCT) using a giant, messy, real-world pantry full of ingredients from thousands of different households (your Electronic Health Records).

In the original competition, the judges had a very strict list of rules: "Only use eggs from chickens under 2 years old," "No salt if the cook has a specific allergy," and "The cook must be able to stand for 4 hours without a break." These rules ensured the competition was fair and the results were clear.

However, when you try to find these exact ingredients in the real-world pantry, you hit a wall. You can't tell the age of the chicken just by looking at the egg. You don't have a record of every cook's allergy history. And you certainly can't know if a cook could stand for 4 hours if they never actually had to. If you try to apply the original rules exactly as written, you might end up throwing away most of your pantry, leaving you with almost no cooks to study. Or worse, you might accidentally keep only the "perfect" cooks, making your results look different from the real world.

Enter AERO: The Smart Sous-Chef

The paper introduces AERO (AI Agent for Adaptive Eligibility Refinement and Optimization). Think of AERO as a highly intelligent, well-read sous-chef who helps you translate those strict competition rules into something workable for your messy real-world pantry, without losing the spirit of the original contest.

Here is how AERO works, using simple metaphors:

1. The "Four-Box" Sorting System

Instead of blindly trying to follow every rule, AERO looks at each rule and asks, "What is this rule really for?" It sorts every rule into one of four boxes:

  • Box 1: The "Must-Haves" (Strict Inclusion): These are the core rules that define who the contest is for. Example: "The cook must be making soup." AERO keeps these as hard filters. If you aren't making soup, you're out.
  • Box 2: The "Safety Warnings" (Strict Exclusion): These are rules about danger. Example: "No one with a severe nut allergy can enter." AERO keeps these too, because safety is non-negotiable and usually easy to spot in the records.
  • Box 3: The "Background Noise" (Confounders): These are rules that describe the cook but don't necessarily disqualify them. Example: "The cook must have used a specific brand of salt in the past." In the real world, this might just be a factor that makes the soup taste different, not a reason to kick the cook out. AERO says, "Don't throw them out! Just write this down and adjust for it later when we taste the soup." This keeps more people in the study.
  • Box 4: The "Impossible Tasks" (Drop/Operational): These are rules that make no sense in a real-world pantry. Example: "The cook must be able to follow a 4-hour protocol without a break." You can't check this in a database. AERO says, "We can't measure this, so let's drop this rule entirely so we don't accidentally exclude good cooks."
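The four boxes can be read as a small data structure plus a rule for how each box affects the cohort. The following is a minimal sketch under stated assumptions, not the paper's implementation: the `Box` and `Criterion` types, the `build_cohort` function, and the patient field names (`heart_failure`, `bleeding_risk`, `recent_pacemaker`) are all hypothetical, and the toy records are made up for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Box(Enum):
    STRICT_INCLUSION = "must-have"
    STRICT_EXCLUSION = "safety warning"
    CONFOUNDER = "background noise"
    DROP = "impossible task"


@dataclass
class Criterion:
    name: str
    box: Box
    # Returns True when the patient record matches the criterion.
    # Left as None for dropped criteria, which cannot be evaluated in the data.
    matches: Optional[Callable[[dict], bool]] = None


def build_cohort(patients, criteria):
    """Apply hard filters, record confounders as covariates, ignore dropped rules."""
    cohort = []
    for p in patients:
        keep = True
        for c in criteria:
            if c.box is Box.STRICT_INCLUSION and not c.matches(p):
                keep = False  # fails a must-have
            elif c.box is Box.STRICT_EXCLUSION and c.matches(p):
                keep = False  # triggers a safety exclusion
        if keep:
            # Confounders never remove a patient; they become adjustment variables.
            covariates = {
                c.name: int(c.matches(p))
                for c in criteria
                if c.box is Box.CONFOUNDER
            }
            cohort.append({"id": p["id"], "covariates": covariates})
    return cohort


# Hypothetical criteria and toy records (not taken from the paper).
criteria = [
    Criterion("heart_failure", Box.STRICT_INCLUSION, lambda p: p["heart_failure"]),
    Criterion("bleeding_risk", Box.STRICT_EXCLUSION, lambda p: p["bleeding_risk"]),
    Criterion("recent_pacemaker", Box.CONFOUNDER, lambda p: p["recent_pacemaker"]),
    Criterion("can_follow_protocol", Box.DROP),
]
patients = [
    {"id": 1, "heart_failure": True, "bleeding_risk": False, "recent_pacemaker": True},
    {"id": 2, "heart_failure": True, "bleeding_risk": True, "recent_pacemaker": False},
    {"id": 3, "heart_failure": False, "bleeding_risk": False, "recent_pacemaker": False},
    {"id": 4, "heart_failure": True, "bleeding_risk": False, "recent_pacemaker": False},
]
cohort = build_cohort(patients, criteria)
```

In this toy run, patient 1 stays in the cohort despite the recent pacemaker, because that criterion only contributes a covariate. Patients 2 and 3 are removed by the two hard filters.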

2. The "Knowledge Librarian"

AERO isn't just guessing. It acts like a librarian who pulls out three different books before making a decision:

  • A Medical Encyclopedia (UpToDate) to understand the disease.
  • A Smart AI Assistant (Claude) to interpret the context.
  • A Drug Safety Manual (ToolUniverse) to check for dangerous interactions.

By combining the original trial rules with this extra knowledge, AERO decides which rules to keep, which to tweak, and which to toss.

3. The Test Drive: The WARCEF Trial

To see if AERO works, the researchers used it to recreate the WARCEF trial.

  • The Original Trial: Compared warfarin (a blood thinner) with aspirin in heart failure patients. The result? No significant difference. The two drugs performed about the same.
  • The Problem: If you tried to find these patients in real-world hospital records using the original strict rules, you'd likely get a tiny, unrepresentative group that didn't look like the patients doctors actually treat.
  • The AERO Solution: AERO re-sorted the rules. It kept the heart failure diagnosis (Must-Have) and the safety exclusions (Safety Warning). But it moved things like "recent pacemaker" or "specific medication history" into the "Background Noise" box, meaning they kept those patients but adjusted the math later.

The Result:
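When they ran the study with AERO's optimized rules, they got a hazard ratio (HR) of 1.56, which the paper reports as not statistically significant. (An HR on its own does not indicate significance; an estimate counts as "no significant difference" when its confidence interval includes 1.) This matched the original trial's conclusion (HR = 1.01, "no difference").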
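A hazard ratio is usually read together with its confidence interval: if the 95% interval contains 1, the data are compatible with no difference between the two treatments. The sketch below shows that check using the standard normal approximation on the log scale. The function names and the input values are hypothetical toy numbers chosen for illustration; they are not the paper's estimates, and this summary does not state the paper's confidence intervals.

```python
import math


def hazard_ratio_ci(hr, se_log_hr, z=1.96):
    """Approximate 95% confidence interval for a hazard ratio.

    Assumes log(HR) is roughly normal with standard error se_log_hr.
    """
    log_hr = math.log(hr)
    return math.exp(log_hr - z * se_log_hr), math.exp(log_hr + z * se_log_hr)


def is_significant(ci_low, ci_high):
    """An HR differs significantly from 1 only if the interval excludes 1."""
    return not (ci_low <= 1.0 <= ci_high)


# Toy inputs, NOT from the paper: HR of 1.20 with a standard error of 0.15.
low, high = hazard_ratio_ci(hr=1.20, se_log_hr=0.15)
significant = is_significant(low, high)  # False: the interval spans 1
```

The same point estimate with a much smaller standard error would exclude 1, which is why the point estimate alone cannot settle the question.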

The "Ablation" Lesson (The "What If" Experiment)
The paper also ran a cool experiment to prove why AERO's sorting matters. They took one specific rule: "No patients on a specific blood thinner (LMWH)."

  • Scenario A (Strict Rule): They threw everyone on that blood thinner out of the study. Suddenly, the results changed! It looked like one drug was better than the other. Why? Because by throwing those people out, they accidentally removed the sickest patients, skewing the group.
  • Scenario B (AERO's Way): They kept those patients but treated the blood thinner as "Background Noise" to adjust for later. The result went back to "No difference," matching the original truth.
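The two scenarios differ only in how one criterion is handled while building the cohort. A minimal sketch of that toggle follows, assuming a hypothetical `emulate_cohort` function and made-up patient records with an `on_lmwh` flag. It shows the cohort construction step only; how the paper adjusts for the variable afterwards is not specified in this summary.

```python
def emulate_cohort(patients, lmwh_as):
    """Build a cohort, treating LMWH use as an 'exclusion' or a 'confounder'."""
    if lmwh_as not in ("exclusion", "confounder"):
        raise ValueError("lmwh_as must be 'exclusion' or 'confounder'")
    cohort = []
    for p in patients:
        if lmwh_as == "exclusion" and p["on_lmwh"]:
            continue  # Scenario A: the patient is removed outright
        row = {"id": p["id"], "treatment": p["treatment"]}
        if lmwh_as == "confounder":
            # Scenario B: the patient stays, and LMWH use is kept for adjustment
            row["on_lmwh"] = int(p["on_lmwh"])
        cohort.append(row)
    return cohort


# Toy records, not from the paper.
toy_patients = [
    {"id": 1, "treatment": "warfarin", "on_lmwh": True},
    {"id": 2, "treatment": "aspirin", "on_lmwh": False},
    {"id": 3, "treatment": "warfarin", "on_lmwh": False},
    {"id": 4, "treatment": "aspirin", "on_lmwh": True},
    {"id": 5, "treatment": "aspirin", "on_lmwh": False},
]
strict = emulate_cohort(toy_patients, lmwh_as="exclusion")
adjusted = emulate_cohort(toy_patients, lmwh_as="confounder")
```

Here `strict` loses the two patients on LMWH, while `adjusted` keeps all five and carries an `on_lmwh` column that a later outcome model could use.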

The Big Takeaway

The paper's central claim is that how you apply eligibility rules, and not only which patients are in the data, can change the estimated treatment effect.

If you try to copy-paste a strict lab trial into the messy real world, you might break the experiment. AERO acts as a translator. It uses AI and medical knowledge to say, "This rule is about safety, keep it. This rule is about logistics, drop it. This rule is just a characteristic, adjust for it."

By doing this, AERO allows researchers to use real-world hospital data to answer questions that usually require expensive, controlled trials, while ensuring the answer is still accurate and fair. It bridges the gap between the "perfect world" of a lab and the "messy world" of a real hospital.
