From Study Design to Executable Code: Automating Target Trial Emulation with Large Language Models

The paper introduces THESEUS, a framework that uses large language models to translate free-text study descriptions into standardized, executable R scripts for the OHDSI ecosystem. The authors report high accuracy in both parameter extraction and code generation, which could lower technical barriers in observational research.

Kim, H., Kim, M., Kim, S., You, S. C.

Published 2026-03-19

This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

Imagine you are a chef trying to recreate a famous dish from a food critic's review. The review says, "The soup was simmered for three hours with fresh basil and a pinch of salt."

In the world of medical research, this review is a study design. The "soup" is the analysis of patient data to see if a drug works. The problem is that translating that simple sentence into a working recipe (code) is incredibly hard. If one chef uses a gas stove and another uses an electric one, or if one measures salt in grams and the other in teaspoons, the final soup tastes different. In medical research, this means two teams studying the same drug might get different results just because they wrote their computer code differently.

This paper introduces THESEUS, a new tool that acts like a super-smart, robotic sous-chef to solve this problem.

Here is how it works, broken down into simple steps:

1. The Problem: The "Translation Gap"

Medical researchers often write their study plans in plain English (like the food critic's review). But to run the study on a computer, they need to translate that English into very specific, rigid R code built on Strategus, an R package in the OHDSI ecosystem.

  • The Old Way: A human researcher has to read the English plan, understand the complex math, and then manually type out hundreds of lines of code. This is slow, prone to typos, and hard to reproduce across different hospitals.
  • The Goal: We want to type a sentence in English and instantly get a perfect, error-free computer program.

2. The Solution: The "Two-Step Robot" (THESEUS)

The researchers built a system called THESEUS that uses Large Language Models (LLMs), the same AI technology behind chatbots, to do the translation. It works in two distinct phases:

Step A: The "Architect" (Standardization)

First, the AI reads the messy, free-text description of the study (e.g., "We watched patients for one year after they started the drug").

  • The Analogy: Imagine the AI is an architect reading a client's vague sketch. It doesn't just guess; it forces the sketch into a strict blueprint (a JSON file).
  • It asks: "Did you mean 365 days? Did you mean to start counting the day the drug was given?"
  • It fills out a standardized form where every field has a specific rule. This ensures that "one year" is always understood as "365 days" and "start date" is always in the same format.
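The standardization idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (`timeAtRisk`, `durationDays`, `startAnchor`), the conversion table, and the regex-based parser are hypothetical stand-ins, not the schema or the LLM prompt the paper uses. The summary only confirms that "one year" should become 365 days.

```python
import json
import re

# Hypothetical conversion table: this summary only confirms "one year" -> 365 days.
DAYS_PER_UNIT = {"day": 1, "week": 7, "year": 365}
WORD_NUMBERS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}

# Matches phrases such as "one year", "2 years", or "30 days".
_DURATION = re.compile(
    r"\b(\d+|" + "|".join(WORD_NUMBERS) + r")\s+(" + "|".join(DAYS_PER_UNIT) + r")s?\b"
)


def duration_to_days(text):
    """Return the first duration mentioned in `text` as whole days, or None."""
    match = _DURATION.search(text.lower())
    if match is None:
        return None
    count, unit = match.groups()
    number = int(count) if count.isdigit() else WORD_NUMBERS[count]
    return number * DAYS_PER_UNIT[unit]


def build_blueprint(description):
    """Fill a fixed-shape form; fields that cannot be extracted stay None."""
    return {
        "timeAtRisk": {
            "startAnchor": None,  # this toy parser cannot infer it; left for review
            "durationDays": duration_to_days(description),
        }
    }


blueprint = build_blueprint("We watched patients for one year after they started the drug")
print(json.dumps(blueprint, indent=2))
```

The point of the fixed shape is that every study ends up with the same keys and units, so the later code-generation step never has to interpret free text.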

Step B: The "Builder" (Code Generation)

Once the blueprint is perfect, the AI switches roles to become a construction worker.

  • The Analogy: Now that the architect has the perfect blueprint, the builder doesn't need to guess where the walls go. They just follow the blueprint to build the house.
  • The AI takes the standardized blueprint and automatically writes the R code (the computer program) needed to run the study.
  • The Safety Net: The AI has a "self-check" feature. If the code it wrote has a typo or an error, the AI reads the error message, fixes the code, and tries again until it runs perfectly.
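The self-check loop described above might look roughly like the sketch below. Several things here are assumptions made for illustration: `fake_model` stands in for the LLM call, the generated "script" is a Python snippet rather than R, and the cap of three attempts is arbitrary (the summary does not say how many retries THESEUS allows).

```python
def try_run(script):
    """Run a script; return None on success, or the error message on failure."""
    try:
        exec(compile(script, "<generated>", "exec"), {})
        return None
    except Exception as exc:  # includes SyntaxError raised by compile()
        return f"{type(exc).__name__}: {exc}"


def generate_with_repair(generate, max_attempts=3):
    """Call `generate(previous_error)` until its script runs or attempts run out."""
    error = None
    for attempt in range(1, max_attempts + 1):
        script = generate(error)   # the error from the last run is fed back in
        error = try_run(script)
        if error is None:
            return script, attempt
    raise RuntimeError(f"still failing after {max_attempts} attempts: {error}")


def fake_model(previous_error):
    """Hypothetical stand-in for the LLM: broken first draft, fixed second draft."""
    if previous_error is None:
        return "result = undefined_name + 1"
    return "result = 1 + 1"


script, attempts = generate_with_repair(fake_model)
print(f"succeeded on attempt {attempts}: {script}")
```

Feeding the error message back into the generator is the key step: the second draft is produced with knowledge of why the first one failed.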

3. The "Human-in-the-Loop" (The Taste Test)

The researchers didn't just let the robot run wild. They built a Graphical User Interface (GUI), a visual dashboard that works like a control panel.

  • The Analogy: Think of it like a video game character creator. The AI suggests the settings based on your text, but you can look at the screen, see the "Time at Risk" or "Drug Match" settings, and say, "Actually, change that to 2 years," or "No, keep it as is."
  • This ensures a human expert always has the final say before the code is generated.
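In code terms, the review step amounts to merging the expert's corrections over the AI's suggestions before anything is generated. The sketch below assumes hypothetical setting names (`time_at_risk_days`, `matching`) and is not how the THESEUS GUI is implemented. It only shows the idea that the human's values win, and that corrections to settings that do not exist are rejected rather than silently added.

```python
def apply_overrides(suggested, overrides):
    """Return the suggested settings with the reviewer's corrections applied."""
    unknown = set(overrides) - set(suggested)
    if unknown:
        raise KeyError(f"unknown settings: {sorted(unknown)}")
    # Later keys win, so the reviewer's values replace the AI's suggestions.
    return {**suggested, **overrides}


# Hypothetical settings proposed by the AI from the study text.
suggested = {"time_at_risk_days": 365, "matching": "1:1"}

# The reviewer says: "Actually, change that to 2 years."
final = apply_overrides(suggested, {"time_at_risk_days": 730})
print(final)
```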

4. The Results: Does the Robot Cook Good Soup?

The team tested this system on 15 real medical studies and 5 studies from outside their network.

  • Accuracy: The AI reliably turned English text into the correct "blueprint" (about 90-98% accurate for studies already using their system).
  • Code Success: When the AI wrote the code, it worked on the first try about 80-100% of the time. If it failed, the "self-check" feature fixed it, bringing the success rate to nearly 100%.
  • Generalization: It even worked well on studies that didn't originally use their system, proving it can translate ideas from different "kitchens."
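This summary does not say how the accuracy figures were computed. One plausible reading, offered purely as an assumption, is a field-by-field comparison between the AI's blueprint and a reference blueprint. The sketch below uses made-up toy values to show the arithmetic; they are not results from the paper.

```python
def field_accuracy(extracted, reference):
    """Share of reference fields whose extracted value matches exactly."""
    if not reference:
        raise ValueError("reference must contain at least one field")
    matches = sum(1 for key, value in reference.items() if extracted.get(key) == value)
    return matches / len(reference)


# Toy blueprints with hypothetical field names, for illustration only.
reference = {"time_at_risk_days": 365, "washout_days": 30, "matching": "1:1", "model": "cox"}
extracted = {"time_at_risk_days": 365, "washout_days": 60, "matching": "1:1", "model": "cox"}

print(f"{field_accuracy(extracted, reference):.0%} of fields match")
```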

Why This Matters

Before this, only experts who knew both medical statistics and advanced computer programming could easily run these studies.

  • The Impact: THESEUS lowers the barrier to entry. It allows more researchers (and potentially more diverse teams) to run high-quality, reproducible studies just by describing what they want to do in plain English.
  • The Future: It turns the chaotic, messy process of writing research code into a standardized, reliable assembly line.

In short: THESEUS is a translator that turns your "I want to study this drug" into a "Here is the perfect, error-free computer program to study that drug," ensuring that everyone in the world is cooking the same recipe, no matter which kitchen they are in.
