Patient2Sentence: Large Language Model-based Semantic Compression for Oncology Trial Eligibility Screening

The paper introduces Patient2Sentence (P2S), a large language model framework that compresses complex oncology electronic health records into concise, standardized sentences, achieving non-inferior clinical trial eligibility screening accuracy compared to full-record analysis while significantly reducing computational costs and enhancing interpretability.

Original authors: Yoshinari, G. H., Goulart, W. C. S., Urbano, A. B. O., Rabello, M. M., Zorzetto, M. M., Macedo, S. O. d., Vitorino, L. M.

Published 2026-05-05

Original paper licensed under CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/). ⚕️ This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

The Big Problem: The "Wall of Text"

Imagine a doctor trying to find the perfect patient for a specific cancer clinical trial. To do this, they have to read through a patient's entire medical history. This history is like a giant, messy library filled with thousands of pages of handwritten notes, lab reports, and scattered data.

Trying to find the one specific sentence in that library that says, "This patient is eligible for Trial X," is slow, exhausting, and prone to human error. It's like searching for a needle in a haystack while blindfolded and wearing gloves.

The Solution: The "Executive Summary"

The researchers created a new tool called Patient2Sentence (P2S). Think of this tool as a super-smart, ultra-fast librarian who can read that entire messy library in a split second and write a single, perfect sentence that captures everything important.

Instead of giving the computer (or a doctor) 50 pages of notes, P2S gives them one clear sentence like this:

"This 55-year-old woman has a specific type of breast cancer, has already had surgery, has no heart issues, and is currently taking Drug Y."

This single sentence contains all the "eligibility logic" needed to decide if the patient fits the trial, but it's much shorter and easier to read.

The Experiment: The "Taste Test"

To see if this "summary sentence" works as well as reading the whole book, the researchers ran a simulation:

The Setup: They created 75 fake (synthetic) patient records based on three real, famous breast cancer trials (KATHERINE, MONARCH-E, and OLYMPIA). These weren't real people, but computer-generated stories designed to look exactly like real medical records.
The Test: They asked a human expert (a radiation oncologist) to decide if each fake patient was eligible for the trials. This was the "Gold Standard."
The Comparison: They then asked an AI to make the same decision in two ways:
- Way A: Reading the full, long medical record.
- Way B: Reading only the single "Patient Sentence."
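
For readers who think in code, the two-way comparison above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the prompt wording, the function names, and the `fake_llm` stand-in are all assumptions made here so that the example runs offline. In a real setup, `llm` would be a call to an actual language model.

```python
from typing import Callable, Dict

# Hypothetical prompt templates; the paper's actual prompts are not reproduced here.
COMPRESS_PROMPT = (
    "Summarize this record as one standardized sentence covering age, "
    "diagnosis, stage, biomarkers, treatments, comorbidities, and timing:\n{record}"
)
ELIGIBILITY_PROMPT = (
    "Trial criteria:\n{criteria}\n\nPatient:\n{patient}\n\n"
    "Answer with exactly one word: Included or Excluded."
)

LLM = Callable[[str], str]  # any function that maps a prompt to a completion


def compress(llm: LLM, record: str) -> str:
    """Way B, step 1: turn the full record into a single patient sentence."""
    return llm(COMPRESS_PROMPT.format(record=record)).strip()


def classify(llm: LLM, criteria: str, patient_text: str) -> str:
    """Ask for a binary label; anything unparseable is treated as 'Excluded'."""
    answer = llm(ELIGIBILITY_PROMPT.format(criteria=criteria, patient=patient_text))
    return "Included" if answer.strip().lower().startswith("included") else "Excluded"


def screen_both_ways(llm: LLM, criteria: str, record: str) -> Dict[str, str]:
    """Run Way A (full record) and Way B (patient sentence) on one patient."""
    sentence = compress(llm, record)
    return {
        "sentence": sentence,
        "way_a_full_record": classify(llm, criteria, record),
        "way_b_sentence": classify(llm, criteria, sentence),
    }


def fake_llm(prompt: str) -> str:
    """Toy stand-in for a model call (assumption): keyword rules, no real reasoning."""
    if prompt.startswith("Summarize"):
        record = prompt.split("\n", 1)[1]
        return record.split(".")[0] + "."  # keep only the first sentence
    patient = prompt.split("Patient:\n", 1)[1]
    return "Included" if "HER2-positive" in patient else "Excluded"


record = (
    "55-year-old woman with HER2-positive breast cancer after surgery. "
    "Many more pages of notes would follow here."
)
result = screen_both_ways(fake_llm, "HER2-positive early breast cancer", record)
print(result)
```

The point of the design is that both ways share the same classification step and differ only in what text the model is given, so any difference in decisions can be attributed to the compression.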

The Results: Short and Sweet

The results were impressive:

Accuracy: Using only the single sentence, the AI reached the same decision as the human expert in 94.7% of cases (71 of 75). A statistical test found no significant difference from its decisions based on the full, long records.
Agreement: After correcting for matches that could happen by chance, the agreement score (Cohen's kappa) was 0.83, which is conventionally read as almost-perfect agreement.
Speed & Cost: This is where the magic happens. By turning long records into short sentences, the system used 67% fewer computer "tokens" (the basic units of data the AI processes).
- Analogy: Imagine you are paying to send a message by the word. Instead of sending a 100-word letter, you send a 33-word postcard. You get the same message across for about one-third of the price. It will probably arrive sooner too, although the study measured tokens, not speed.

Why This Matters (According to the Paper)

The paper argues that you don't need to feed a computer a massive, messy data dump to get a smart answer. Complex medical stories can be compressed into simple, standardized sentences while keeping the details needed to make a decision, at least in this simulated setting.

Privacy: Since they used fake data, no real patient secrets were at risk.
Explainability: Unlike some AI that gives a "black box" answer, a "Patient Sentence" is written in human language. A doctor can read it and immediately understand why the AI made a decision.
Efficiency: It makes the process of screening patients for trials much faster and cheaper, potentially helping more people get into the studies they need.

The Catch (Limitations)

The authors are honest about the limits of their study:

It's a Simulation: They used 75 fake patients. They haven't tested this on real-world hospital records yet.
Specific Trials: They only tested three specific breast cancer trials. We don't know yet if it works for every type of cancer or every type of trial.
Complexity: The system worked best for trials with clear rules. For trials with very complex, time-sensitive rules (like the KATHERINE trial), the single sentence sometimes missed a tiny detail, leading to a few errors.

In a Nutshell

Patient2Sentence is a new way to turn a patient's entire medical history into a one-sentence summary that a computer can read quickly. The study suggests that, for 75 synthetic patients, this summary works about as well as the whole history for deciding if a patient fits a clinical trial, while using roughly one-third of the tokens. It's like turning a 500-page novel into a perfect book blurb that tells you exactly what you need to know.

Technical Summary: Patient2Sentence (P2S) for Oncology Trial Eligibility Screening

Problem Statement

Efficient recruitment for oncology clinical trials is currently hindered by the complexity of interpreting long, heterogeneous, and largely unstructured Electronic Health Records (EHRs). Existing AI frameworks often rely on rigid data structures, narrow vocabularies, or specific architectures (e.g., ClinicalBERT) that struggle to generalize across institutions or integrate the temporal and causal dimensions of clinical reasoning. While Large Language Models (LLMs) show promise in understanding clinical narratives, they face challenges in processing unstructured text alongside structured numerical data without losing critical eligibility logic. There is a need for a method to compress complex patient records into a standardized, machine-interpretable format that preserves the reasoning required for trial screening while reducing computational overhead.

Methodology

The study employed a simulation-based diagnostic accuracy design following the STARD-AI guidelines to evaluate the Patient2Sentence (P2S) framework. The methodology involved three primary components:

Data Generation:
- Source: 75 fully synthetic EHRs were generated using GPT-5 (OpenAI) based on the inclusion/exclusion criteria of three pivotal adjuvant breast cancer trials: KATHERINE (HER2-positive), MONARCH-E (high-risk HR+/HER2-), and OLYMPIA (germline BRCA1/2-mutated).
- Composition: Each trial dataset contained 25 cases (5 eligible, 20 ineligible) to stress-test exclusion logic. The records included demographics, tumor subtypes, staging, comorbidities, treatments, and temporal clinical information.
- Validation: A board-certified radiation oncologist served as the reference standard, providing binary eligibility judgments ("Included" or "Excluded") for each full synthetic EHR.
The P2S Framework:
- Semantic Compression: GPT-5 converted each long-form synthetic EHR into a single, standardized natural-language "patient sentence." This sentence condensed key features (biomarkers, stage, comorbidities, treatments, temporal relationships) into a compact representation.
- Eligibility Assessment: The same GPT-5 instance, using a fixed zero-shot prompt, classified trial eligibility based solely on the generated patient sentence.
- Comparison: The eligibility classification derived from the compressed sentence was compared against the classification derived from the full EHR and the human expert's judgment.
Statistical Analysis:
- Agreement was measured using percent agreement and Cohen's kappa ($\kappa$).
- McNemar's test was used to determine if there was a statistically significant difference in diagnostic accuracy between full-record assessments and sentence-based assessments.
- Computational efficiency was quantified by the reduction in token consumption.
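
The agreement statistics named above can be computed in a few lines. The following is a minimal pure-Python sketch under stated assumptions: the label lists are invented toy data (not the study's cases), the function names are hypothetical, and the exact binomial form of McNemar's test is used because this summary does not say which variant the authors applied.

```python
from math import comb
from typing import Sequence, Tuple


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Chance-corrected agreement between two raters' labels."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    labels = set(a) | set(b)
    # Agreement expected by chance, from each rater's label frequencies.
    expected = sum((a.count(lab) / n) * (b.count(lab) / n) for lab in labels)
    if expected == 1.0:  # both raters used a single identical label throughout
        return 1.0
    return (observed - expected) / (1.0 - expected)


def discordant_counts(
    reference: Sequence[str], method_1: Sequence[str], method_2: Sequence[str]
) -> Tuple[int, int]:
    """Count cases where exactly one of the two methods matches the reference."""
    b = sum(m1 == r and m2 != r for r, m1, m2 in zip(reference, method_1, method_2))
    c = sum(m1 != r and m2 == r for r, m1, m2 in zip(reference, method_1, method_2))
    return b, c


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the two discordant counts."""
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Toy data (assumption): 8 hypothetical patients, NOT the study's 75 cases.
expert = ["Included"] * 2 + ["Excluded"] * 6
full_record = list(expert)                            # Way A matches the expert
sentence = ["Included", "Excluded"] + ["Excluded"] * 6  # Way B misses one case

kappa = cohens_kappa(expert, sentence)
b, c = discordant_counts(expert, full_record, sentence)
p_value = mcnemar_exact(b, c)
print(f"kappa={kappa:.2f}, discordant=({b}, {c}), p={p_value:.2f}")
```

Kappa compares the two raters' labels directly, while McNemar's test looks only at the discordant pairs, that is, the patients on whom the two input formats disagree about correctness.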

Key Results

The study demonstrated that semantic compression via P2S preserves eligibility-defining clinical logic with high fidelity:

Overall Accuracy: Sentence-based classifications achieved 94.7% concordance with expert judgments (71/75 cases), corresponding to a Cohen's $\kappa$ of 0.83 (indicating almost-perfect agreement).
Statistical Significance: McNemar's test showed no statistically significant difference ($p = 1.00$) between eligibility decisions made using full records and those made using only the compressed sentences, which the authors interpret as supporting the non-inferiority of the compression method.
Trial-Specific Performance:
- MONARCH-E: 100% concordance ($\kappa = 1.00$).
- OLYMPIA: 96% concordance ($\kappa = 0.86$).
- KATHERINE: 88% concordance ($\kappa = 0.65$). The lower performance in KATHERINE was attributed to the complexity of contextual interpretation required for neoadjuvant timing and residual disease, suggesting that temporal markers may be weakened during compression.
Computational Efficiency: The framework reduced token consumption by an average of 67.1% across all trials (ranging from 64.2% to 69.0%). This corresponds to roughly a threefold reduction in tokens processed ($1/(1 - 0.671) \approx 3.0$), with no statistically significant change in eligibility decisions.
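
The "threefold" figure follows from simple arithmetic on the reported reductions: removing a fraction r of the tokens leaves 1 − r of them, so the original input is 1 / (1 − r) times larger than the compressed one. A short sketch, where the function name is an illustrative assumption and only the three percentages come from the summary above:

```python
def compression_factor(reduction: float) -> float:
    """Return original_tokens / compressed_tokens for a fractional reduction."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - reduction)


# Reductions reported above: lowest trial, mean across trials, highest trial.
reported = {"lowest": 0.642, "mean": 0.671, "highest": 0.690}
factors = {name: round(compression_factor(r), 2) for name, r in reported.items()}
print(factors)
```

The factors come out at about 2.79, 3.04, and 3.23, so "threefold" is a fair rounding of the mean. Whether end-to-end cost falls by the same factor depends on how the compression step itself is counted, which this summary does not specify.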

Significance and Claims

The authors position Patient2Sentence as a foundational step toward interoperable, explainable, and privacy-preserving clinical AI. The paper claims the following significance:

Bridging the Gap: P2S successfully links free-text narratives with structured health data, allowing general-purpose LLMs to process diverse clinical contexts without specialized fine-tuning.
Operational Efficiency: By reducing token consumption by ~67%, the framework offers a path to near-real-time prescreening, potentially expanding the pool of candidates screened daily and reducing manual chart review burdens.
Explainability and Privacy: Unlike "black box" embeddings, the "patient sentence" is human-readable, preserving auditability. Furthermore, the exclusive use of synthetic data mitigates privacy and re-identification risks.
Future Trajectory: The authors propose that this architecture lays the groundwork for a "Narrative Inference Twin" (NIT), a digital twin subclass that infers quantifiable parameters solely from unstructured text, circumventing the need for direct structured data integration.

Limitations

The authors explicitly note that the study is a proof-of-concept with a small, entirely synthetic dataset focused on three specific breast cancer trials. Consequently, generalizability to real-world EHRs and other clinical domains remains unproven. The study did not perform formal subgroup analyses across demographic strata due to the dataset size. Validation with real-world data and across additional clinical domains is identified as a necessary next step.