CLIN-SUMM: Incremental Longitudinal Summarization of Clinical Notes Enables Scalable Representation and Early Disease Prediction

CLIN-SUMM is a framework that incrementally constructs structured, date-partitioned longitudinal summaries of clinical notes to reduce data redundancy while providing a scalable representation layer that improves disease prediction and longitudinal reasoning.

Original authors: D'Souza, V., Pace, D. F., Azhir, A., Nargesi, A., Holbrook, E. B., He, W., Naumann, T., Friedman, S., Atlas, S. J., Anderson, C. D., Hung, J., Maddah, M.

Published 2026-04-28

This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

When a person receives medical care over many years, doctors write thousands of notes about their visits. These notes contain vital details: new symptoms, changes in medication, or results from lab tests. However, for a doctor trying to understand a patient's long-term health, these notes can be overwhelming. They are often repetitive, with much of the same information copied from one visit to the next, and the most important updates can be buried under mountains of text.

The researchers in this paper developed a system called CLIN-SUMM to solve this problem. Instead of trying to summarize a patient's entire history into one static paragraph, the system builds a growing, organized timeline. Every time a new medical note is written, the system looks at it and identifies only the new information. It then adds this new data into specific categories, such as "Diagnosis" or "Medications," and stamps it with the date. This creates a structured record that evolves alongside the patient, much like a ledger that only records new transactions rather than rewriting the entire history of an account every day.

To build this, the researchers used a large language model. They designed a process where the system first creates an initial summary of a patient's first visit. For every visit after that, the system compares the new note to the existing summary and only extracts what has changed. To keep the process efficient, the system also filters out notes that are nearly identical to previous ones.

The authors tested this system using data from over 12,000 patients at Massachusetts General Hospital. They found that CLIN-SUMM reduced the amount of text by about 70%, making the information much easier for a computer to process while still keeping the details accurate. When doctors reviewed the summaries, they rated them highly for correctness and completeness, noting that the system rarely invented information that wasn't in the original notes.

The researchers then used these compressed summaries to see if they could help predict health issues. They focused on dementia as a case study. They trained a machine learning model using only the CLIN-SUMM summaries, not the original, massive pile of notes. The model was able to identify dementia cases with high accuracy. More importantly, the model could predict the risk of a dementia diagnosis up to three years before it actually happened. By looking at the summaries, the model picked up on subtle patterns in the text, such as mentions of memory issues, changes in walking, or dizziness, which are often documented in notes long before a formal diagnosis is made.
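One way to picture early prediction on date-stamped summaries is to hide everything recorded within a chosen lead time of the diagnosis, so the model only sees what was known beforehand. The sketch below illustrates that idea only; the explanation above does not say how the authors built their model inputs, and the function names, the cutoff rule, and the example entries are all hypothetical.

```python
from datetime import date, timedelta


def entries_before_horizon(entries, index_date, lead_years):
    """Keep summary entries dated at least `lead_years` before `index_date`.

    `entries` is a list of (date, text) pairs; `index_date` stands in for the
    diagnosis date. The cutoff uses an average year length of 365.25 days.
    """
    cutoff = index_date - timedelta(days=round(365.25 * lead_years))
    return [(d, text) for d, text in entries if d <= cutoff]


def to_model_input(entries):
    """Flatten entries into chronologically ordered text for a downstream model."""
    return "\n".join(f"{d.isoformat()} {text}" for d, text in sorted(entries))


# Hypothetical summary entries, invented for illustration only.
entries = [
    (date(2022, 1, 15), "Symptoms: change in walking"),
    (date(2019, 3, 10), "Symptoms: reports memory issues"),
    (date(2020, 2, 1), "Symptoms: dizziness"),
]

visible = entries_before_horizon(entries, index_date=date(2023, 6, 1), lead_years=3)
model_text = to_model_input(visible)
```

With a three-year lead time, the 2022 entry is withheld and only the 2019 and 2020 entries reach the model, which is what makes the prediction "early" rather than a restatement of recent notes.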

The paper also shows that these summaries are better at capturing medication details than the standard structured databases used in hospitals. The researchers found that the summaries identified many more instances of specific medications, like Donepezil, because the information was often written in the narrative notes but missed by the automated coding systems.

Ultimately, the researchers suggest that CLIN-SUMM acts as a middle layer. It transforms messy, repetitive, and massive amounts of text into a clean, organized, and time-stamped format that can be used for both quick human review and advanced computer modeling.
