MedAdhereAI: An Interpretable Machine Learning Pipeline for Predicting Medication Non-Adherence in Chronic Disease Patients Using Real-World Refill Data

MedAdhereAI is an interpretable machine learning pipeline that uses real-world refill and claims data to predict medication non-adherence in chronic disease patients. It explains its predictions with SHAP so that clinicians can target interventions.

Original authors: Yadav, S., Rajbhandari, S.

Published 2026-04-28

This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

Managing chronic diseases like diabetes and hypertension requires patients to take medication consistently. However, many people struggle to follow their prescribed treatments. This non-adherence leads to more hospitalizations and contributes to hundreds of billions of dollars in annual healthcare costs. In many parts of the world, healthcare systems lack the resources to monitor every patient closely, making it difficult to identify who might stop taking their medicine before a health crisis occurs.

The researchers in this paper developed a system called MedAdhereAI to address this problem. Instead of relying on complex medical data like blood tests or expensive imaging, the system uses information that is already routinely collected: pharmacy refill records and insurance claims. By looking at how often a patient visits a doctor and the gaps between their medication refills, the system attempts to predict whether a patient is at risk of not taking their medicine.
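As a rough illustration of how refill gaps can be derived from pharmacy records, here is a minimal Python sketch. The function names, the feature names, and the sample dates are all hypothetical; the paper's exact feature definitions are not given in this summary.

```python
from datetime import date

def days_between_refills(fill_dates):
    """Return the number of days between each pair of consecutive refills."""
    ordered = sorted(fill_dates)
    return [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]

def refill_features(fill_dates, visit_count):
    """Summarize one patient's refill history as a small feature dictionary.

    The feature names are illustrative assumptions, not the paper's schema.
    """
    gaps = days_between_refills(fill_dates)
    return {
        "total_visits": visit_count,
        "mean_refill_gap": sum(gaps) / len(gaps) if gaps else 0.0,
        "max_refill_gap": max(gaps) if gaps else 0,
    }

# Invented refill dates for a single patient.
fills = [date(2024, 1, 1), date(2024, 1, 31), date(2024, 3, 15), date(2024, 4, 14)]
gaps = days_between_refills(fills)
features = refill_features(fills, visit_count=5)
print(gaps)
print(features)
```

A patient who refills a 30-day prescription on time would show gaps near 30 days; the 44-day gap in this invented history is the kind of pattern such a feature is meant to surface.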

To build this system, the researchers used a dataset of anonymized records for patients with diabetes and hypertension. They focused on specific patterns, such as the number of days between refills and the total number of healthcare visits. They tested two types of statistical models to see which could best identify at-risk patients. The first, logistic regression, combines the input features in a weighted sum, so each feature has a single, direct effect on the predicted risk. The second, a random forest, combines many decision trees and can capture nonlinear relationships and interactions between features.
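To make the logistic regression idea concrete, the following is a minimal sketch in plain Python, trained with batch gradient descent. It is not the authors' implementation: the training data, the two features, the learning rate, and the epoch count are invented for illustration, and the random forest is not shown.

```python
import math

def sigmoid(z):
    """Map a log-odds value to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, x):
    """Predicted probability of non-adherence for one feature vector."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n = len(rows)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = predict_proba(weights, bias, x) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

# Invented, standardized toy data: [refill gap, visit count]; label 1 = non-adherent.
rows = [[-1.0, 1.0], [-0.5, 0.5], [0.0, 0.2], [0.5, -0.5], [1.0, -1.0], [1.5, -0.8]]
labels = [0, 0, 0, 1, 1, 1]

weights, bias = fit_logistic(rows, labels)
high_risk = predict_proba(weights, bias, [2.0, -1.5])   # long gaps, few visits
low_risk = predict_proba(weights, bias, [-1.5, 1.5])    # short gaps, many visits
print(weights, bias, high_risk, low_risk)
```

Because the model is a weighted sum passed through a sigmoid, the sign and size of each learned weight can be read directly, which is part of why this model family is considered interpretable. The direction of the effects in this toy data is an assumption, not a reported finding.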

The results showed that the logistic regression model was more effective at this task. It achieved a ROC AUC of 0.82, a standard measure of how well a model separates two groups (1.0 is perfect separation, 0.5 is no better than chance), and its probability estimates were also reliable. The researchers found that the most important factors in predicting whether someone would stop taking their medication were the total number of doctor visits, the patient's age, and the length of the gaps between medication refills.
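For readers unfamiliar with these metrics, here is a small Python sketch of how they can be computed. ROC AUC is computed from its pairwise definition: the fraction of (non-adherent, adherent) pairs in which the non-adherent patient receives the higher score. The Brier score is included as one common way to check probability estimates; whether the authors used it is an assumption, and the labels and scores below are invented.

```python
def roc_auc(labels, scores):
    """Fraction of positive/negative pairs ranked correctly (ties count as half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

def brier_score(labels, scores):
    """Mean squared difference between predicted probability and outcome (lower is better)."""
    return sum((s - y) ** 2 for y, s in zip(labels, scores)) / len(labels)

# Invented example: 1 = non-adherent, 0 = adherent.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.1]

auc = roc_auc(labels, scores)
brier = brier_score(labels, scores)
print(auc, brier)
```

In this toy case, 8 of the 9 pairs are ranked correctly, so the AUC is about 0.89. The pairwise loop is quadratic in the number of patients and is meant only to show the definition; a rank-based formula would be the usual choice for large datasets.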

A central goal of the research was to ensure the system was not a "black box", meaning a system that provides an answer without explaining how it reached that conclusion. Clinicians are often hesitant to trust automated tools if they cannot see the reasoning behind a prediction. To address this, the researchers integrated SHAP, an explainability method that provides explanations both for the entire group and for individual patients. For a single patient, the system can show which factors, such as a long gap since their last refill, pushed the prediction toward a high risk of non-adherence.
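To give a sense of what a per-patient explanation looks like, the sketch below uses a known property of SHAP for linear models: when features are treated as independent, each feature's contribution (in log-odds) is its weight times the patient's deviation from the average value of that feature. The weights, averages, and patient values are invented for illustration and are not results from the paper.

```python
def linear_attributions(weights, feature_means, patient):
    """Per-feature contribution to the log-odds, relative to the average patient.

    For a linear model with independent features this equals the SHAP value:
    weight * (patient value - mean value).
    """
    return {
        name: weights[name] * (patient[name] - feature_means[name])
        for name in weights
    }

# Hypothetical model weights and population averages (not from the paper).
weights = {"total_visits": -0.30, "age": -0.02, "mean_refill_gap": 0.05}
feature_means = {"total_visits": 6.0, "age": 60.0, "mean_refill_gap": 32.0}

# Hypothetical patient with few visits and long refill gaps.
patient = {"total_visits": 2.0, "age": 45.0, "mean_refill_gap": 62.0}

contributions = linear_attributions(weights, feature_means, patient)
top_driver = max(contributions, key=lambda name: abs(contributions[name]))
total_shift = sum(contributions.values())

for name, value in sorted(contributions.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {value:+.2f} log-odds")
```

In this invented case the long refill gap is the largest contributor, and the contributions sum to the difference between this patient's log-odds and those of an average patient. Averaging the absolute contributions across many patients would give a group-level importance ranking.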

The authors suggest that MedAdhereAI could serve as a decision-support tool. By identifying high-risk patients using only basic, widely available data, healthcare providers in resource-limited settings might be able to direct their limited time and resources toward the people who need the most help staying on their treatment plans. The researchers note that while the results are promising, future work is needed to test the system on different populations and to see if adding more detailed information, such as social or clinical data, improves its accuracy.
