Explainable AI for Data-Driven Design of High-Dimensional Predictive Studies

This paper introduces an Exploratory AI Recommender that leverages explainable AI to generate data-driven recommendations for feature selection, non-linear terms, and interactions, thereby significantly enhancing the predictive performance and interpretability of high-dimensional clinical models like the Cox Proportional Hazards model.

Original authors: Yan, J., Machlanski, D., Butler, K., Dimitrakopoulos, P., Harrison, E. M., Guthrie, B. M., Tsaftaris, S. A.

Published 2026-05-24

CC BY 4.0

Original paper licensed under CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/). ⚕️ This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

Imagine you are a chef trying to create the perfect soup to predict who might get hurt (specifically, who might fall and get injured). You have a massive pantry with hundreds of ingredients (data points like age, medications, past illnesses, and lifestyle habits).

Traditionally, chefs (researchers) would pick ingredients based on old recipe books (medical literature). They might say, "Let's add salt and pepper because we know those are important." But with hundreds of ingredients, it's impossible for a human to taste-test every single combination to see if, for example, "adding a pinch of cinnamon only works if you also add a dash of nutmeg."

This is where the problem lies:

Simple recipes (standard statistical models) are easy to understand and trust, but they often miss complex flavor combinations, making the soup less tasty (less accurate).
Complex recipes (advanced AI) can taste amazing because they find hidden combinations, but they are "black boxes." You can't see why they added the cinnamon, so you don't trust them enough to serve them to patients.

The Solution: The "Taste-Tester" Robot

The authors of this paper built a new tool called an Exploratory AI Recommender. Think of this tool as a super-smart, robotic taste-tester that doesn't cook the final soup itself. Instead, it tastes the complex, high-performance AI soup, figures out exactly what makes it taste good, and then writes a new, simple recipe for the human chef.

Here is how the robot works in three simple steps:

1. The Taste-Test (The "Black Box" Explorer)
The robot first cooks a complex, high-performance soup using a method called a "Random Survival Forest." This robot is great at finding hidden patterns, like realizing that "cinnamon only helps if the person is over 65," or that "nutmeg actually ruins the soup if you have a specific allergy."

2. The Translation (The "Explainable" Step)
Once the robot knows the secret, it uses a translator (called SHAP, a type of Explainable AI) to break down the complex flavors into simple instructions. It looks at the soup and says:

"Throw away the oregano; it's doing nothing." (Feature Exclusion)
"The cinnamon isn't a straight line; it needs to be added in a curve." (Non-linear terms)
"The nutmeg and the cinnamon work best when mixed together." (Feature Interactions)

3. The New Recipe (The "White Box" Model)
The human chef takes these simple instructions and updates their traditional, easy-to-understand recipe (a standard Cox Proportional Hazards model). Now, the chef has a soup that is:

Tastier than the original simple recipe (more accurate).
As easy to read as the original simple recipe (transparent and trustworthy).

What Did They Find?

The team tested this on a huge group of over 245,000 patients to predict falls and injuries.

The Old Way: The standard recipe had a "taste score" (C-index) of 0.805.
The New Way: After the robot gave its recommendations (removing 23 useless ingredients, changing how 2 ingredients were used, and mixing 221 new ingredient pairs), the score went up to 0.815.

That number looks small, but the paper reports the improvement as statistically significant. In practical terms, when the model compares two patients, the new recipe is slightly more likely to rank the one who falls sooner as the higher-risk patient.

They also tested this on two other "pantries" (datasets for breast cancer and HIV) and found the robot worked there too, improving the recipes in those areas as well.

The Big Picture

The paper claims that this method bridges the gap between accuracy and trust.

You don't have to use a "black box" AI that no one understands.
You don't have to settle for a "simple box" model that misses important details.

Instead, you use the AI as a research assistant to discover the hidden rules of the data, and then you write those rules into a clear, auditable model that doctors can actually use and trust. The paper emphasizes that the AI didn't replace the doctor's judgment; it just gave the doctor a better, data-driven list of ingredients to use.

In short: They used a smart robot to find the secret sauce in a complex AI model, wrote that secret sauce down on a simple notepad, and showed that the updated notepad recipe predicts better than the original one while staying just as readable.

Technical Summary: Explainable AI for Data-Driven Design of High-Dimensional Predictive Studies

Problem Statement

Predictive modeling in healthcare is essential for clinical decision-making, yet designing optimal models for high-dimensional datasets (e.g., electronic health records) remains a significant challenge. Traditional statistical methods, such as Cox Proportional Hazards (CPH) models, are interpretable and mathematically rigorous but often rely on linear assumptions that fail to capture complex biological realities, such as non-linear relationships (e.g., U-shaped risk curves) or high-order feature interactions. Conversely, modern machine learning (ML) models excel at capturing these complex patterns but operate as "black boxes," lacking the transparency and interpretability required for clinical trust and adoption.
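
To make the "linear assumption" concrete, here is a sketch of the standard CPH hazard next to a version with one quadratic term and one pairwise interaction. The symbols ($\beta_j$, $\gamma_k$, $\delta_{kl}$) are illustrative notation chosen for this summary, not taken from the paper.

$$
\begin{aligned}
\text{Linear CPH:}\quad & h(t \mid x) = h_0(t)\,\exp\!\Big(\sum_{j} \beta_j x_j\Big) \\
\text{With added terms:}\quad & h(t \mid x) = h_0(t)\,\exp\!\Big(\sum_{j} \beta_j x_j + \gamma_k x_k^2 + \delta_{kl}\, x_k x_l\Big)
\end{aligned}
$$

In the first form, each feature $x_j$ shifts the log-hazard along a straight line. The $\gamma_k x_k^2$ term lets feature $k$ follow a curve (for example, a U-shape), and the $\delta_{kl}\, x_k x_l$ term lets the effect of feature $k$ depend on the value of feature $l$. Both forms remain readable coefficient by coefficient.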

Current approaches often treat explainable AI (XAI) merely as a post-hoc tool to justify black-box predictions. There is a gap in utilizing XAI to actively design better transparent ("white-box") models. Specifically, it is unclear whether XAI can automate the three critical tasks of feature engineering—feature selection, non-linear term identification, and interaction modeling—to improve conventional clinical models without sacrificing interpretability.

Methodology

The authors propose an Exploratory AI Recommender, a framework designed to use flexible AI models as an exploratory engine to generate data-driven recommendations for refining standard statistical models. The methodology follows a three-stage process:

Baseline Establishment: A standard, knowledge-driven multivariable CPH model is fitted using a curated set of predictors (e.g., demographics, comorbidities, medications) without advanced feature engineering (no interactions or non-linear terms).
Exploratory AI & Recommendation Generation:
- Exploratory Model: A Random Survival Forest (RSF) is trained on the same data to capture complex, non-linear patterns and interactions. This model is used solely for exploration, not final prediction.
- Interpretation: The RSF is interpreted using SHAP (SHapley Additive exPlanations) to generate Feature Attributions (FAs).
- Stratified Analysis: To avoid obscuring subgroup-specific risks, the authors perform an "extreme-group" FA analysis, separating patients into low-risk and high-risk cohorts based on RSF predictions.
- Recommendation Logic: The framework processes FAs to generate three specific types of recommendations:
  - Feature Exclusion: Features where the mean absolute FA is negligible (below a data-driven threshold) are recommended for removal.
  - Non-linear Terms: Features showing a weak correlation ($|r| < 0.1$) between feature values and their FAs are flagged for non-linear modeling (e.g., quadratic terms or splines).
  - Feature Interactions: An iterative stratification analysis is performed. If the distribution of FAs for a specific feature differs significantly between strata defined by another feature (e.g., age), an interaction term between them is recommended.
Evaluation: The recommendations are integrated into an "augmented" CPH model. Performance is evaluated against the baseline using the Concordance Index (C-index) for discrimination and calibration plots (intercept and slope).
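
The three recommendation rules described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, the feature attributions are passed in as plain lists rather than computed with SHAP from an RSF, the exclusion threshold and the stratum split point are supplied by the caller, and a two-sample Kolmogorov-Smirnov test at the 5% level stands in for the unspecified test of whether FA distributions "differ significantly".

```python
import math
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def recommend_exclusion(fas, threshold):
    """Rule 1: flag a feature whose mean absolute FA is below the threshold.

    The threshold is a caller-supplied stand-in for the paper's
    data-driven threshold, which is not specified here.
    """
    return mean(abs(v) for v in fas) < threshold


def recommend_nonlinear(values, fas, r_cutoff=0.1):
    """Rule 2: flag a feature whose values and FAs have |r| < r_cutoff."""
    return abs(pearson_r(values, fas)) < r_cutoff


def ks_statistic(a, b):
    """Largest gap between the empirical CDFs of two samples."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def recommend_interaction(fas, stratifier, split, coeff=1.358):
    """Rule 3: flag an interaction when FAs differ between two strata.

    Strata are defined by whether the other feature is below `split`.
    ASSUMPTION: a KS test is used; coeff=1.358 is the asymptotic
    critical coefficient for a 5% significance level.
    """
    low = [f for f, s in zip(fas, stratifier) if s < split]
    high = [f for f, s in zip(fas, stratifier) if s >= split]
    if not low or not high:
        return False
    n, m = len(low), len(high)
    critical = coeff * math.sqrt((n + m) / (n * m))
    return ks_statistic(low, high) > critical
```

For instance, a feature whose attributions trace a symmetric U-shape over its values has a correlation of zero and would be flagged by `recommend_nonlinear`, while a feature whose attributions rise in step with its values would not. In a full pipeline each rule would be applied per feature (and per feature pair for interactions), with a correction for the many comparisons involved.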

Key Results

The framework was evaluated primarily on a high-dimensional dataset of 245,614 patients from the DataLoch repository (predicting time to first fall or related injury) and validated on two public datasets (GBSG2 for breast cancer and ACT for HIV).

Primary Study (Fall Risk):
- Recommendations: The system recommended excluding 23 features, adding non-linear terms for 2 features, and including 221 first-order interaction terms.
- Performance: The augmented CPH model achieved a C-index of 0.815 (95% CI 0.809–0.822), a statistically significant improvement over the baseline CPH model (C-index 0.805). Calibration also improved slightly (the slope moved from 1.063 to 0.950, marginally closer to the ideal value of 1).
- Validation: The recommendations are reported to be consistent with existing medical literature. Some confirm known risk factors (e.g., frailty, age), while others raise hypotheses for further study (e.g., non-linear alcohol risk, dementia-antispasmodic interactions).
Secondary Datasets:
- GBSG2 (Breast Cancer): The augmented model improved the C-index from 0.665 to 0.687.
- ACT (HIV): The augmented model improved the C-index from 0.725 to 0.770.
Generalizability: The method demonstrated consistent effectiveness across different clinical domains and dataset sizes, successfully identifying clinically plausible interactions and non-linearities.
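
For readers unfamiliar with the C-index values quoted above, the following is a minimal sketch of Harrell's concordance index for right-censored data. The function name and the toy inputs are illustrative, and the pairwise loop is quadratic in the number of patients, so it is meant to show the definition rather than to scale to a cohort of hundreds of thousands.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs that the risk score ranks correctly.

    times       -- observed follow-up time per patient
    events      -- 1 if the event was observed, 0 if censored
    risk_scores -- model output, higher meaning higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's observed time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why a move from 0.805 to 0.815 represents a small shift in the share of correctly ordered patient pairs.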

Significance and Claims

The paper claims that the Exploratory AI Recommender successfully bridges the gap between the predictive power of complex AI and the interpretability required for clinical practice. Its primary significance lies in the following points:

Data-Driven Study Design: The framework shifts the role of AI from a final predictor to a design tool, automating the discovery of feature relationships that are often missed by manual, literature-driven approaches.
Preservation of Transparency: By embedding AI-discovered insights into standard statistical models (CPH), the resulting models retain the auditability and mathematical transparency necessary for regulated clinical environments, avoiding the "fidelity issues" of post-hoc explanations for black-box models.
Subgroup Discovery: The extreme-group analysis allows for the identification of risk factors specific to low-risk or high-risk subpopulations, which traditional models often overlook. This offers opportunities for targeted early interventions.
Scalability and Efficiency: The approach is computationally efficient compared to training complex deep learning architectures for final prediction, as the heavy lifting is done during a one-time exploratory phase.
Hypothesis Generation: The system acts as a hypothesis generator, surfacing novel, clinically plausible interactions (e.g., specific drug-comorbidity pairs) that warrant further investigation, while supporting rather than replacing clinical judgment.

The authors emphasize that the framework is intended to complement, not replace, established biostatistical methods and clinical expertise, providing a systematic mechanism to navigate high-dimensional feature spaces while maintaining the "sense-checking" capability required for high-stakes medical decisions.