MedFeat: Model-Aware and Explainability-Driven Feature Engineering with LLMs for Clinical Tabular Prediction

MedFeat is a feedback-driven framework that uses Large Language Models to perform model-aware, explainability-guided feature engineering. By prioritizing informative signals that downstream models struggle to learn directly, it achieves robust, clinically meaningful improvements across diverse healthcare tabular prediction tasks.

Zizheng Zhang, Yiming Li, Justin Xu, Jinyu Wang, Rui Wang, Lei Song, Jiang Bian, David W Eyre, Jingjing Fu

Published 2026-03-04

Imagine you are a doctor trying to predict which patients might get sick or pass away soon. You have a massive spreadsheet (a "tabular dataset") filled with numbers: age, heart rate, blood pressure, lab results, and more.

In the world of machine learning, there's a long-standing debate: Should we use simple, classic math tools (like decision trees) or complex, deep neural networks (like the ones that power AI art) to make these predictions?

Surprisingly, for medical spreadsheets, the simple tools often win. But they need help. They need someone to look at the raw numbers and say, "Hey, if you combine age and blood pressure in this specific way, it tells us something new." This is called Feature Engineering.
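As a toy illustration of what feature engineering means here (the column names and formulas below are invented for illustration, not taken from the paper):

```python
# Toy feature engineering: deriving new columns from raw vitals so a simple
# model can use the combined signal directly. All names and formulas here
# are illustrative, not the paper's actual features.
patients = [
    {"age": 70, "sys_bp": 180, "dia_bp": 100},
    {"age": 35, "sys_bp": 120, "dia_bp": 80},
]
for p in patients:
    # Pulse pressure is a real clinical quantity (systolic minus diastolic).
    p["pulse_pressure"] = p["sys_bp"] - p["dia_bp"]
    # An interaction term a linear model could not learn from raw columns.
    p["age_x_sys_bp"] = p["age"] * p["sys_bp"]
```

A linear model given only `age` and `sys_bp` cannot represent their product; handing it `age_x_sys_bp` as a ready-made column is exactly the kind of help the article describes.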

Traditionally, this was done by human experts. It was slow, expensive, and hard to scale. Then came LLMs (Large Language Models) like the one you are talking to now. They know a lot of medical facts. But early attempts to use them were like throwing a dart at a board while blindfolded: they guessed random combinations without checking if the computer model actually needed them.

Enter MedFeat.

Here is how MedFeat works, explained through a simple analogy:

The Analogy: The Master Chef and the Picky Eater

Imagine you are a Master Chef (The LLM) trying to create a new dish to impress a very Picky Eater (The Machine Learning Model).

  1. The Problem: The Picky Eater has a specific palate.

    • If the Eater is a Logistic Regression model, they only like simple, straight-line flavors. They can't taste complex curves unless you explicitly mix ingredients for them.
    • If the Eater is XGBoost (a tree-based model), they are great at spotting complex patterns on their own, but they might miss subtle, long-term trends or global statistics.
    • Old methods would just throw random ingredients at the Eater and see what sticks.
    • MedFeat is different. It asks the Eater, "What are you struggling to taste right now?" and then tells the Chef exactly what to cook.
  2. The "Feedback Loop" (The Taste Test):
    MedFeat doesn't just guess. It runs a simulation:

    • Step 1: It trains the model on the current data.
    • Step 2: It uses a tool called SHAP (think of this as a "flavor analyzer") to see which ingredients the model is already using and which ones it is ignoring.
    • Step 3: It tells the LLM: "The model is good at spotting high heart rates, but it's missing the instability of the heart rate over time. Also, it's ignoring the patient's age. Go make a new ingredient that combines those two."
  3. The "Island" Strategy (Don't Eat the Whole Buffet):
    Medical spreadsheets have hundreds of columns. If you ask the LLM to look at all of them at once, it gets confused (like trying to read a whole library in one second).

    • MedFeat groups the most important ingredients into small "Islands" (tiny subsets of data).
    • It sends just one Island to the LLM at a time. This keeps the instructions short, focused, and cheap to run.
  4. The Memory Bank (Learning from Mistakes):
    If the LLM suggests a "feature" (a new calculation) that makes the model worse, MedFeat remembers: "Don't do that again." If it suggests something great, it remembers: "Do more of that." This creates a cycle of continuous improvement.
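The taste test, the islands, and the memory bank can be sketched as a single loop. Everything below is hypothetical: a toy scorer stands in for training the model and computing SHAP values, and a fixed candidate list stands in for the LLM's proposals.

```python
def evaluate(features):
    """Stand-in for: train the model, return a validation score (e.g. AUROC)."""
    gains = {"age": 0.02, "hr_mean": 0.03,
             "hr_instability": 0.05, "age_x_bp": 0.04}
    return 0.70 + sum(gains.get(f, -0.01) for f in features)

def make_islands(ranked_features, size=2):
    """Group top-ranked columns into small subsets ('islands') so each
    LLM prompt stays short, focused, and cheap."""
    return [ranked_features[i:i + size]
            for i in range(0, len(ranked_features), size)]

def propose_feature(island, existing, memory):
    """Stand-in for the LLM call: given one island's metadata as context,
    suggest a candidate feature that hasn't already been tried."""
    for c in ["hr_instability", "age_x_bp", "noise_ratio"]:
        if c not in memory["rejected"] and c not in existing:
            return c
    return None

features = ["age", "hr_mean"]
memory = {"accepted": [], "rejected": []}   # the "memory bank"
best = evaluate(features)

for _ in range(5):                          # refinement rounds
    island = make_islands(features)[0]      # only this island goes in the prompt
    cand = propose_feature(island, features, memory)
    if cand is None:
        break
    score = evaluate(features + [cand])     # the "taste test"
    if score > best:                        # keep it, and remember it worked
        features.append(cand)
        memory["accepted"].append(cand)
        best = score
    else:                                   # "don't do that again"
        memory["rejected"].append(cand)
```

In this toy run the loop accepts `hr_instability` and `age_x_bp` (they raise the score), rejects `noise_ratio` (it lowers it), and then stops because no untried candidates remain.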

Why is this a big deal?

  • It's Privacy-Safe: The LLM never sees the actual patient names or private records. It only sees the "recipe" (metadata) and the "flavor scores" (importance rankings). No patient data leaves the hospital.
  • It's Robust: The paper tested this on data from different years and different hospitals. The features MedFeat discovered (like "how unstable a patient's vitals are") worked well even when the data changed. It's like discovering a universal law of physics rather than a rule that only works on Tuesdays.
  • It's Explainable: Because the LLM is guided by the model's actual needs, the new features it creates make sense to doctors. They aren't just random math; they are clinically meaningful insights.
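To make the privacy point concrete, here is a hypothetical sketch of what the prompt payload could look like: column metadata and importance rankings only, never patient rows. The field names and values are invented for illustration.

```python
import json

# Hypothetical privacy-safe prompt payload: the LLM sees column metadata
# and importance rankings, not the underlying patient records.
columns = [
    {"name": "hr_mean", "dtype": "float", "description": "mean heart rate over 24h"},
    {"name": "age", "dtype": "int", "description": "patient age in years"},
]
importance = {"hr_mean": 0.41, "age": 0.07}   # e.g. mean |SHAP| per column

payload = {
    "task": "24-hour mortality prediction",
    "columns": columns,
    "importance_ranking": sorted(importance, key=importance.get, reverse=True),
}
prompt = json.dumps(payload, indent=2)  # this string, not patient data, is sent
```

Only the "recipe" crosses the boundary: the serialized payload describes the dataset's shape and which columns the model leans on, which is all the LLM needs to propose new features.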

The Result

In the paper, MedFeat was tested on predicting things like 24-hour mortality (will the patient die in the next day?) and heart failure.

  • Without MedFeat: The models were okay, but missed subtle signals.
  • With MedFeat: The models got significantly better at spotting high-risk patients, even without needing hours of expensive tuning.

In short: MedFeat is a smart assistant that talks to a computer model, asks what it's missing, and then uses a medically knowledgeable AI to invent the new data points that fill those gaps. It turns a messy spreadsheet into a much clearer crystal ball for patient care.
