An interpretable prototype parts-based neural network for medical tabular data

This paper proposes an inherently interpretable, prototype-based neural network for medical tabular data that learns human-readable, discretized feature subsets to provide transparent, case-based clinical predictions while maintaining competitive accuracy.

Jacek Karolczak, Jerzy Stefanowski

Published 2026-03-06

Imagine you are a doctor trying to diagnose a patient. You look at their blood work, their age, and their symptoms. You don't just look at one number in isolation; you look for patterns. You might think, "Ah, this patient has high sugar and low energy, which reminds me of a specific type of diabetes I've seen before."

For a long time, computers were terrible at this kind of "pattern matching" in a way humans could understand. They were like black boxes: you put data in, and a result came out, but no one knew why the computer made that decision. If a computer told a doctor, "This patient has a 90% chance of heart failure," the doctor would ask, "Why? What specific numbers made you say that?" The computer couldn't answer clearly.

This paper introduces a new computer model called MEDIC (Model for Explainable Diagnosis using Interpretable Concepts). Think of MEDIC not as a black box, but as a digital apprentice doctor who learns by studying a library of past cases and explaining its reasoning using simple, human language.

Here is how it works, broken down into simple analogies:

1. The "Fuzzy" to "Hard" Translation (The Ruler Analogy)

Medical data often comes as exact numbers (e.g., "Glucose is 142.7 mg/dL"). But doctors don't think in decimals; they think in ranges (e.g., "Normal," "High," or "Dangerous").

  • The Problem: Computers struggle to turn exact numbers into categories during training, because a hard cutoff gives the learning algorithm no smooth signal for adjusting where the boundaries should sit.
  • MEDIC's Solution: Imagine a ruler that can stretch and shrink. During training, MEDIC uses a "fuzzy" ruler that gently nudges numbers into categories so it can learn. Once it has learned the rules, it snaps the ruler into a "hard" position. Now, it speaks in clear categories: "This patient's sugar is in the High range." This makes the output instantly readable for a human.
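The stretchy-ruler idea can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the feature name, bin edges, and temperature values are invented for the example. During training, sigmoid curves over the bin edges give "fuzzy" memberships the model can learn through; afterwards, the ruler "snaps" into plain thresholding.

```python
import numpy as np

# Illustrative bin edges for glucose (thresholds invented for this sketch):
# "Normal" below 100, "Elevated" between 100 and 126, "High" above 126.
edges = np.array([100.0, 126.0])

def soft_bins(x, edges, temperature):
    """Differentiable ("fuzzy") bin membership via sigmoids over the edges.

    Returns a weight for each of len(edges)+1 bins; the weights sum to 1.
    As temperature shrinks, the weights approach a hard one-hot assignment.
    """
    s = 1.0 / (1.0 + np.exp(-(x - edges) / temperature))  # per-edge crossing score
    left = np.concatenate(([1.0], s))
    right = np.concatenate((s, [0.0]))
    return left - right  # probability mass landing in each bin

def hard_bin(x, edges):
    """The 'snapped' ruler: plain thresholding used after training."""
    return int(np.searchsorted(edges, x))

x = 142.7
print(soft_bins(x, edges, temperature=20.0))  # fuzzy memberships during training
print(soft_bins(x, edges, temperature=0.1))   # nearly one-hot
print(hard_bin(x, edges))                     # -> 2, i.e. the "High" range
```

With a large temperature the 142.7 mg/dL reading spreads its weight across all three ranges; with a small one it concentrates on "High", matching the hard threshold the model reports to the doctor.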

2. The "Spotlight" on Clues (The Detective Analogy)

When a detective solves a crime, they don't look at every single piece of evidence equally. They focus on the key clues: "The muddy shoe print and the missing watch."

  • The Problem: Medical records have hundreds of features (blood pressure, cholesterol, age, etc.). Most are noise.
  • MEDIC's Solution: MEDIC uses "patching masks" which act like a spotlight. It learns to ignore the irrelevant background noise and shine a light only on the specific combination of clues that matter. It might say, "I am ignoring the patient's height, but I am focusing heavily on their high cholesterol combined with low albumin."
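The spotlight behaves like a per-feature weight vector. Here is a toy sketch, assuming near-binary mask weights; the feature names, values, and weights are invented, and a real model would learn the mask during training rather than hand-pick it.

```python
import numpy as np

# Invented patient record for the sketch.
features = ["height", "cholesterol", "albumin", "age"]
x = np.array([172.0, 6.8, 28.0, 54.0])

# A learned mask would come from training; here it is hand-picked.
# Weights near 1 keep a feature in the spotlight; near 0 switches it off.
mask = np.array([0.02, 0.97, 0.95, 0.05])

masked = mask * x  # down-weighted features barely influence any comparison
used = [f for f, m in zip(features, mask) if m > 0.5]
print(used)  # -> ['cholesterol', 'albumin']
```

The list of surviving features is exactly what makes the explanation readable: the model can state outright which clues it looked at and which it ignored.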

3. The "Case File" Library (The Memory Analogy)

This is the most important part. Instead of calculating a complex formula, MEDIC learns by building a library of Prototypes.

  • Think of it like this: Imagine a veteran doctor who has seen thousands of patients. They don't have a formula; they have a mental library of "archetypal cases."
    • Case A: "The patient with the 'High Fever + Rash' pattern."
    • Case B: "The patient with the 'Low Blood Pressure + Fast Heartbeat' pattern."
  • How MEDIC works: When a new patient walks in, MEDIC doesn't guess. It looks at the patient's "spotlighted" clues and asks: "Which case in my library does this patient look most like?"
    • If the new patient looks 90% like "Case A," MEDIC predicts "Case A's outcome" and says, "I think you have Condition A because your symptoms match this specific pattern I've seen before."
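The library lookup above amounts to a nearest-prototype search. The following sketch uses a tiny hand-made "case library" with invented prototypes, labels, and feature levels (0 = Low, 1 = Normal, 2 = High) and a simple distance; the paper's actual similarity measure may differ.

```python
import numpy as np

# Toy case library: each prototype is a vector of discretized feature levels
# (temperature, rash severity, blood pressure), all invented for the sketch.
prototypes = {
    "Case A (high fever + rash)":      np.array([2, 2, 1]),
    "Case B (low BP + fast heartbeat)": np.array([1, 0, 0]),
}
labels = {
    "Case A (high fever + rash)": "Condition A",
    "Case B (low BP + fast heartbeat)": "Condition B",
}

def most_similar(patient):
    """Return the prototype the patient looks most like (smallest distance)."""
    best = min(prototypes,
               key=lambda name: np.sum(np.abs(prototypes[name] - patient)))
    return best, labels[best]

patient = np.array([2, 2, 0])  # high temp, severe rash, low blood pressure
case, prediction = most_similar(patient)
print(f"Matches {case} -> predict {prediction}")
```

Because the prediction is literally "the label of the closest stored case", the explanation writes itself: point at that case and show which discretized features matched.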

4. Why This Matters (The Trust Analogy)

In the past, if an AI gave a diagnosis, doctors were skeptical because they couldn't see the logic. It was like a GPS telling you to turn left without showing you the map.

  • MEDIC changes the game: It shows you the map. It says, "I am predicting 'High Risk' because your Bilirubin is in the 'High' range, your Platelets are 'Low', and this specific combination matches Prototype #4 in our database, which was a patient who unfortunately passed away."

The Results

The researchers tested MEDIC on real medical data (liver disease, kidney disease, and diabetes).

  • Accuracy: It matched the predictive performance of the powerful black-box models typically used on these tasks.
  • Transparency: Unlike those complex models, MEDIC could explain its decisions in plain English, using ranges and specific feature combinations that real doctors recognize.

In a Nutshell

MEDIC is a new kind of AI that stops trying to be a "magic oracle" and starts acting like a collaborative partner. It learns the rules of medicine by finding patterns in past cases, translates complex numbers into simple ranges, and explains its decisions by saying, "I made this choice because your case looks just like this other real case I know."

It bridges the gap between the super-power of computers and the common sense of doctors, making AI something doctors can actually trust and use to save lives.
