Leveraging Taxonomy Similarity for Next Activity Prediction in Patient Treatment

This paper proposes the TS4NAP approach, which leverages medical taxonomies (ICD-10-CM and ICD-10-PCS) and graph matching to enhance the accuracy and explainability of next-activity prediction in patient treatment planning using MIMIC-IV data.

Martin Kuhn, Joscha Grüger, Tobias Geyer, Ralph Bergmann

Published 2026-03-05

Imagine you are a doctor standing at a crossroads, trying to decide the next step in a patient's treatment. The hospital's records of past patients form a massive, tangled library of thousands of books (past cases), but you only have a few minutes to find the one story that looks most like your current patient's situation.

This paper introduces a new "smart librarian" system called TS4NAP (Taxonomy Similarity for Next Activity Prediction) to help doctors make that decision.

Here is the breakdown of how it works, using simple analogies:

1. The Problem: The "Black Box" vs. The "Library"

In modern medicine, computers can predict what a patient needs next (like a surgery or a test) by looking at past data. However, many of these systems are "Black Boxes." They give you an answer, but they can't explain why. It's like a GPS that says "Turn left" but never tells you the reason (for example, that there is a roadblock ahead). Doctors don't trust tools they can't understand.

Also, medical data is messy. Two patients might have the same disease but slightly different codes in the computer system. A system that only checks for exact code matches would treat them as completely unrelated and miss the connection.

2. The Solution: The "Medical Dictionary" (Taxonomies)

The authors realized that doctors use a special, organized dictionary called ICD-10.

  • ICD-10-CM is the dictionary for Diagnoses (what is wrong with you).
  • ICD-10-PCS is the dictionary for Procedures (what the doctor did).

Think of these dictionaries not just as lists of words, but as family trees.

  • "Heart Attack" is a child of "Heart Disease."
  • "Heart Disease" is a child of "Circulatory System Problems."

The TS4NAP system uses these family trees. It knows that even if two patients have slightly different codes, if those codes are "cousins" in the family tree, they are still very similar.
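One common way to turn a family tree into a number is a Wu-Palmer-style score: the deeper the shared ancestor of two codes, the more similar they are. The sketch below assumes that measure, which is not necessarily the one TS4NAP uses, and it simplifies the ICD-10 hierarchy by treating every character of a code (dot removed) as one level down the tree. The function names are hypothetical.

```python
def ancestors(code: str) -> list[str]:
    """Chain of ancestors of an ICD-10-style code, from the top level down to the code itself.

    Simplifying assumption: the dot is dropped and each character adds one level,
    so "I21.01" -> ["I", "I2", "I21", "I210", "I2101"].
    """
    flat = code.replace(".", "")
    return [flat[: i + 1] for i in range(len(flat))]


def taxonomy_similarity(a: str, b: str) -> float:
    """Wu-Palmer-style score: 2 * depth(shared ancestor) / (depth(a) + depth(b))."""
    path_a, path_b = ancestors(a), ancestors(b)
    shared_depth = 0
    for node_a, node_b in zip(path_a, path_b):
        if node_a != node_b:
            break  # the two family-tree paths diverge here
        shared_depth += 1
    return 2 * shared_depth / (len(path_a) + len(path_b))


if __name__ == "__main__":
    print(taxonomy_similarity("I21.01", "I21.01"))  # 1.0: identical codes
    print(taxonomy_similarity("I21.01", "I21.02"))  # 0.8: siblings under I21.0
    print(taxonomy_similarity("I21.01", "I25.10"))  # 0.4: share only the prefix I2
    print(taxonomy_similarity("I21.01", "J18.9"))   # 0.0: no shared ancestor
```

With this rule, two heart-attack subcodes under I21 count as close relatives, while a heart-attack code and a pneumonia code (J18.9) share no ancestor at all.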

3. How It Works: The "Double-Check" Match

The system tries to find the best match for a current patient by looking at two things at once:

A. The Diagnosis List (The "Who they are" check)
Imagine you are looking for a twin. You don't just look at their face; you look at their whole family. The system looks at the patient's list of diagnoses. It uses a math trick called Graph Matching (think of it as a high-speed matchmaking service) to pair up the current patient's diagnoses with past patients' diagnoses.

  • Analogy: If Patient A has "Broken Leg" and Patient B has "Broken Femur," a system that only looks for exact matches would say "No match." TS4NAP says, "Wait, a broken femur is a kind of broken leg, so these two are close relatives in the family tree. They are a strong (say, 90%) match."
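The pairing step can be pictured as finding the best one-to-one assignment between the current patient's diagnoses and a past patient's diagnoses. The sketch below is a stand-in for the paper's graph matching, not its exact formulation: it tries every assignment by brute force and scores each pair with a prefix-based taxonomy similarity. Dividing by the longer list (so unmatched diagnoses lower the score) is an assumption, and all names are hypothetical.

```python
from itertools import permutations


def taxonomy_similarity(a: str, b: str) -> float:
    """Prefix-based stand-in for taxonomy similarity (assumption): deeper shared ancestry scores higher."""
    a, b = a.replace(".", ""), b.replace(".", "")
    shared = 0
    for char_a, char_b in zip(a, b):
        if char_a != char_b:
            break
        shared += 1
    return 2 * shared / (len(a) + len(b))


def diagnosis_similarity(query: list[str], case: list[str]) -> float:
    """Score of the best one-to-one pairing between two diagnosis lists.

    Brute force over all assignments, so only suitable for short lists;
    a real system would use a proper assignment or graph-matching algorithm.
    """
    if not query or not case:
        return 0.0
    shorter, longer = sorted((query, case), key=len)
    best_total = 0.0
    # Each permutation assigns every code in the shorter list to a distinct code in the longer list.
    for partners in permutations(longer, len(shorter)):
        total = sum(taxonomy_similarity(s, p) for s, p in zip(shorter, partners))
        best_total = max(best_total, total)
    # Dividing by the longer list's size penalizes diagnoses left unmatched.
    return best_total / len(longer)


if __name__ == "__main__":
    current = ["I21.01", "E11.9"]
    past = ["I21.02", "E11.9", "J18.9"]
    print(round(diagnosis_similarity(current, past), 3))  # 0.6
```

Here the heart-attack codes pair up (0.8), the identical diabetes codes pair up (1.0), and the unmatched pneumonia code pulls the average down to 0.6. Brute force grows factorially with list length, which is why a dedicated matching algorithm is the practical choice.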

B. The Procedure History (The "What they did" check)
The system also looks at the order of past treatments. Did the patient get an X-ray before the surgery?

  • Analogy: It's like matching two recipes. Even if the ingredients are slightly different, if the steps are similar, the dishes will taste similar.
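One way to make the order of steps count is an alignment in the spirit of a longest common subsequence: two steps may only be paired if they appear in the same relative order in both histories, and each pair contributes its taxonomy similarity. ICD-10-PCS codes suit a prefix-based similarity because each of their seven characters encodes one attribute (section, body system, root operation, and so on). This is a minimal sketch under those assumptions with made-up codes and hypothetical names, not TS4NAP's exact sequence measure.

```python
def taxonomy_similarity(a: str, b: str) -> float:
    """Prefix-based similarity (assumption): longer shared leading characters score higher."""
    shared = 0
    for char_a, char_b in zip(a, b):
        if char_a != char_b:
            break
        shared += 1
    return 2 * shared / (len(a) + len(b))


def sequence_similarity(query: list[str], case: list[str]) -> float:
    """Order-aware similarity of two procedure histories (weighted LCS alignment).

    best[i][j] holds the best total similarity when aligning the first i steps
    of query with the first j steps of case; pairs can never cross, so two
    swapped steps cannot both be matched.
    """
    if not query or not case:
        return 0.0
    best = [[0.0] * (len(case) + 1) for _ in range(len(query) + 1)]
    for i in range(1, len(query) + 1):
        for j in range(1, len(case) + 1):
            best[i][j] = max(
                best[i - 1][j],  # skip a query step
                best[i][j - 1],  # skip a case step
                best[i - 1][j - 1] + taxonomy_similarity(query[i - 1], case[j - 1]),  # pair them
            )
    # Normalize by the longer history so extra, unmatched steps lower the score.
    return best[-1][-1] / max(len(query), len(case))


if __name__ == "__main__":
    print(sequence_similarity(["ABC1", "XYZ9"], ["ABC1", "XYZ9"]))  # 1.0: same steps, same order
    print(sequence_similarity(["ABC1", "XYZ9"], ["XYZ9", "ABC1"]))  # 0.5: same steps, swapped
    print(sequence_similarity(["ABC1", "XYZ9"], ["ABC2", "XYZ9"]))  # 0.875: similar first step
```

Swapping the two steps halves the score even though the same procedures occur, which is the "order matters" behavior described in the text.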

4. The Magic Trick: "Weighted Similarity"

The system doesn't just count matches; it weighs them.

  • If a diagnosis is very specific (like "Broken big toe"), it counts for a lot.
  • If a diagnosis is very general (like "Pain"), it counts for less.
  • It also checks the order. If the surgery happened before the X-ray in a past case but after it in the current case, the system lowers the score, because the order matters.
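The specificity weighting can be sketched as a weighted average: each diagnosis of the current patient is scored by its best match in a past case, and deeper (more specific) codes get larger weights. Using code length as the measure of depth is an assumption, as are the function names, and this sketch covers only the specificity weighting, not the ordering check. R52 is the ICD-10-CM code for unspecified pain, a deliberately general diagnosis.

```python
def taxonomy_similarity(a: str, b: str) -> float:
    """Prefix-based similarity on ICD-10-style codes with the dot removed (assumption)."""
    a, b = a.replace(".", ""), b.replace(".", "")
    shared = 0
    for char_a, char_b in zip(a, b):
        if char_a != char_b:
            break
        shared += 1
    return 2 * shared / (len(a) + len(b))


def specificity_weight(code: str) -> int:
    """Hypothetical weight: the depth of the code, approximated by its length without the dot."""
    return len(code.replace(".", ""))


def weighted_similarity(query: list[str], case: list[str]) -> float:
    """Weighted average of each query diagnosis's best match in the past case."""
    weighted_sum = 0.0
    total_weight = 0
    for code in query:
        weight = specificity_weight(code)
        best_match = max((taxonomy_similarity(code, other) for other in case), default=0.0)
        weighted_sum += weight * best_match
        total_weight += weight
    return weighted_sum / total_weight if total_weight else 0.0


if __name__ == "__main__":
    current = ["I21.01", "R52"]  # a specific heart-attack code plus general pain
    print(weighted_similarity(current, ["I21.01"]))  # 0.625: shares the specific diagnosis
    print(weighted_similarity(current, ["R52"]))     # 0.375: shares only the general one
```

Both past cases match exactly one of the two diagnoses, but the case that shares the specific code scores higher, mirroring "Broken big toe" counting for more than "Pain."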

5. The Result: A "Top 5" Recommendation

Instead of giving one single answer, the system looks at the top 5 most similar past patients and says:

"Based on these 5 people who were very similar to your patient, here are the 3 most likely next steps they took."

Crucially, it can explain its answer: "We suggest a CT scan because 4 out of the 5 most similar patients had one right after their diagnosis."
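The recommendation step can be sketched as a vote among the k most similar past cases. The sketch assumes each past case already carries a similarity score and a recorded next activity; the function name and the case data are hypothetical and invented for illustration, not taken from the study.

```python
from collections import Counter


def recommend_next(
    scored_cases: list[tuple[float, str]], k: int = 5, n: int = 3
) -> list[tuple[str, int, int]]:
    """Return up to n next activities, most frequent first, among the k most similar past cases.

    Each result is (activity, votes, neighbors), which doubles as the explanation
    "votes out of neighbors similar patients did this next".
    """
    neighbors = sorted(scored_cases, key=lambda case: case[0], reverse=True)[:k]
    votes = Counter(activity for _, activity in neighbors)
    return [(activity, count, len(neighbors)) for activity, count in votes.most_common(n)]


if __name__ == "__main__":
    # Hypothetical (similarity, next activity) pairs for six past patients.
    past_cases = [
        (0.91, "CT scan"),
        (0.88, "CT scan"),
        (0.85, "X-ray"),
        (0.83, "CT scan"),
        (0.80, "CT scan"),
        (0.40, "Surgery"),
    ]
    for activity, count, total in recommend_next(past_cases):
        print(f"Suggest {activity}: {count} of the {total} most similar patients had it next.")
```

Because the output keeps the vote counts, the explanation ("4 of the 5 most similar patients had a CT scan next") comes for free; the least similar case ("Surgery") falls outside the top 5 and never votes.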

6. What the Study Found

The researchers tested this on MIMIC-IV, a large set of real, de-identified hospital records from the Beth Israel Deaconess Medical Center in Boston.

  • The Good News: The system worked much better than the "Black Box" versions, especially for complex cases with many different treatments. It was like having a librarian who actually knows the books, rather than just a robot that counts words.
  • The Catch: For very simple cases (where there are only a few types of treatments), the fancy system didn't add much value. It's like using a super-computer to solve a Sudoku puzzle; sometimes a simple pencil is enough.

Summary

TS4NAP is a tool that helps doctors predict the next step in a patient's treatment by finding "medical twins" in the past. It uses the official medical family trees (ICD codes) to understand that different words can mean similar things.

  • Why it matters: It makes predictions more accurate (by understanding medical nuances) and explainable (by showing the doctor which past patients it compared against).
  • The Metaphor: It turns a chaotic pile of medical records into a well-organized library where the librarian knows exactly which book to pull off the shelf to help you solve your problem today.
