Generating Multi-Table Time Series EHR from Latent Space with Minimal Preprocessing

The paper introduces RawMed, a framework that generates high-fidelity, multi-table time-series synthetic Electronic Health Records (EHRs) with minimal preprocessing by using text-based representations. It also proposes a comprehensive evaluation framework, under which RawMed outperforms existing baselines in distributional similarity, temporal dynamics, and utility.

Eunbyeol Cho, Jiyoun Kim, Minjae Lee, Sungjin Park, Edward Choi

Published 2026-03-03

Imagine you are a doctor trying to train a new AI assistant to predict heart attacks or suggest better treatments. You have a massive library of patient records (Electronic Health Records, or EHRs) that contain everything: blood test results, medication lists, vital signs, and notes from every day a patient was in the hospital.

The problem? You can't share these records. They contain sensitive private information, and a leak would be a privacy disaster. Even if you remove the names, attackers can often re-identify patients just by looking at the unique patterns in their data.

So, scientists need to create fake patient records that look and act exactly like the real ones, but belong to no one. This is called "synthetic data."

However, most previous attempts to create this fake data were like making a "cartoon" version of a patient. They would:

  1. Pick only a few features: Like only recording heart rate and temperature, ignoring thousands of other details.
  2. Simplify the data: Instead of saying a patient's blood sugar was "98.4 mg/dL," they might just say "Normal" or "High."
  3. Mess up the timeline: They often forgot that medical events happen in a specific order (you get a blood test before you get the results).

This paper introduces RawMed, a new tool that creates a hyper-realistic, high-definition "digital twin" of patient records.

Here is how RawMed works, explained with some analogies:

1. The "Raw" Approach: No More Cooking the Books

Imagine you are trying to teach a student to cook.

  • Old Methods: You give them a recipe that says "Add a pinch of salt" and "Cook until done." You've removed the specific measurements and times. The student learns a vague idea, but they can't recreate the dish perfectly.
  • RawMed: You give the student the exact raw ingredients, the precise temperature of the oven, and the exact second to flip the steak. It keeps the data in its original, messy, complex form (the "raw" state) without simplifying it. This means the fake data is useful for any future research question, not just the ones the scientists thought of today.

2. The "Text" Trick: Speaking the Patient's Language

Medical records are stored in complex databases with many different tables (like a spreadsheet for labs, another for drugs, another for vitals).

  • The Problem: Computers usually struggle to read these messy, multi-table databases all at once.
  • The RawMed Solution: It treats the entire patient record like a story. It converts every lab result, every pill, and every vital sign into a sentence of text.
    • Example: Instead of a database row, it writes: "Lab test: Glucose, Value: 95, Unit: mg/dL."
    • By turning data into text, RawMed can use the same kind of sequence modeling that powers language models to learn the relationships between different medical events.
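
The row-to-text idea can be sketched in a few lines of Python. This is a minimal illustration, not RawMed's actual serialization format: the table names, column names, and the `t=` time prefix are all assumptions made up for the example.

```python
# Hypothetical sketch: turn rows from different tables into one time-ordered
# list of text lines. Table/column names and the "t=" prefix are illustrative.

def serialize_event(table, row):
    """Flatten one database row into 'table column value column value ...'."""
    parts = [table]
    for column, value in row.items():
        if value is None:  # skip empty cells instead of inventing a value
            continue
        parts.append(f"{column} {value}")
    return " ".join(parts)


def serialize_patient(events):
    """Sort (time, table, row) triples by time and serialize each one."""
    ordered = sorted(events, key=lambda e: e[0])
    return [f"t={time} {serialize_event(table, row)}" for time, table, row in ordered]


# Two events from two different tables, given out of order (time in minutes).
events = [
    (45, "prescriptions", {"drug": "Insulin", "dose_val_rx": 4, "dose_unit_rx": "units"}),
    (30, "labevents", {"item": "Glucose", "valuenum": 95, "valueuom": "mg/dL", "flag": None}),
]

for line in serialize_patient(events):
    print(line)
```

Because every table goes through the same function, adding a new table needs no new feature engineering: its columns simply show up as more words in the text.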

3. The "Compression" Magic: Fitting a Novel into a Postcard

If you turn a whole hospital stay into text, it becomes a massive novel. If you try to feed a novel into a computer model, it chokes (it's too long and expensive to process).

  • The Analogy: Imagine trying to memorize a 500-page book word-for-word. It's impossible. But if you could summarize the book into a few key symbols that still hold the essence of the story, you could memorize it easily.
  • The Tech: RawMed uses a technique called Residual Quantization. Think of this as a "lossy" compression, but a very smart one. It takes the long text story and compresses it into a short sequence of codes that live in a compact "latent space."
    • It's like turning a high-definition 4K movie into a tiny compressed file that still looks almost identical when you play it back.
    • This allows the AI to learn the complex patterns of thousands of patients without getting overwhelmed by the data size.
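
To make the compression step concrete, here is a toy sketch of residual quantization in plain Python. It assumes two tiny hand-written codebooks and a 2-dimensional vector; in a real system the codebooks would be learned and the vectors would come from an encoder, so treat every number below as illustrative.

```python
# Toy residual quantization: each stage picks the codeword nearest to what
# the previous stages failed to explain (the residual). Codebooks are
# hand-written for illustration, not learned.

def nearest(codebook, vec):
    """Index of the codeword closest to vec (squared Euclidean distance)."""
    def dist(codeword):
        return sum((c - v) ** 2 for c, v in zip(codeword, vec))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def rq_encode(vec, codebooks):
    """Return one code index per stage."""
    residual = list(vec)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        # Subtract the chosen codeword; the next stage refines what is left.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes


def rq_decode(codes, codebooks):
    """Reconstruct an approximation by summing the chosen codewords."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


codebooks = [
    [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]],    # stage 1: coarse
    [[0.0, 0.0], [0.25, 0.0], [0.0, -0.25]],  # stage 2: fine correction
]
vec = [1.2, 0.9]
codes = rq_encode(vec, codebooks)
approx = rq_decode(codes, codebooks)
print(codes, approx)  # [1, 1] [1.25, 1.0]
```

The vector is stored as two small integers instead of two floats, and the reconstruction is close but not exact. That gap is the "lossy" part; adding more stages shrinks it.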

4. The "Time Travel" Engine

Medical data is all about time. A fever today might mean something different than a fever tomorrow.

  • The Innovation: RawMed doesn't just look at the data; it looks at the sequence. It learns that "Event A" usually happens 2 hours before "Event B."
  • It uses a special "Time Transformer" (a type of AI) that predicts the next event in the story based on what happened before. It's like a detective who can predict the next clue in a mystery because they understand the pattern of the crime.
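
The "predict the next event from what came before" idea can be illustrated with a much simpler stand-in than a Transformer. The sketch below is a counting model that only looks at the single previous event; it is not RawMed's architecture, and the event names and hour-based timestamps are made up for the example.

```python
# Toy stand-in for next-event prediction: count which event tends to follow
# which, and how long the gap usually is. A Transformer conditions on the
# whole history; this sketch conditions only on the previous event.
from collections import Counter, defaultdict


def fit_next_event(sequences):
    """Learn follow-up counts and time gaps from (time, event) sequences."""
    counts = defaultdict(Counter)
    gaps = defaultdict(list)
    for seq in sequences:
        for (t0, e0), (t1, e1) in zip(seq, seq[1:]):
            counts[e0][e1] += 1
            gaps[(e0, e1)].append(t1 - t0)
    return counts, gaps


def predict_next(event, counts, gaps):
    """Most frequent follow-up event and its average time gap, or None."""
    if not counts.get(event):
        return None  # event never seen with a successor
    nxt, _ = counts[event].most_common(1)[0]
    observed = gaps[(event, nxt)]
    return nxt, sum(observed) / len(observed)


# Hypothetical training sequences; times are hours since admission.
sequences = [
    [(0, "lab_order"), (2, "lab_result"), (3, "medication")],
    [(0, "lab_order"), (2, "lab_result")],
    [(1, "lab_order"), (4, "medication")],
]
counts, gaps = fit_next_event(sequences)
print(predict_next("lab_order", counts, gaps))  # ('lab_result', 2.0)
```

Generation then amounts to repeating this step: predict an event and a time gap, append them to the record, and predict again.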

5. The "Quality Control" Check

When you generate fake data, sometimes the AI gets creative in weird ways. It might invent a drug name that doesn't exist or say a patient had a heart attack before they were born.

  • The Fix: RawMed has a strict "Editor" step at the end. It checks every generated record to make sure:
    • The drug names are real.
    • The numbers are within realistic ranges (no one has a heart rate of 500).
    • The timeline makes sense (you can't get a test result before the test is ordered).
    • If a record fails, it gets thrown out and regenerated.
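
The three checks above can be sketched as a simple filter. The drug vocabulary, the plausibility bounds, and the record layout are all hypothetical; a real pipeline would derive them from the source data rather than hard-code them.

```python
# Hypothetical post-generation filter: reject records with unknown drugs,
# implausible values, or events that go backwards in time.

KNOWN_DRUGS = {"aspirin", "heparin", "insulin"}  # illustrative vocabulary
VALID_RANGES = {"heart_rate": (20, 300)}         # illustrative bounds (bpm)


def is_valid(record):
    """Return True if every event in the record passes all three checks."""
    last_time = None
    for event in record:
        # Timeline check: timestamps must never decrease.
        if last_time is not None and event["time"] < last_time:
            return False
        last_time = event["time"]
        # Vocabulary check: drug names must come from the known list.
        if event["type"] == "drug" and event["name"] not in KNOWN_DRUGS:
            return False
        # Range check: measurements with known bounds must fall inside them.
        if event["type"] == "vital" and event["name"] in VALID_RANGES:
            low, high = VALID_RANGES[event["name"]]
            if not low <= event["value"] <= high:
                return False
    return True


def filter_records(records):
    """Keep only records that pass validation."""
    return [record for record in records if is_valid(record)]


good = [
    {"time": 0, "type": "vital", "name": "heart_rate", "value": 82},
    {"time": 5, "type": "drug", "name": "aspirin"},
]
bad_range = [{"time": 0, "type": "vital", "name": "heart_rate", "value": 500}]
bad_order = [
    {"time": 5, "type": "drug", "name": "aspirin"},
    {"time": 0, "type": "vital", "name": "heart_rate", "value": 82},
]
print(len(filter_records([good, bad_range, bad_order])))  # 1
```

Rejected records are simply dropped here; a generator could be called again to replace them.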

Why Does This Matter?

  • Privacy: Hospitals can share these "fake" records with researchers anywhere in the world with far less risk to patient privacy.
  • Better AI: Because the fake data is so realistic and includes all the details (not just the easy ones), AI models trained on it can learn patterns that simplified synthetic data would miss.
  • Future-Proof: Since RawMed keeps all the original data columns, researchers can ask new questions years from now without needing new data.

In short: RawMed is like a master forger who creates convincing patient records that belong to no real person. It keeps all the tiny details, respects the timeline, and protects patient identity, allowing doctors and scientists to train better AI to save lives.
