An AI Implementation Science Study to Improve Trustworthy Data in a Large Healthcare System

This study presents an AI implementation science case study at Shriners Children's that modernizes its research data infrastructure to the OMOP CDM standard, introduces a Python-based tool that extends data quality assessment with Trustworthy AI principles, and evaluates hybrid implementation strategies on clinical applications such as Craniofacial Microsomia to accelerate trustworthy AI adoption in healthcare.

Benoit L. Marteau, Andrew Hornback, Shaun Q. Tan, Christian Lowson, Jason Woloff, May D. Wang

Published 2026-03-06

Imagine a massive, bustling hospital network called Shriners Children's. It's like a giant library with 22 different branches, each holding millions of patient stories (medical records). For years, these stories were written in different languages, on different types of paper, and stored in different filing systems. Some were in old ledgers (ICD-9), some in newer digital files (ICD-10), and some were just scribbled notes.

The doctors and researchers wanted to use Artificial Intelligence (AI) to read these stories, find patterns, and help kids get better. But there was a problem: You can't teach a robot to read if the books are written in gibberish or are missing pages.

This paper is the story of how the team at Shriners cleaned up their library, built a better filing system, and tested if their new AI tools actually work in the real world.

Here is the breakdown of their journey, using simple analogies:

1. The Problem: A Messy Library

The researchers realized that before they could build a fancy AI robot, they needed to fix the foundation.

  • The Old System: Their data warehouse was like a library built in 2015 using an old map. The books were there, but the cataloging system was outdated.
  • The Goal: They wanted to move everything to a modern, universal standard called OMOP CDM. Think of this as translating every book in the library into a single, perfect language that every computer in the world understands.
  • The Hurdle: The tools they usually use to check the library's quality (called the "Data Quality Dashboard") were built with programming languages (R and Java) that didn't play nice with their new, secure cloud computer system (Microsoft Fabric). It was like trying to use a gas-powered car engine in an electric car.

2. The Solution: Building a New Tool

Instead of waiting for the old tools to be fixed, the team built their own.

  • The Translator: They rewrote the quality-checking tool in Python (a popular coding language), making it compatible with their new cloud system.
  • The "Trustworthy" Check: They didn't just check if the books were there; they checked if the stories made sense. They used a framework called METRIC, which is like a checklist for a detective:
    • Measurement: Did the doctor write this down correctly, or was it a typo?
    • Timeliness: Is this information up-to-date, or is it from 10 years ago?
    • Representativeness: Does this data cover all types of patients, or just a specific group?
    • Informativeness: Is there enough detail, or are there huge gaps?
    • Consistency: Do the numbers match up across different hospital branches?
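To make the detective's checklist concrete, here is a minimal Python sketch of what checks along a few of these dimensions could look like. The record fields, function names, and cutoff date are all illustrative inventions, not the paper's actual tool; each check simply returns the fraction of data that passes.

```python
from datetime import date

# Toy patient records, standing in for rows in a research data warehouse.
records = [
    {"site": "A", "code": "73300", "date": date(2023, 5, 1), "value": 12.5},
    {"site": "A", "code": "73300", "date": date(2012, 1, 9), "value": None},
    {"site": "B", "code": "73300", "date": date(2024, 2, 3), "value": 11.8},
]

def informativeness(rows):
    """Share of rows with a non-missing value (gaps lower the score)."""
    return sum(r["value"] is not None for r in rows) / len(rows)

def timeliness(rows, cutoff=date(2020, 1, 1)):
    """Share of rows recorded after a freshness cutoff."""
    return sum(r["date"] >= cutoff for r in rows) / len(rows)

def representativeness(rows, expected_sites={"A", "B", "C"}):
    """Share of expected hospital sites actually present in the data."""
    return len({r["site"] for r in rows} & expected_sites) / len(expected_sites)

print(round(informativeness(records), 2))     # 2 of 3 rows have a value
print(round(timeliness(records), 2))          # 2 of 3 rows are recent
print(round(representativeness(records), 2))  # 2 of 3 expected sites appear
```

In a real pipeline these scores would be computed per table and per site, then rolled up into the kind of dashboard the team rebuilt in Python.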

3. The Experiment: Two Ways to Test

The team tested their new system in two ways:

A. The "Systematic" Approach (The Big Sweep)
They looked at the entire library (all 22 hospitals).

  • Result: After modernizing the system, the quality score went up by about 8%. It was like organizing the library so you could find books faster.
  • The Catch: They found that data was missing in specific patterns. For example, one hospital branch might have great records for surgeries but terrible records for mental health notes. This told them that the AI needs to be careful about where it gets its data.
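Finding those missing-data patterns boils down to scoring completeness per hospital branch and per record type, then flagging the weak spots. The sketch below uses invented numbers and a made-up threshold; the point is the pattern-spotting logic, not the values.

```python
# Toy completeness scores (share of expected records present)
# per site and data domain. Numbers are invented for illustration.
completeness = {
    ("Site 1", "surgery"):       0.95,
    ("Site 1", "mental_health"): 0.40,
    ("Site 2", "surgery"):       0.90,
    ("Site 2", "mental_health"): 0.88,
}

def weak_spots(scores, threshold=0.6):
    """Flag site/domain pairs whose completeness falls below a threshold."""
    return [key for key, score in scores.items() if score < threshold]

print(weak_spots(completeness))  # [('Site 1', 'mental_health')]
```

A flag like this is exactly the warning the AI needs: great surgical records at one branch say nothing about its mental health notes.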

B. The "Case Study" Approach (The Deep Dive)
They picked one specific condition: Craniofacial Microsomia (CFM). This is a complex condition affecting a child's face and jaw, requiring many different doctors (surgeons, psychologists, etc.).

  • The Goal: Could the AI predict if a child with this condition might develop mental health issues based on their surgery history?
  • The Twist: They tried two ways to feed data to the AI:
    1. Raw Data: Feeding the AI the original, messy codes (like "Code A" from 2010 and "Code B" from 2020).
    2. Harmonized Data: Translating everything into the new, clean language first.
  • The Surprise: The AI performed almost exactly the same with both methods!
    • Analogy: Imagine trying to solve a puzzle. You can use the original, jagged puzzle pieces, or you can smooth them out first. The team found that smoothing them out (harmonizing) didn't make the puzzle easier to solve, but it did make it much easier to share the puzzle with other people.
    • Lesson: Standardizing data doesn't hurt the AI; it just makes the data safer and easier to share.
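Harmonization itself is essentially a lookup: codes from different vocabularies that mean the same thing get mapped to one shared concept ID. In OMOP this is driven by the OHDSI vocabulary tables; the tiny map below is a hand-rolled stand-in with a made-up concept ID, purely to show the idea.

```python
# Hypothetical code-to-concept map. Real OMOP mappings come from the
# OHDSI vocabulary tables; this concept ID is invented for illustration.
CONCEPT_MAP = {
    ("ICD9", "754.0"):  4082561,  # older vocabulary
    ("ICD10", "Q67.4"): 4082561,  # newer vocabulary, same condition
}

def harmonize(source_vocab, source_code):
    """Translate a source code into a shared standard concept ID."""
    return CONCEPT_MAP.get((source_vocab, source_code))

# Two records written in different 'languages' land on one concept:
print(harmonize("ICD9", "754.0") == harmonize("ICD10", "Q67.4"))  # True
```

Whether the model sees `754.0` or `Q67.4` or the shared concept, the underlying signal is the same, which is why performance barely changed; what harmonization buys is that other institutions can read the data too.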

4. The Big Takeaway

The paper concludes that building AI for healthcare isn't just about writing smart algorithms. It's about plumbing.

  • The "Know-Do" Gap: We know how to build AI, but we struggle to put it to work in hospitals because the data is messy.
  • The Hybrid Approach: You need a mix of Systematic (fixing the whole library) and Case-Specific (solving one specific puzzle) strategies.
  • The Future: They are now working on FHIR, which is like a universal "USB port" for medical data. It allows different apps and systems to plug in and talk to each other instantly, rather than just looking at static dashboards.
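The "USB port" idea is easiest to see in a payload. FHIR fixes the names and shape of each resource, so any compliant system parses it the same way. The field names below follow the FHIR Patient resource; the values are invented.

```python
import json

# A minimal FHIR Patient resource. Field names follow the FHIR
# specification; the values are made up for illustration.
patient = {
    "resourceType": "Patient",
    "id": "example-123",
    "name": [{"family": "Doe", "given": ["Jane"]}],
    "birthDate": "2015-06-01",
}

# Because the schema is standardized, sender and receiver need no
# custom translation layer -- serialize, send, parse, done.
payload = json.dumps(patient)
parsed = json.loads(payload)
print(parsed["resourceType"], parsed["name"][0]["family"])  # Patient Doe
```

That is the step beyond a static dashboard: instead of a human reading quality reports, systems exchange live, structured records directly.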

In a Nutshell

This study is a blueprint for how to clean up a giant, messy medical data warehouse so that AI can actually be trusted to help doctors. They proved that while cleaning the data is hard work, it's the only way to ensure that when an AI makes a suggestion, it's based on truth, not confusion. They built a new tool to check the data, found that standardizing data helps collaboration without hurting performance, and showed that the future of medical AI depends on making data "interoperable" (able to talk to each other) rather than just accurate.