Mother-infant linked UK electronic birth cohorts representing 17.5 million births harmonised to the OMOP common data model

This paper describes the successful harmonization of five diverse UK electronic birth cohorts, encompassing over 17.5 million births, into the OMOP Common Data Model to create a standardized, federated resource that enables large-scale, reproducible maternal and child health research across England, Scotland, and Wales without sharing individual-level data.

Seaborne, M., Durbaba, S., Mendez-Villalon, A., Giles, T., Gonzalez-Izquierdo, A., Hough, A., Sanchez-Soriano, C., Snell, H., Cockburn, N., Nirantharakumar, K., Poston, L., Reynolds, R., Santorelli, G., Brophy, S.

Published 2026-03-25

This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

Imagine you are trying to solve a massive puzzle about the health of mothers and babies across the entire United Kingdom. You have five different teams, each holding a huge pile of puzzle pieces. The problem? Each team's pieces are different shapes, different colors, and have different pictures on the back. One team uses square pieces, another uses triangles, and they all speak slightly different "languages" to describe the same thing (like a baby's birth weight or a mother's blood pressure).

If you tried to mix them all together, it would be a mess. You couldn't see the big picture.

This paper is about a project called MIREDA (Mother and Infant Research Electronic Data Analysis) that decided to fix this. They built a giant, universal "adapter" that turns all those different puzzle pieces into the exact same shape and color, so they can finally fit together perfectly.

Here is how they did it, explained simply:

1. The Goal: One Big Picture

The researchers wanted to study over 17.5 million births across England, Scotland, and Wales. That's a lot of babies! But because the data came from different hospitals and regions, it was scattered and messy. They wanted to combine these records to answer big questions: Why do some babies arrive early? How does a mother's health affect her child years later? Does the place you live change your birth outcomes?

2. The Solution: The "Universal Translator" (OMOP)

To make the data talk to each other, they used a standard blueprint called the OMOP Common Data Model. Think of this like a universal power adapter.

  • Before: Every dataset had its own plug shape, so a record system built in Wales could not plug into one built in Scotland or England.
  • After: The MIREDA team built an adapter that converts every plug into one standard shape that fits everywhere.

Now, a "smoking during pregnancy" record from Wales is stored in the same format as a "smoking during pregnancy" record from London. They speak the same language.

3. The Tricky Part: The Mother-Baby Connection

Most computer databases are designed to track one person at a time. But pregnancy is special because it involves two people (mother and baby) who are deeply connected for a short time, and then the baby becomes their own person.

Imagine a database that only knows about "Alice" and "Bob," but doesn't know they are mother and son.

  • The Problem: Standard databases don't have a built-in "Family Tree" button.
  • The Fix: The team created a special "linking card" (called a fact_relationship table). It's like a name tag that says, "Alice is the mother of Bob, and they were together during this specific pregnancy." This allows researchers to follow the mother's story and the baby's story separately, but also see how they are connected.
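
The "linking card" idea can be sketched in a few lines of Python. This is a minimal illustration, not the MIREDA team's actual code. The five column names follow the OMOP fact_relationship table, but the concept IDs (PERSON_DOMAIN, MOTHER_OF, CHILD_OF) and the label column are hypothetical stand-ins, and the link to a specific pregnancy is left out to keep the example short.

```python
import sqlite3

# Hypothetical concept IDs, for illustration only. Real OMOP vocabularies
# assign their own numeric IDs to domains and relationships.
PERSON_DOMAIN = 1
MOTHER_OF = 100
CHILD_OF = 101

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (person_id INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE fact_relationship (
    domain_concept_id_1 INTEGER,
    fact_id_1 INTEGER,
    domain_concept_id_2 INTEGER,
    fact_id_2 INTEGER,
    relationship_concept_id INTEGER
);
""")

# Two separate people. The 'label' column is a toy addition for readability.
conn.executemany("INSERT INTO person VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])

# The link is stored in both directions so either person can be the start.
conn.executemany(
    "INSERT INTO fact_relationship VALUES (?, ?, ?, ?, ?)",
    [
        (PERSON_DOMAIN, 1, PERSON_DOMAIN, 2, MOTHER_OF),
        (PERSON_DOMAIN, 2, PERSON_DOMAIN, 1, CHILD_OF),
    ],
)

def children_of(mother_id):
    """Return the labels of everyone linked to mother_id as her child."""
    rows = conn.execute(
        """SELECT p.label
           FROM fact_relationship fr
           JOIN person p ON p.person_id = fr.fact_id_2
           WHERE fr.fact_id_1 = ? AND fr.relationship_concept_id = ?
           ORDER BY p.person_id""",
        (mother_id, MOTHER_OF),
    ).fetchall()
    return [row[0] for row in rows]

print(children_of(1))  # ['Bob']
```

Alice and Bob stay as two ordinary rows in the person table; the relationship lives in its own table, which is why each story can be followed separately or together.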

4. The Process: The "Factory"

The team didn't move the actual data, because moving individual health records puts privacy at risk and is tightly restricted. Instead, they sent the "instructions" (the adapter rules) to the secure computer room where each dataset is held (called a Trusted Research Environment).

  • The Factory: Inside each secure room, a machine (software) took the messy local data, ran it through the "Universal Translator," and turned it into the clean, standard OMOP format.
  • The Result: The individual records never left the secure room. Only summary results came out. This is like sending a recipe to a chef in a locked kitchen; the chef cooks the meal and tells you how it turned out, but you never see the ingredients or the kitchen.
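
The "factory" step can be pictured with a small Python sketch. It is an illustration under assumptions, not the project's real pipeline: the local codes in SITE_A_MAP and SITE_B_MAP, the shared labels, and both function names are made up for this example.

```python
from collections import Counter

# Hypothetical local codings for "smoking during pregnancy" at two sites.
SITE_A_MAP = {"Y": "smoker", "N": "non_smoker", "U": "unknown"}
SITE_B_MAP = {"1": "smoker", "0": "non_smoker", "9": "unknown"}

def to_standard(local_codes, mapping):
    """Translate one site's local codes into the shared labels.

    Codes with no mapping are flagged as 'unmapped' rather than dropped,
    so gaps in the translation rules stay visible.
    """
    return [mapping.get(code, "unmapped") for code in local_codes]

def summarise(standard_labels):
    """Reduce individual records to counts, the only thing that is shared."""
    return dict(Counter(standard_labels))

# Toy records that, in this picture, never leave their own secure room.
site_a_raw = ["Y", "N", "N", "U"]
site_b_raw = ["1", "0", "0", "0", "7"]

site_a_counts = summarise(to_standard(site_a_raw, SITE_A_MAP))
site_b_counts = summarise(to_standard(site_b_raw, SITE_B_MAP))

print(site_a_counts)  # {'smoker': 1, 'non_smoker': 2, 'unknown': 1}
print(site_b_counts)  # {'smoker': 1, 'non_smoker': 3, 'unmapped': 1}
```

The same two functions run at every site; only the mapping table changes, which is the sense in which the "instructions" travel and the data stays put.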

5. The Challenges (The "Glitches")

Even with a great plan, there were bumps in the road:

  • Missing Addresses: The standard model only remembers your current address. But for babies, where you lived during pregnancy matters. They had to build a special "time-travel notebook" (a helper table) to keep track of where families lived in the past.
  • Lost Details: Sometimes, the standard model is too simple. For example, if a mother said "I don't know" about her ethnicity, the standard model might just say "Missing." The team had to find clever ways to keep those "I don't know" answers so they didn't lose important nuance.
  • Drug Names: One hospital might call a medicine "Paracetamol," another might call it "Acetaminophen," and a third might use a code like "12345." They had to create a massive dictionary to translate all these names into one standard name.
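
The "time-travel notebook" for addresses can be sketched as a lookup by date. This is a minimal example assuming a helper table with one row per stay at an address; the table layout, the area names, and the area_on function are hypothetical and are not taken from the paper.

```python
from datetime import date

# Hypothetical helper table: one row per period spent at an address.
# end=None means the person still lives there.
ADDRESS_HISTORY = [
    {"person_id": 1, "area": "Area A", "start": date(2017, 1, 1), "end": date(2019, 6, 30)},
    {"person_id": 1, "area": "Area B", "start": date(2019, 7, 1), "end": None},
]

def area_on(person_id, when):
    """Return the area a person lived in on a given date, or None if unknown."""
    for row in ADDRESS_HISTORY:
        if row["person_id"] != person_id:
            continue
        already_left = row["end"] is not None and when > row["end"]
        if row["start"] <= when and not already_left:
            return row["area"]
    return None

print(area_on(1, date(2018, 5, 1)))  # Area A (home during an earlier pregnancy)
print(area_on(1, date(2024, 1, 1)))  # Area B (current home)
```

Keeping start and end dates on every row is what lets a researcher ask where a family lived during a pregnancy instead of only where they live now.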

6. Why This Matters

Now, researchers can write a single computer program that asks a question (e.g., "How many babies were born early in 2020?") and run it against the standardized records in England, Scotland, and Wales, which together cover more than 17.5 million births.

  • Less waiting: Because individual-level data is never shared, researchers can avoid much of the delay that comes with arranging to move records between organizations.
  • Better answers: They can spot rare diseases or rare side effects because they have such a huge crowd of people to study.
  • Fairness: They can compare whether a baby born in a poor area has different outcomes than one born in a rich area, across all three nations.
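
The "one question, many sites" idea can be sketched as follows. This is a toy example with made-up records and hypothetical function names, not the tooling used in the paper. It assumes "born early" means the conventional definition of preterm birth, before 37 completed weeks of pregnancy.

```python
from datetime import date

# Made-up stand-ins for each nation's standardized birth records.
SITES = {
    "England": [
        {"birth_date": date(2020, 3, 1), "gestation_weeks": 35},
        {"birth_date": date(2020, 8, 9), "gestation_weeks": 40},
        {"birth_date": date(2019, 5, 2), "gestation_weeks": 34},
    ],
    "Scotland": [
        {"birth_date": date(2020, 1, 15), "gestation_weeks": 39},
    ],
    "Wales": [
        {"birth_date": date(2020, 6, 20), "gestation_weeks": 36},
        {"birth_date": date(2020, 11, 3), "gestation_weeks": 33},
    ],
}

PRETERM_WEEKS = 37  # births before 37 completed weeks count as early

def count_preterm(records, year):
    """Runs inside one site: count early births in the given year."""
    return sum(
        1
        for r in records
        if r["birth_date"].year == year and r["gestation_weeks"] < PRETERM_WEEKS
    )

def federated_count(sites, year):
    """Ask every site the same question and add up the answers.

    Only the per-site counts are combined; no individual record is pooled.
    """
    per_site = {name: count_preterm(recs, year) for name, recs in sites.items()}
    return per_site, sum(per_site.values())

per_site, total = federated_count(SITES, 2020)
print(per_site)  # {'England': 1, 'Scotland': 0, 'Wales': 2}
print(total)     # 3
```

Because every site stores gestation and birth date in the same shape, count_preterm needs no site-specific changes, which is the practical payoff of the shared format.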

The Bottom Line

This paper is a blueprint for how to turn a chaotic pile of different medical records into one giant, organized library. It's like taking five different languages and creating a single shared dictionary so that researchers across England, Scotland, and Wales can finally work together to make mothers and babies healthier.

In short: They built a universal translator for birth data, so that records from England, Scotland, and Wales can finally "speak" with one voice in research on maternal and child health.
