Automated Standardization of Legacy Biomedical Metadata Using an Ontology-Constrained LLM Agent

This paper presents an LLM-based system that enhances the automated standardization of legacy biomedical metadata by integrating real-time queries to authoritative terminology services, demonstrating significantly improved prediction accuracy over static prompt-based approaches when evaluated on HuBMAP datasets.

Josef Hardi, Martin J. O'Connor, Marcos Martinez-Romero, Jean G. Rosario, Stephen A. Fisher, Mark A. Musen

Published 2026-04-13

The Big Problem: A Library of Confusing Notes

Imagine a massive library where scientists have been storing their research data for years. However, instead of using a standard filing system, everyone wrote their notes on sticky notes in their own handwriting, using their own slang.

  • One scientist wrote "Lung Tissue."
  • Another wrote "Pulmonary Sample."
  • A third wrote "Lungs."

To a human, these all mean the same thing. But to a computer trying to organize this library, they look like three completely different things. This makes it impossible to find all the lung studies at once or combine them to learn something new. This is the problem of bad metadata (the "labels" on the data).

Scientists have tried to fix this by writing rulebooks (called Ontologies and Templates) that say, "You must use the word 'Lung' and nothing else." But these rulebooks are often just long text documents that computers can't easily "read" or enforce automatically.

The Old Solution: The "Guessing" Robot

Researchers tried using AI (Large Language Models) to fix these messy notes. They gave the AI the messy note and said, "Hey, here is a rulebook that says 'Use Lung.' Please fix the note."

The problem? The AI was like a student who had memorized a textbook years ago but didn't have the book in front of them during the test.

  • It tried to guess the right word based on what it remembered.
  • Sometimes it guessed right.
  • Often, it guessed wrong (hallucinated) or used a word that was close but not the exact official term required by the rulebook.
  • It couldn't check if a specific word actually existed in the official dictionary at that exact moment.

The New Solution: The "Super-Researcher" Agent (ARMS)

The authors of this paper built a smarter system called ARMS (Agentic Real-Time Metadata Standardization). Instead of just asking the AI to guess, they gave the AI a set of tools and a direct line to the official library.

Think of the AI not as a student taking a test, but as a Super-Researcher with a team of assistants:

  1. The Librarian Tool: Before the AI tries to fix a note, it calls a tool to pull up the exact current rulebook (the CEDAR template) to see what the rules actually say.
  2. The Dictionary Tool: If the rulebook says "You must use a term from the 'Anatomy' dictionary," the AI doesn't guess. It uses a tool to search the live dictionary (BioPortal) in real-time. It asks, "Does 'Lung' exist in the 'Respiratory System' section of the dictionary right now?"
  3. The Logic Tool: If the dictionary returns a list of options, the AI uses its brain to pick the one that fits the context best.
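The three-tool loop above can be sketched in a few lines of Python. This is a hedged illustration only: the function names, the stub data standing in for CEDAR and BioPortal, and the synonym lookup (which replaces the LLM's contextual reasoning) are all invented for this sketch, not the authors' implementation.

```python
# Illustrative sketch of an ARMS-style tool loop.
# All names and data here are made up; real CEDAR/BioPortal calls
# would be live API requests, and the "logic" step would be an LLM.

# Stub "rulebook" standing in for a CEDAR template:
# each field names the ontology branch its values must come from.
CEDAR_TEMPLATE = {
    "organ": {"ontology": "UBERON", "branch": "respiratory system"},
}

# Stub "live dictionary" standing in for BioPortal:
# (ontology, branch) -> official terms that exist right now.
BIOPORTAL_INDEX = {
    ("UBERON", "respiratory system"): ["lung", "trachea", "bronchus"],
}

# The LLM's contextual matching, reduced to a lookup for this sketch.
SYNONYMS = {
    "pulmonary sample": "lung",
    "lung tissue": "lung",
    "lungs": "lung",
}


def fetch_template(field):
    """Librarian tool: pull the current rule for a metadata field."""
    return CEDAR_TEMPLATE[field]


def search_terminology(ontology, branch):
    """Dictionary tool: query the (stubbed) live terminology service."""
    return BIOPORTAL_INDEX.get((ontology, branch), [])


def standardize(field, messy_value):
    """Logic step: map the messy value onto a verified official term."""
    rule = fetch_template(field)
    candidates = search_terminology(rule["ontology"], rule["branch"])
    guess = SYNONYMS.get(messy_value.lower(), messy_value.lower())
    # Only emit a term the live dictionary actually contains.
    return guess if guess in candidates else None


print(standardize("organ", "Pulmonary Sample"))  # -> lung
```

The key design point is the last line of `standardize`: the agent never outputs a term it has not just verified against the live source, which is what blocks hallucinated labels.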

How They Tested It

They tested this on 839 old research records from a huge project called HuBMAP (which maps the human body).

  • They had a "Gold Standard" version of these records, where human experts had manually fixed every single label perfectly.
  • They let the Old AI (just guessing) try to fix the records.
  • They let the New Agent (using tools) try to fix the records.
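The comparison boils down to field-level accuracy against the gold standard. A minimal sketch, with made-up records (these are not HuBMAP data):

```python
# Field-level accuracy against a human-curated gold standard.
# Records below are invented for illustration.

gold = [{"organ": "lung"}, {"organ": "heart"}, {"organ": "kidney"}]
predicted = [{"organ": "lung"}, {"organ": "cardiac"}, {"organ": "kidney"}]


def field_accuracy(gold_records, pred_records, field):
    """Fraction of records where the predicted label exactly
    matches the expert-assigned label for the given field."""
    correct = sum(
        g[field] == p[field] for g, p in zip(gold_records, pred_records)
    )
    return correct / len(gold_records)


print(field_accuracy(gold, predicted, "organ"))  # two of three match
```

Note that the match is exact-string: "cardiac" scores zero against "heart" even though a human would call it close, which is precisely why near-miss guesses from the old approach hurt its score.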

The Results: A Massive Improvement

The results were like night and day:

  • The Old AI (Guessing): Got about 54% of the labels right. It struggled especially with specific scientific terms, often making up words or picking the wrong ones.
  • The New Agent (Tool-User): Got about 79% of the labels right.
  • The "Magic" Part: For the specific scientific terms that had to come from a dictionary (ontology-constrained fields), the New Agent jumped from 46% accuracy to 78%. For some experiment types, it got every one of those terms right.

Why This Matters

This paper proves that giving AI access to live tools is better than just giving it a textbook.

  • Static vs. Dynamic: You can't rely on an AI's memory because scientific dictionaries change every day. The new system checks the dictionary while it works.
  • Scalability: Humans can't fix millions of messy records one by one. This system can do it automatically, making scientific data "FAIR" (Findable, Accessible, Interoperable, and Reusable).

The Bottom Line

Imagine trying to organize a messy garage.

  • The Old Way: You ask a friend who thinks they remember where things go. They might put a hammer in the toolbox, or maybe in the garden shed. It's a mess.
  • The New Way: You give your friend a smartphone with a map app and a live inventory list. They look up exactly where the hammer should go according to the current rules, and they put it there.

This paper shows that when we give AI agents the right tools to look up facts in real-time, they stop guessing and start getting the job done right. This helps scientists share and reuse their data much more easily.
