Imagine you are a chef trying to cook a meal using only human recipes. You have a massive, chaotic pantry (the Ovid Embase database) filled with millions of jars. Some jars contain human recipes, some contain dog recipes, some contain cat recipes, and some contain a weird mix of both.
Your goal is to find the human recipes without accidentally throwing away the ones that mention humans but also have a dog in the title (like a study about how dogs help humans).
To do this, you have 11 different "magic sieves" (search filters). Each sieve is designed to shake out the animal jars and keep the human ones. But here's the problem: nobody knew which sieve actually worked the best. Some might be too coarse (letting dog jars through), and some might be too fine (throwing away human jars that had a tiny speck of dog in the description).
This paper is like a taste-test competition where the authors tried all 11 sieves to see which one saved the most human recipes while still getting rid of the animal ones.
The Setup: The Three Piles of Jars
To test the sieves, the authors didn't just look at random jars. They created three specific piles of 1,000 jars each:
- The "Dog Only" Pile: Jars that are definitely about animals.
- The "Human Only" Pile: Jars that are definitely about people.
- The "Mixed Bag" Pile: Jars that are about both animals and humans (like a study on how a dog's presence affects a human's heart rate).
They then ran each pile through all 11 sieves to see what happened.
The Results: The "Goldilocks" Sieves
The authors measured two things for each sieve (a small worked sketch follows this list):
- Sensitivity (The "Don't Miss" Score): How many good human jars did the sieve keep? If you throw away a human jar by mistake, your sensitivity is low.
- Specificity (The "Don't Let In" Score): How many bad animal jars did the sieve catch? If you let a dog jar slip through, your specificity is low.
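To make the two scores concrete, here is a minimal sketch in Python. The counts are made up for illustration (a hypothetical sieve tested on two 1,000-jar piles), not figures from the paper's tables.

```python
# A minimal sketch of the two scores, using made-up counts
# (not data from the paper). "Kept" means the sieve retained
# a jar; "caught" means it removed one.

def sensitivity(human_kept: int, human_total: int) -> float:
    """The "Don't Miss" score: the share of human jars the sieve keeps."""
    return human_kept / human_total

def specificity(animal_caught: int, animal_total: int) -> float:
    """The "Don't Let In" score: the share of animal jars the sieve removes."""
    return animal_caught / animal_total

# A hypothetical sieve tested on two 1,000-jar piles:
print(f"Sensitivity: {sensitivity(906, 1000):.1%}")  # 90.6% of human jars kept
print(f"Specificity: {specificity(720, 1000):.1%}")  # 72.0% of animal jars caught
```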
Here is what they found:
1. The "Super Catcher" (Method 11)
This sieve was the most sensitive. It kept 90.6% of the human jars. It was very careful not to throw anything away. However, it was a bit sloppy; it let about 28% of the animal jars slip through.
- Analogy: Imagine a security guard who is so afraid of missing a VIP guest that they let almost everyone into the building, even the people who shouldn't be there. You get all the VIPs, but you have to deal with a lot of extra people.
2. The "Strict Bouncer" (Method 3)
This sieve was the most specific. It caught 91.7% of the animal jars and was very good at keeping dog jars out of the final pile. However, it was too strict; it accidentally threw away about 25% of the human jars.
- Analogy: Imagine a bouncer who checks IDs so strictly that they accidentally kick out a few VIPs because their ID looked slightly different. The club is very clean of intruders, but you lost some good guests.
3. The "Middle Ground" (Methods 1-9)
Most of the other sieves were in the middle. They kept about 78% of the human jars and caught about 90% of the animal jars. They were "okay" but not the best at either extreme.
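To see what those rates mean in practice, here is a rough conversion into jar counts per 1,000-jar pile. The percentages are the ones quoted above; the conversion to counts (and the rounding) is our own arithmetic, not a table from the paper.

```python
# Converting the quoted rates into jar counts per 1,000-jar pile.
# Percentages come from the summary above; the arithmetic is ours.

PILE = 1000  # jars per test pile

methods = {
    # name: (sensitivity, specificity)
    "Method 11 (Super Catcher)":   (0.906, 0.72),   # ~28% of animal jars slip through
    "Method 3 (Strict Bouncer)":   (0.75,  0.917),  # tosses ~25% of human jars
    "Typical middle-ground sieve": (0.78,  0.90),
}

for name, (sens, spec) in methods.items():
    humans_lost    = round((1 - sens) * PILE)  # good jars thrown away
    animals_leaked = round((1 - spec) * PILE)  # bad jars let through
    print(f"{name}: loses {humans_lost} human jars, "
          f"lets {animals_leaked} animal jars through")
```

Roughly: Method 11 loses about 94 human jars but leaks about 280 animal jars, while Method 3 leaks only about 83 animal jars but loses about 250 human ones.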
The Big Surprise: What Got Thrown Away?
The authors looked closely at the human jars that the sieves accidentally threw away. They found a pattern:
- The human jars most often tossed by mistake were studies that actually enrolled human participants or used human data: exactly the kind of jar a reviewer most wants to keep.
- Why? Because the database (the pantry) sometimes forgets to put a "Human" label on a jar, even if it has humans in it. The sieve looks for that label, doesn't find it, and throws the jar away.
It's like if a library catalog forgot to tag a book as "Fiction," and your robot librarian threw it in the trash because it didn't see the tag, even though it was a great story.
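Here is a toy sketch of that failure mode. The labels and records are invented for illustration; real Embase indexing is far richer than this.

```python
# A toy sketch of the missing-label failure mode.
# Labels and records are invented for illustration.

def keep_if_human_labelled(record: dict) -> bool:
    """A label-dependent sieve: keep a jar only if it carries the 'human' tag."""
    return "human" in record["labels"]

tagged   = {"title": "Blood pressure in adult humans",   "labels": {"human"}}
untagged = {"title": "Survey of 500 human participants", "labels": set()}

print(keep_if_human_labelled(tagged))    # True  -> kept
print(keep_if_human_labelled(untagged))  # False -> a human jar wrongly tossed
```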
The "Why" Behind the Sieves
Why did Method 11 work so well at keeping humans, but Method 3 work so well at catching animals?
- Method 11 was designed to be gentle. It only looked for specific "Animal Experiment" labels. If a jar didn't have that label, it stayed. This is great for keeping humans, but bad for catching animals that didn't get labeled correctly.
- Method 3 looked at the Title and Abstract (the text on the jar) in addition to the labels. It searched for words like "animal study" or "excluded animals." This made it very good at catching animals, but it also accidentally caught human jars that mentioned animals in passing (e.g., "We excluded animal studies from our review"). A toy contrast of the two strategies follows.
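These are not the paper's actual search strings; the records, labels, and keyword list below are invented to show why each approach fails where it does.

```python
# A toy contrast of label-only filtering (Method 11-style) versus
# label-plus-text filtering (Method 3-style). Records, labels, and
# keywords are invented; these are not the paper's search strings.

records = [
    {"title": "Dog model of cardiac arrest",
     "labels": {"animal experiment"}},   # genuine animal study, correctly labelled
    {"title": "Effects of diet in rats",
     "labels": set()},                   # animal study the database forgot to label
    {"title": "Blood pressure in adult humans",
     "labels": {"human"}},               # genuine human study
    {"title": "Statin review: we excluded animal studies",
     "labels": {"human"}},               # human study that merely mentions animals
]

def method_11_style(record: dict) -> bool:
    """Gentle, label-only: toss a jar only if it carries an animal label."""
    return "animal experiment" not in record["labels"]

def method_3_style(record: dict) -> bool:
    """Stricter: also toss jars whose title text mentions animal terms."""
    animal_words = ("animal", "dog", "rat", "mouse")
    title_hit = any(word in record["title"].lower() for word in animal_words)
    return method_11_style(record) and not title_hit

for r in records:
    print(f"{r['title']!r}: Method 11 keeps {method_11_style(r)}, "
          f"Method 3 keeps {method_3_style(r)}")
```

Note how the unlabeled rat study slips through the gentle sieve, while the human review that merely mentions animals gets tossed by the strict one.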
The Takeaway for You
If you are a researcher (the chef) trying to find human studies:
- Don't just pick a sieve blindly. You need to know your ingredients.
- If you are terrified of missing a single human study (high sensitivity), use Method 11. You will have to do a bit more work later to manually check the "animal" jars that slipped through, but you will lose the fewest human gems (even this sieve missed about 9% of them).
- If you are terrified of wasting time reading animal studies (high specificity), use Method 3. But be warned: you might accidentally throw away some human studies that mention animals.
- Talk to your team. Before you start cooking, ask: "Do we care about studies where animals helped humans?" (Like therapy dogs). If yes, you can't use a sieve that throws those out.
The Bottom Line
There is no perfect sieve. Every tool has a trade-off.
- Method 11 is the safest bet for not missing human studies.
- Method 3 is the best at cleaning out the animal noise.
The authors conclude that information specialists (the librarians) and researchers (the chefs) need to sit down and have a chat about what they are willing to lose in exchange for what they want to find. It's all about balancing the risk of missing a good recipe against the risk of cooking with the wrong ingredients.