Classifying and Differentiating Individuals with Respiratory Syncytial Virus, Influenza, and COVID-19 Cases in OpenSAFELY

Using the OpenSAFELY platform, researchers developed and validated specific and sensitive computable phenotypes to accurately classify respiratory syncytial virus, influenza, and COVID-19 cases in English electronic health records, demonstrating their alignment with surveillance data while highlighting varying misclassification risks based on phenotype specificity and patient age.

Published 2026-04-13

Original authors: Prestige, E., Warren-Gash, C., Quint, J. K., Evans, D., Costello, R. E., Mehrkar, A., Bacon, S., Goldacre, B., Barley-McMullen, S., Yameen, F., Shah, P., Natt, M., Alder, Y., Hulme, W., Parker, E. P. K., Eggo, R. M.

Original paper licensed under CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/). ⚕️ This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

Imagine the National Health Service (NHS) in England as a massive, digital library containing millions of patient stories written in a special code. These stories are called Electronic Health Records (EHRs). Usually, these records tell us what happened to a patient (like "fever" or "hospitalized"), but they often don't explicitly say which specific virus caused it, especially if the patient wasn't tested.

This paper is about a team of digital detectives who used a super-secure computer system called OpenSAFELY to write a new set of "search rules" (called phenotypes) to find three specific troublemakers in that library:

  1. RSV (respiratory syncytial virus, a common virus that hits babies and older adults hardest)
  2. Flu (the seasonal flu)
  3. COVID-19

Here is how they did it, explained with some everyday analogies:

1. The Detective's Dilemma: The "Specific" vs. "Sensitive" Net

The researchers had to decide how to cast their fishing nets to catch these viruses. They tested two different strategies:

  • The "Specific" Net (The Sniper): This net is aimed narrowly. It only keeps fish that look exactly like the target.
    • Result: It rarely catches the wrong fish (low misclassification), but it might let some actual target fish swim away if they don't look perfect.
  • The "Sensitive" Net (The Wide Net): This net is cast broadly. It keeps everything that might possibly be the target.
    • Result: It catches almost all the target fish, but it also scoops up a lot of other fish that look similar but aren't the target (high misclassification).

The Finding: When looking at mild cases (people who just stayed home with a sniffle), the wide "Sensitive" net was too messy. It confused mild cases of one virus with another. The narrow "Specific" net was much better at telling them apart.
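
For readers who think in code, here is a minimal sketch of the two-net idea. It is not the authors' implementation: the code labels and the `PHENOTYPES` table are invented for illustration, and real phenotypes rely on curated clinical codelists rather than three or four strings.

```python
# Illustrative only: every code label below is hypothetical, not a real
# clinical code and not taken from the paper's codelists.
PHENOTYPES = {
    "rsv": {
        "specific": {"rsv_bronchiolitis"},
        "sensitive_extra": {"bronchiolitis_unspecified", "cough"},
    },
    "flu": {
        "specific": {"influenza_confirmed"},
        "sensitive_extra": {"influenza_like_illness", "cough"},
    },
    "covid": {
        "specific": {"covid_positive_test"},
        "sensitive_extra": {"suspected_covid", "cough"},
    },
}


def matched_viruses(record_codes, mode):
    """Return the viruses whose phenotype matches a patient's coded record.

    mode="specific" uses only the narrow code set for each virus.
    mode="sensitive" adds the broader, less distinctive codes on top.
    """
    if mode not in ("specific", "sensitive"):
        raise ValueError("mode must be 'specific' or 'sensitive'")
    codes = set(record_codes)
    hits = []
    for virus, rules in PHENOTYPES.items():
        allowed = set(rules["specific"])
        if mode == "sensitive":
            allowed |= rules["sensitive_extra"]
        # A single overlapping code is enough to count as a match here.
        if codes & allowed:
            hits.append(virus)
    return hits


# A hypothetical patient with a confirmed flu code and a generic cough code.
patient = ["cough", "influenza_confirmed"]
specific_hits = matched_viruses(patient, "specific")
sensitive_hits = matched_viruses(patient, "sensitive")
print(specific_hits)   # ['flu']
print(sensitive_hits)  # ['rsv', 'flu', 'covid']
```

In this toy case the narrow rules point to one virus, while the broad rules flag all three because a generic symptom code appears in every broad set. That overlap is the kind of mix-up the finding describes.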

2. The Age Factor: The "Tiny vs. Grown-Up" Puzzle

The researchers noticed something interesting about who gets misidentified:

  • Older Adults: Their symptoms are often distinct enough that the computer rules work well, no matter which net they use.
  • Infants: This is the tricky part. Babies with severe illness are like chameleons; their symptoms can look like any of the three viruses. Even with the best rules, it's harder for the computer to tell if a sick baby has RSV, Flu, or COVID just by looking at their hospital notes. The risk of mixing them up is higher for infants than for adults.

3. Checking the Map: The "Weather Report" Test

How did they know their new rules worked? They compared their digital findings to the official weather reports (public surveillance data) that track how many people are sick each week.

  • The Result: Their digital maps lined up well with the real-world weather reports. The seasons for when these viruses peak looked much the same in their data as they did in the official surveillance figures.
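
One simple way to picture such a check is sketched below. This summary does not say which comparison method the authors used, so the correlation and peak-week checks here are assumptions, and the weekly numbers are made up purely for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Made-up weekly values for one virus over six weeks (not real data).
phenotype_weekly = [12, 30, 75, 140, 90, 35]           # cases found by the rules
surveillance_weekly = [1.0, 2.4, 6.1, 11.0, 7.5, 2.9]  # e.g. a surveillance rate

# Do the two series rise and fall together, and do they peak in the same week?
r = pearson(phenotype_weekly, surveillance_weekly)
peak_phenotype = phenotype_weekly.index(max(phenotype_weekly))
peak_surveillance = surveillance_weekly.index(max(surveillance_weekly))

print(f"correlation: {r:.3f}")
print(f"same peak week: {peak_phenotype == peak_surveillance}")
```

A high correlation and a shared peak week would suggest the rules track the same seasonal wave as the surveillance data. They would not prove that each individual patient was classified correctly.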

The Big Takeaway

Think of this paper as creating a universal translator for hospital records.

In the past, if a patient wasn't tested for a specific virus, doctors and researchers often had to guess or ignore the data. Now, thanks to this study, we have a validated set of instructions that can look at a patient's coded history and say, "Ah, this pattern is consistent with RSV," even without a lab test result, while being upfront that the risk of mix-ups depends on how strict the rules are and how old the patient is.

This is a game-changer because it allows scientists to study these viruses at scale, understand how they affect different age groups, and plan better for the future. They can do all of this without needing to dig up old test tubes, just by reading the digital stories already written in the NHS library.
