ViroGym: Realistic Large-Scale Benchmarks for Evaluating Viral Proteins

The paper introduces ViroGym, a comprehensive benchmark comprising extensive deep mutational scanning data and real-world tasks to evaluate protein language models for predicting viral variant effects and guiding rational antigen selection for vaccine development.

Yichen Zhou, Jonathan Golob, Amir Karimi, Stefan Bauer, Patrick Schwab

Published Tue, 10 Ma

Imagine you are trying to predict the weather for next year to decide what kind of seeds to plant. You have two main tools:

  1. The Lab Test: You take a tiny sample of soil, put it in a controlled environment, and see how it reacts to rain and sun.
  2. The Weather Model: You use a supercomputer that has read every weather report from the last 1,000 years to guess what the future holds.

For decades, scientists have relied mostly on the Lab Test to figure out how viruses (like the flu or SARS-CoV-2) will change. They grow viruses in a dish, mutate them, and see which ones survive. This is called Deep Mutational Scanning (DMS). It's accurate, but it's slow, expensive, and only tests a tiny fraction of possibilities.

Then came Protein Language Models (pLMs). Think of these as "Google Translate for biology." They have read millions of protein sequences (the "words" of life) and learned the "grammar" of how viruses evolve. They can guess how a virus might change without ever seeing it in a lab.
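To make "guessing without a lab" a little more concrete: one common way to get a mutation score out of a language model is to compare how likely the model finds the new letter versus the original letter at the same position. The sketch below assumes that approach; this summary does not say which scoring rule the paper uses. The names `parse_mutation`, `log_likelihood_ratio` and `toy_probs` are hypothetical, and the probabilities are invented stand-ins for a real model's output.

```python
import math
import re


def parse_mutation(mutation):
    """Split a substitution such as 'N501Y' into (wild_type, position, mutant)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"not a single substitution: {mutation!r}")
    wild_type, position, mutant = match.groups()
    return wild_type, int(position), mutant


def log_likelihood_ratio(position_probs, mutation):
    """Score = log P(mutant residue) - log P(wild-type residue) at that position."""
    wild_type, position, mutant = parse_mutation(mutation)
    probs = position_probs[position]
    return math.log(probs[mutant]) - math.log(probs[wild_type])


# Hypothetical per-position probabilities standing in for a real model's output.
# These numbers are invented for illustration and are not from the paper.
toy_probs = {501: {"N": 0.25, "Y": 0.50, "K": 0.05}}

score = log_likelihood_ratio(toy_probs, "N501Y")
```

A positive score means the model finds the mutated letter more plausible than the original one; a negative score means the opposite.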

The Problem:
Until now, we didn't have a good way to test whether these "Google Translate" models actually work for viruses. Most of them were trained mainly on human or animal proteins, and their creators often deliberately removed viral sequences from the training data to avoid bias. So we didn't know whether they could really predict the next big pandemic strain.

The Solution: ViroGym
The authors of this paper built ViroGym. Think of ViroGym as a massive, high-stakes gymnasium for AI models: the authors put different models through three specific "workouts" to see which one is the strongest athlete at predicting viral evolution.

Here is how the three workouts work, using simple analogies:

Workout 1: The "Mutation Gym" (Mutational Effect Prediction)

  • The Task: The AI is shown a virus and asked, "If we change this one letter in the virus's code, will it get stronger or weaker?"
  • The Analogy: Imagine a mechanic trying to guess what happens if you swap a specific bolt in a car engine. Some bolts are critical (the car stops), some are useless (nothing happens), and some make the car faster.
  • The Result: The AI models were surprisingly good at this. Their guesses about a mutation's "fitness" (how well the virus survives) lined up well with what the slow, expensive lab tests had measured.
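How would you grade this workout? A typical choice for benchmarks of this kind is a rank correlation: do the mutations the model ranks highest also rank highest in the lab data? The sketch below assumes Spearman correlation as the yardstick; this summary does not state which metric ViroGym reports. The function names are hypothetical and the numbers are made up.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / sqrt(var_x * var_y)


# Made-up numbers for illustration only; not taken from the paper.
model_scores = [-4.1, -0.3, 0.8, -2.2, 0.1]   # hypothetical model scores
dms_fitness = [0.05, 0.90, 1.20, 0.70, 0.40]  # hypothetical lab measurements

rho = spearman(model_scores, dms_fitness)
```

A value near 1 means the model and the lab agree on the ordering of mutations, a value near 0 means no relationship, and a negative value means the model tends to rank them backwards.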

Workout 2: The "Disguise Test" (Antigenic Diversity)

  • The Task: Viruses are like master thieves; they wear masks (antigens) to hide from our immune system (the police). This workout asks: "If the virus changes its mask, will an immune system trained by our current vaccines still recognize it?"
  • The Analogy: Imagine a "Wanted" poster. The AI has to look at a new criminal's face and guess, "Will the police recognize this guy, or has he changed his appearance enough to escape?"
  • The Result: This was the hardest workout. The AI models were okay, but not perfect. They struggled to predict exactly how well a vaccine would work against a new strain. There is still a lot of room for improvement here.

Workout 3: The "Crystal Ball" (Pandemic Prediction)

  • The Task: This is the ultimate test. The AI looks at the virus today and tries to predict which mutations will become the "dominant" ones in the real world tomorrow.
  • The Analogy: Imagine a sports analyst trying to predict which player will become the MVP of the league next season. They have to ignore the noise and pick the players who will actually succeed in the real game, not just in practice.
  • The Big Surprise:
    • The Lab Tests (DMS) were great at finding mutations that worked in the dish, but they missed many of the mutations that actually took over in the real world. It's like a player who is great in practice but freezes during the big game.
    • The AI Models (specifically one called ProGen2) were the winners. They predicted the mutations that actually became dominant in the real world (like the famous N501Y mutation in SARS-CoV-2) much better than the lab tests did.
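One simple way to picture this comparison is "top-k recall": take each method's k highest-scoring mutations and ask what fraction of the mutations that later became dominant are on that shortlist. The sketch below assumes that scoring rule; this summary does not say how the paper measures it. `top_k_recall` is a hypothetical helper, every number is invented, and apart from N501Y (named above) the mutation labels are placeholders.

```python
def top_k_recall(scores, dominant, k):
    """Fraction of the dominant mutations that appear among the k highest scores."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not dominant:
        raise ValueError("dominant set must not be empty")
    # Sort mutation names from highest to lowest score and keep the first k.
    ranked = sorted(scores, key=scores.get, reverse=True)
    top = set(ranked[:k])
    return len(top & dominant) / len(dominant)


# Toy inputs invented for illustration; none of these numbers come from the paper.
# They are chosen only to show how two methods can disagree on the same shortlist size.
plm_scores = {"N501Y": 0.9, "A10V": 0.7, "G20D": 0.2, "K30R": 0.6, "S40T": 0.1}
dms_scores = {"N501Y": 0.3, "A10V": 0.8, "G20D": 0.9, "K30R": 0.2, "S40T": 0.7}
became_dominant = {"N501Y", "K30R"}

plm_recall = top_k_recall(plm_scores, became_dominant, k=3)
dms_recall = top_k_recall(dms_scores, became_dominant, k=3)
```

With real data, the same function would be run once per method on the same list of candidate mutations, so the two recall values can be compared directly.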

Why This Matters

The paper concludes that AI models are better "crystal balls" than we thought.

While lab tests are still essential for understanding how a virus works in a controlled setting, they might be too narrow to predict how a virus will evolve in the messy, chaotic real world. The AI models, having "read" the history of evolution, seem to understand the bigger picture of what makes a virus successful.

The Takeaway:
ViroGym suggests that these AI models can help us design vaccines before a virus even becomes a major threat. Instead of waiting for the virus to mutate and then scrambling to make a new vaccine, we can use these "Protein Language Models" to guess its next move, giving us a head start in the race against evolving diseases.

In short: The AI isn't just a translator anymore; it's becoming a fortune teller for viruses.