EVEE: Interpretable variant effect prediction from genomic foundation model embeddings

This paper introduces EVEE, an interactive web resource that leverages embeddings from the Evo 2 genomic foundation model to achieve state-of-the-art, interpretable pathogenicity predictions for diverse genetic variants across a unified framework, transforming interpretability from a trade-off into a complementary product of learned biological structure.

Pearce, M. T., Dooms, T., Yamamoto, R., Meehl, J., Molnar, C., Bissell, M., Hazra, D., Fang, C., Nguyen, N., Anderson, M., Osborne, C., Duffy, P., Toomey, B., Klee, E., Myasoedova, E., Ryu, A., Ayanian, S., Korfiatis, P., Redlon, M., Jain, A., Balsam, D., Wang, N. K.

Published 2026-04-11

This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

Imagine your DNA is a massive, ancient library containing the instruction manual for building and running a human being. Sometimes, a single letter in a book gets changed, a word is deleted, or a sentence is added. These changes are called genetic variants.

Most of the time, we don't know if these changes are harmless typos, helpful edits, or dangerous errors that cause disease. In the medical world, these unknowns are called "Variants of Uncertain Significance" (VUS), and they are a huge headache for doctors trying to diagnose patients.

This paper introduces a new tool called EVEE (Evo Variant Effect Explorer) that acts like a super-smart, bilingual librarian who can not only spot the errors but also explain why they are dangerous in plain English.

Here is how it works, broken down into simple concepts:

1. The "Super-Librarian" (Evo 2)

First, the researchers used a massive AI model called Evo 2. Think of Evo 2 as a librarian who has read every book in the library of life, from bacteria to humans, millions of times. Because it has seen so much, it has learned the "grammar" and "style" of DNA. It knows what a healthy sentence looks like and what a broken one feels like, even without being explicitly taught the rules.

2. The "Fingerprint Scanner" (The Covariance Probe)

Usually, when scientists try to find errors, they look at one letter at a time. But EVEE uses a clever trick called a Covariance Probe.

Imagine you are looking at a crowd of people. A normal scanner might just count how many people are wearing red hats. But the Covariance Probe looks at the relationships between people. It notices: "Hey, whenever someone wears a red hat, they are also standing next to someone with a blue scarf, and they are both holding a specific type of umbrella."

In DNA terms, the model doesn't just look at the changed letter; it looks at how that change ripples through the surrounding neighborhood of letters. It captures the "vibe" or the "pattern" of the change. This allowed them to build a detector that is incredibly accurate at spotting bad variants, whether they are single letter swaps (SNVs) or chunks of missing text (indels).

The Result: It scored 99.7% accuracy on known bad variants, beating almost every existing tool.
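
For readers who like to see ideas in code, here is a minimal sketch of the "look at relationships, not just counts" idea. It is not the authors' implementation. It assumes we already have per-position embeddings for the reference and variant sequences (the arrays `ref_emb` and `alt_emb` are random stand-ins), and the function names, toy sizes, and random weights are all hypothetical. The paper's actual probe may be built differently.

```python
import numpy as np

def covariance_features(ref_emb, alt_emb):
    """Summarise how a variant changes embeddings across a window.

    ref_emb, alt_emb: arrays of shape (positions, dims) holding hypothetical
    per-position embeddings for the reference and variant sequences.
    """
    delta = alt_emb - ref_emb                    # per-position change
    delta = delta - delta.mean(axis=0)           # centre each dimension
    cov = delta.T @ delta / delta.shape[0]       # (dims, dims) covariance
    rows, cols = np.triu_indices(cov.shape[0])   # matrix is symmetric, keep half
    return cov[rows, cols]

def probe_score(features, weights, bias=0.0):
    """Linear read-out on the covariance features, squashed into [0, 1]."""
    return 1.0 / (1.0 + np.exp(-(features @ weights + bias)))

rng = np.random.default_rng(0)
ref_emb = rng.normal(size=(16, 4))               # 16 positions, 4 dims (toy sizes)
alt_emb = ref_emb + rng.normal(scale=0.5, size=(16, 4))

feats = covariance_features(ref_emb, alt_emb)
weights = rng.normal(size=feats.shape[0])        # stand-in for trained weights
score = probe_score(feats, weights)
```

The point of the design is that each feature describes how two embedding dimensions move together across the neighborhood, which is the "red hat next to blue scarf" pattern from the analogy, and not how much any single dimension moved on its own.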

3. The "Zero-Shot" Magic

One of the coolest things about this tool is that it was trained only on single-letter errors, yet it also turned out to be very good at spotting missing or extra chunks of text (indels) without ever being trained on those types of errors.

It's like teaching a child to recognize a "dog" by showing them pictures of Golden Retrievers. Then, you show them a picture of a Chihuahua they've never seen before, and they say, "That's a dog too!" The model learned the concept of a broken instruction well enough to apply it to new types of breaks.
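
The same idea can be shown as a toy experiment: fit a simple classifier on one kind of variant and test it, untouched, on another kind. This is only an illustration with made-up data, not the paper's evaluation. The helper names, the synthetic features, and the assumption that both variant types share one "damage" signal are all hypothetical.

```python
import numpy as np

def train_logistic(X, y, lr=0.1, steps=500):
    """Plain gradient-descent logistic regression."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        grad = p - y
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b

def accuracy(X, y, w, b):
    """Fraction of variants whose predicted label matches the true label."""
    return float((((X @ w + b) > 0).astype(int) == y).mean())

def make_variants(rng, n, shift):
    """Toy features: harmful variants are displaced along a shared direction."""
    y = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, 8))
    X[:, 0] += 3.0 * y      # the shared "damage" signal
    X[:, 1:] += shift       # type-specific offset unrelated to the label
    return X, y

rng = np.random.default_rng(0)
X_snv, y_snv = make_variants(rng, 400, shift=0.0)      # training type
X_indel, y_indel = make_variants(rng, 200, shift=0.5)  # never seen in training

w, b = train_logistic(X_snv, y_snv)
indel_acc = accuracy(X_indel, y_indel, w, b)
```

Because the classifier never sees an indel during training, any accuracy it reaches on `X_indel` comes from the signal the two variant types share. That is the sense in which the transfer is "zero-shot."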

4. The "Translator" (Making it Interpretable)

Here is the biggest problem with most AI in medicine: It gives you a score (like "85% chance this is bad"), but it doesn't tell you why. Doctors can't use a black box score to make life-or-death decisions; they need evidence.

EVEE solves this with a two-step translation process:

  1. The Detective Work: The system checks the variant against 251 different biological "checklists." Does this change break a protein's shape? Does it mess up a splice site (a marker that tells the cell where to cut and rejoin a gene's segments)? Does it remove a critical switch? It creates a "disruption profile": a list of exactly what broke.
  2. The Storyteller: They fed this list of broken parts into a powerful AI language model (like a very smart journalist). This AI took the technical data and wrote a human-readable story.
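
The two steps above can be sketched in a few lines. This is a simplified illustration, not EVEE's actual pipeline: the four feature names stand in for the 251 checklists, the scores are invented, and `disruption_profile` and `build_prompt` are hypothetical helpers. The sketch stops at building the text that would be handed to a language model and does not call one.

```python
def disruption_profile(ref_scores, alt_scores, top_k=3):
    """Rank annotation features by how much the variant changes them."""
    deltas = {name: alt_scores[name] - ref_scores[name] for name in ref_scores}
    ranked = sorted(deltas.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:top_k]

def build_prompt(variant_id, profile):
    """Turn the ranked disruptions into a plain-text request for an explanation."""
    lines = [f"Variant: {variant_id}", "Largest predicted disruptions:"]
    for name, delta in profile:
        direction = "gained" if delta > 0 else "lost"
        lines.append(f"- {name}: {direction} ({delta:+.2f})")
    lines.append("Explain in plain language why these changes may matter.")
    return "\n".join(lines)

# Invented checklist scores before (ref) and after (alt) the variant.
ref = {"splice_acceptor": 0.95, "helix": 0.40, "binding_site": 0.10, "promoter": 0.05}
alt = {"splice_acceptor": 0.05, "helix": 0.35, "binding_site": 0.12, "promoter": 0.05}

profile = disruption_profile(ref, alt, top_k=2)
prompt = build_prompt("example-variant-1", profile)
```

Keeping the profile as a separate, inspectable list means the written story can always be checked against the numbers it was built from.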

Example: Instead of just saying "Pathogenic," the tool might say:

"This variant is likely harmful because it completely destroys the 'splice acceptor' site at the end of a gene segment. Imagine a train track where the switch is broken; the train (the cell's machinery) can't know where to stop, causing it to derail and produce a broken protein. This matches known patterns of disease in this gene."

5. The "Public Library" (EVEE Website)

The researchers didn't keep this tool to themselves. They built a free, interactive website called EVEE.

  • You can search for any of the 4.2 million genetic variants in the ClinVar database.
  • You can see the "disruption profile" (the list of broken parts).
  • You can read the AI-generated explanation in plain English.

Why This Matters

For years, scientists had to choose between accuracy (a very smart but confusing AI) and interpretability (a simple explanation that might be wrong).

This paper argues that you don't have to choose. By using the deep "understanding" of a genomic foundation model, the researchers created a system that is both a world-class detective and a clear, articulate teacher. It turns a confusing math score into a clear medical story, which could help doctors better understand what those "Variants of Uncertain Significance" mean for their patients.
