Neretva: Neural Variational Inference for Allele-level Genotyping of Highly Polymorphic Genes

Neretva is an open-source framework that uses auto-encoding variational Bayes for allele-level genotyping and phasing of highly polymorphic gene families such as CYP and KIR. The authors report that it outperforms current state-of-the-art methods in both scalability and accuracy.

Zhou, Q., Ahmadi, S. P., Numanagic, I.

Published 2026-03-16

This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

The Big Picture: The "Genetic Jigsaw Puzzle" Problem

Imagine your DNA is a massive library of instruction manuals. Most of these manuals are standard and easy to read. However, some sections of the library are incredibly messy. They contain thousands of nearly identical copies of the same book, with tiny differences in punctuation (like a comma instead of a period) that completely change the meaning of the sentence.

In the world of medicine, two specific families of these "books" are critical:

  1. CYP (Cytochrome P450): These are the body's "drug metabolism" manuals. They tell your liver how to break down medicines. If you get the recipe wrong, a drug might not work, or it might poison you.
  2. KIR (Killer-cell Immunoglobulin-like Receptors): These are the "security guard" manuals for your immune system. They decide if your body attacks a virus, a tumor, or a new organ transplant.

The Problem:
When scientists try to read these specific books using modern sequencing machines (High-Throughput Sequencing), it's like trying to solve a jigsaw puzzle where:

  • Half the pieces look exactly the same.
  • Some pages are missing (deletions).
  • Some pages are duplicated (copy number variations).
  • Some pieces from Book A accidentally got glued into Book B.

Many current computer programs treat this as a search over a huge number of possible combinations. In the worst case, that is like solving a 10,000-piece puzzle by trying every single piece in every single spot. It works for small puzzles, but for these messy gene families, it can be too slow, too rigid, and prone to getting stuck.

The Solution: Enter "Neretva"

The authors created a new tool called Neretva. Instead of brute-forcing the puzzle, Neretva acts like a super-smart, intuitive detective who uses probability and "gut feeling" (mathematically speaking) to solve the mess.

Here is how Neretva works, broken down into simple steps:

1. The "Shadow" Detective (Handling Ambiguity)

In the messy gene families (especially KIR), a piece of DNA might look like it belongs to Gene A, but it could actually be a "shadow" cast by Gene B.

  • Old Way: The computer gets confused and says, "I don't know, maybe it's A, maybe it's B."
  • Neretva's Way: It says, "Okay, this piece looks like it belongs to A, but it might be a shadow cast by B. Let's assume it could be either, and we'll figure out the most likely story as we go." It doesn't throw away confusing data; it uses it to learn.
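The idea of keeping an ambiguous piece and splitting it between its possible owners can be sketched in a few lines. This is a generic expectation-maximization toy, not Neretva's actual model; the function name `soft_assign` and the read data are made up for illustration.

```python
def soft_assign(reads, genes, iterations=50):
    """Estimate each gene's share of reads when some reads fit several genes.

    reads: list of sets, each holding the gene names a read is compatible with.
    Returns a dict mapping gene -> estimated share of reads (shares sum to 1).
    """
    share = {g: 1.0 / len(genes) for g in genes}  # start with no preference
    for _ in range(iterations):
        counts = {g: 0.0 for g in genes}
        for compatible in reads:
            total = sum(share[g] for g in compatible)
            for g in compatible:
                # Split the read across its candidate genes in proportion
                # to how plausible each gene currently looks.
                counts[g] += share[g] / total
        share = {g: counts[g] / len(reads) for g in genes}
    return share


# Hypothetical toy data: 6 reads clearly from gene A, 2 clearly from gene B,
# and 4 "shadow" reads that could belong to either.
reads = [{"A"}] * 6 + [{"B"}] * 2 + [{"A", "B"}] * 4
share = soft_assign(reads, ["A", "B"])
```

In this toy, the ambiguous reads end up divided in the same 3:1 ratio as the unambiguous evidence, so nothing is discarded and nothing is forced into a single gene too early.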

2. The "Recipe" Generator (Variational Inference)

Instead of checking every single possible recipe (genotype) one by one, Neretva builds a probabilistic model.

  • Imagine you are trying to guess a secret recipe based on the smell of the kitchen. You don't taste every possible combination of ingredients. Instead, you use a mental model to say, "It smells like garlic and onions, so there's a 90% chance it's a garlic-onion soup, and a 10% chance it's a garlic-onion stew."
  • Neretva does this with DNA. It starts with a "soup" of possibilities and gradually refines the recipe until it settles on the most likely one. It uses Neural Networks (AI) to learn the patterns of how these genes usually look, which is what lets it avoid checking every candidate one by one.
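For readers who want to see the "refine a guess" idea in code, here is a minimal sketch of variational inference on a toy problem with three candidate genotypes. It assumes a plain table of probabilities as the guess and made-up likelihood scores; Neretva itself, per the paper's title, uses neural networks (auto-encoding variational Bayes), which this sketch does not attempt to reproduce. The names `fit_q` and `log_joint` are hypothetical.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]


def fit_q(log_joint, steps=5000, lr=1.0):
    """Fit q(genotype) = softmax(phi) by gradient ascent on the ELBO.

    log_joint[k] = log p(data, genotype k), up to a shared constant.
    ELBO(q) = sum_k q_k * (log_joint[k] - log q_k).
    """
    phi = [0.0] * len(log_joint)  # uniform initial guess
    for _ in range(steps):
        q = softmax(phi)
        g = [lj - math.log(qk) for lj, qk in zip(log_joint, q)]
        baseline = sum(qk * gk for qk, gk in zip(q, g))
        # Gradient of the ELBO with respect to each logit phi_k.
        phi = [p + lr * qk * (gk - baseline) for p, qk, gk in zip(phi, q, g)]
    return softmax(phi)


# Hypothetical log-likelihoods of the reads under three candidate genotypes
# (uniform prior assumed, so they serve directly as log-joint scores).
log_joint = [-10.0, -12.2, -15.0]
q = fit_q(log_joint)
exact = softmax(log_joint)  # exact posterior, computable only because the toy is tiny
```

Here the refined guess `q` lands on the exact answer, with roughly a 90% weight on the first candidate, much like the soup-versus-stew example. With realistic gene families the candidate list is far too large to enumerate, which is where a learned approximation becomes useful.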

3. Counting the Copies (Copy Number Estimation)

Sometimes, a person has two copies of a gene, and sometimes they have five.

  • Neretva looks at how much "coverage" (how many times a specific part of the gene was read by the machine) it sees.
  • If it sees double the "garlic smell" that a single copy would produce, it infers that there are likely two copies of the garlic gene. It uses a mathematical "regression" (a fancy way of fitting a line to data) to estimate how many copies exist, even if the data is noisy.
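The line-fitting idea can be illustrated with a small sketch. It assumes we already know the depth a single copy would produce at each position and fits one slope through the origin; the numbers and the function name `estimate_copy_number` are invented, and Neretva's actual regression may be set up differently.

```python
def estimate_copy_number(expected_per_copy, observed):
    """Least-squares slope through the origin.

    Models observed depth as approximately copies * expected_per_copy
    and returns both the raw slope and the nearest whole number of copies.
    """
    num = sum(x * y for x, y in zip(expected_per_copy, observed))
    den = sum(x * x for x in expected_per_copy)
    slope = num / den
    return slope, round(slope)


# Hypothetical read depth at five positions of one gene.
expected_per_copy = [30.0, 28.0, 33.0, 31.0, 29.0]  # depth one copy would give
observed = [92.0, 81.0, 101.0, 90.0, 88.0]          # noisy measured depth
slope, copies = estimate_copy_number(expected_per_copy, observed)
```

No single position shows exactly three times the expected depth, but the fitted slope comes out just under 3, so the estimate rounds to three copies despite the noise.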

4. The "Focus" Filter (Core vs. Minor Variants)

Not all differences in the DNA matter. Some are just typos that don't change the meaning (silent variants).

  • Neretva has a special filter that says, "Ignore the tiny typos; focus on the words that actually change the meaning of the sentence."
  • It uses a mathematical penalty system to ensure the final answer makes biological sense, prioritizing the "important" changes over the noise.
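One way to picture such a penalty system is a score that charges more for getting an important variant wrong than for getting a silent one wrong. The summary does not give the form of Neretva's penalty, so everything below (the weights, the variant labels, the function name `score`) is an assumption made purely for illustration.

```python
def score(candidate, observed, core, core_weight=5.0, minor_weight=1.0):
    """Lower is better: weighted count of variants where candidate and data disagree."""
    mismatched = candidate ^ observed  # variants present in one set but not the other
    return sum(core_weight if v in core else minor_weight for v in mismatched)


# Hypothetical variant labels: "c*" change the gene's function (core),
# "m*" are silent typos (minor).
core = {"c1", "c2"}
observed = {"c1", "m1", "m2"}
candidates = {
    "allele_X": {"c1", "m3"},        # right core variant, minor ones differ
    "allele_Y": {"c2", "m1", "m2"},  # minor variants match, core variant wrong
}
best = min(candidates, key=lambda name: score(candidates[name], observed, core))
```

With these assumed weights, `allele_X` wins even though it disagrees with the data at more positions, because all of its disagreements are minor while `allele_Y` gets the core variant wrong.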

Why is this a Big Deal?

The paper tested Neretva against the current best tools (like Aldy, Geny, and Cyrius) on real human data.

  • Speed: While other tools might take an hour to solve a complex KIR puzzle, Neretva does it in minutes. It's like switching from a hand-cranked calculator to a supercomputer.
  • Accuracy: On the messy KIR genes, Neretva was the most accurate tool, correctly identifying the genetic makeup in 91% of cases, beating the previous leaders.
  • Flexibility: Because it's built on AI (Neural Networks) rather than rigid rules, it can adapt to new types of data or new gene families without needing to be completely rewritten.

The Bottom Line

Neretva is new open-source software that helps doctors and researchers read the most confusing parts of our DNA. By using AI to guess the most likely genetic story rather than trying to force a perfect fit, it tackles the "messy gene" problem faster and more accurately than the tools it was compared against.

This matters because getting these genes right can affect whether a patient receives the right dose of medication, whether an organ transplant succeeds, and whether an autoimmune disease is diagnosed correctly. If the results hold up, Neretva could make precision medicine more reliable.
