BICEP: an extension to indels and copy number variants for rare variant prioritisation in pedigree analysis

This paper presents an extension to the BICEP Bayesian inference model that enables the prioritization of rare indels and copy number variants in pedigree-based analyses, demonstrating performance comparable to its original single nucleotide variant model.

Original authors: Ormond, C., Ryan, N. M., Corvin, A., Heron, E. A.

Published 2026-03-11

This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

Imagine your family tree is a giant, ancient library filled with books of DNA. Sometimes, a typo in one of these books causes a family to suffer from a specific health condition. For a long time, scientists had a very smart librarian named BICEP who could help find these typos.

Until now, however, BICEP could only find single-letter typos (like changing an 'A' to a 'G'), known as single nucleotide variants. DNA errors are not limited to single letters, though: sometimes whole words are missing, extra words are added, or entire paragraphs are deleted or duplicated. These are called indels (insertions/deletions) and copy number variants (CNVs).

This paper is about giving BICEP a massive upgrade so it can read all types of typos, not just single letters.

Here is the breakdown of how they did it, using some everyday analogies:

1. The Problem: The "Single-Letter" Librarian

Previously, if you asked BICEP to find a missing paragraph in a book, it would just shrug and say, "I only know how to spot single-letter mistakes." This meant scientists were ignoring a huge chunk of potential causes for genetic diseases because they didn't have the right tools to prioritize them.

2. The Solution: Teaching BICEP New Tricks

The researchers taught BICEP two new skills:

  • Spotting "Word" Typos (Indels): These are small chunks of DNA that are missing or added.
  • Spotting "Paragraph" Typos (CNVs): These are large sections of DNA that are either deleted or duplicated.

To teach BICEP, they didn't just guess. They used a massive training dataset (like a giant stack of "Answer Keys" from the medical database ClinVar) to show BICEP examples of what a "bad" (disease-causing) typo looks like versus a "good" (harmless) one.

3. The Detective Tools: How BICEP Decides

When BICEP looks at a new typo, it acts like a detective weighing evidence. To decide if a typo is dangerous, it checks specific clues:

  • For Indels (The Word Typos): BICEP checks how rare the typo is in the general population (if it's common, it's probably harmless) and how badly it breaks the "grammar" of the gene.
  • For CNVs (The Paragraph Typos): This is trickier. The researchers tested four different "detective kits" to see which worked best:
    1. The "Nothing" Kit: Just guessing based on size.
    2. The "CADD-SV" Kit: A computer score that predicts how damaging a structural change is.
    3. The "LOEUF" Kit: A score that measures how poorly a gene tolerates losing its function (some genes are very sensitive; others are tough). Lower LOEUF values indicate a more sensitive gene.
    4. The "Super Kit" (The Winner): Combining the CADD-SV score and the LOEUF score.

The Result: The "Super Kit" was the best detective. It gave the most accurate answers for both missing paragraphs (deletions) and extra paragraphs (duplications).
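To make the idea of "combining clues" concrete, here is a minimal sketch of how two scores could be merged into a single probability-like value with a logistic function. This is an illustration only, not the authors' model: the function name `combined_score`, the weights, the intercept, and the example input values are all assumptions invented for this sketch. The only grounded detail is the direction of each clue: a higher CADD-SV score suggests more damage, and a lower LOEUF value suggests a gene that tolerates loss of function poorly.

```python
import math

# Hypothetical weights, chosen only for illustration; they are NOT from the paper.
INTERCEPT = -2.0
W_CADD_SV = 0.15   # positive: a higher CADD-SV score raises the combined score
W_LOEUF = -1.5     # negative: a lower LOEUF (more sensitive gene) raises the score


def combined_score(cadd_sv: float, loeuf: float) -> float:
    """Merge the two clues into one value between 0 and 1 (logistic function)."""
    z = INTERCEPT + W_CADD_SV * cadd_sv + W_LOEUF * loeuf
    return 1.0 / (1.0 + math.exp(-z))


# Made-up inputs: a high-scoring CNV in a sensitive gene versus
# a low-scoring CNV in a tolerant gene.
sensitive_gene_cnv = combined_score(cadd_sv=25.0, loeuf=0.2)
tolerant_gene_cnv = combined_score(cadd_sv=5.0, loeuf=1.5)

print(f"sensitive gene CNV: {sensitive_gene_cnv:.2f}")
print(f"tolerant gene CNV:  {tolerant_gene_cnv:.2f}")
```

With these invented weights, the CNV in the sensitive gene scores much higher than the one in the tolerant gene, which is the intuition behind using both clues together rather than either one alone.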

4. The Performance Check: Did it Work?

The researchers put the new BICEP through a final exam. They hid the answers and asked BICEP to guess which typos were dangerous.

  • The Good News: BICEP performed just as well on these new, complex typos as it did on the old single-letter ones.
  • The "High Precision" Rule: The most important thing for BICEP is Precision. If BICEP says, "This typo is dangerous," it wants to be right 99% of the time. The new models achieved this high level of confidence.
  • The Catch: BICEP is still a little bit "cautious." Sometimes, it misses a truly dangerous typo and says, "I'm not sure." This is better than crying wolf, but the researchers hope to fix this as they get more data in the future.

5. Why This Matters

Think of genetic analysis like searching for a needle in a haystack.

  • Before: BICEP could only find the needles made of steel (single-letter variants).
  • Now: BICEP can also find needles made of wood and plastic (indels and CNVs).

This means doctors and researchers can now look at a much wider range of genetic errors when trying to solve the mystery of why a family has a rare disease. The tool is now available for everyone to use, making the search for genetic answers faster and more complete.

In short: The BICEP tool got a software update that lets it understand the full complexity of our DNA, not just the simple parts, helping us find the root causes of genetic diseases more effectively.
