CoNVict: An Agentic AI System for Copy Number Variation Prioritization in Rare Disease Diagnosis

The paper introduces CoNVict, a two-stage agentic AI system that uses large language models to automate the prioritization of copy number variants in rare disease diagnosis. It integrates patient phenotypes and ranks candidates through pairwise comparisons, and the authors report that it outperforms existing tools at identifying causal variants while bridging the gap between automated annotation and clinical reasoning.

Gencturk, M. M., Kara, M., Ozden, F.

Published 2026-03-17

This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

Imagine your body is a massive, intricate library containing about 20,000 books (your genes), written in billions of letters. Sometimes, a section of this library gets accidentally copied an extra time (a duplication) or a whole shelf gets torn out (a deletion). In the world of genetics, these are called Copy Number Variants (CNVs).

For patients with rare, unexplained diseases, finding the specific "missing shelf" or "extra copy" that caused their illness is like finding a single needle in a haystack the size of a city.

Here is a simple breakdown of the paper's solution, CoNVict, using everyday analogies.

The Problem: The Overwhelmed Librarian

Currently, when a doctor gets a genetic test result, they are handed a list of thousands of these "shelf changes."

  • The Old Way: Existing computer tools act like a basic barcode scanner. They scan every book and say, "This one looks suspicious," or "This one looks safe." But they can't tell the difference between a book that is suspicious for this specific patient and a book that is suspicious but irrelevant to the patient's symptoms.
  • The Gap: The computer can flag the "bad" books, but it can't read the patient's medical story (their symptoms) to figure out which bad book actually explains why the patient is sick. This leaves the human doctor to do the heavy lifting of cross-referencing thousands of possibilities, which is slow and prone to error.

The Solution: CoNVict (The AI Detective)

The authors created CoNVict, an AI system that acts like a brilliant, tireless Clinical Detective. Instead of just scanning barcodes, CoNVict reads the patient's story and investigates the suspects one by one.

It works in two main stages, like a two-round tournament:

Round 1: The "Triage" (CNVerdict)

Imagine the detective receives a stack of 500 suspect files (the genetic variants).

  • The Action: CoNVict reads the patient's symptoms (e.g., "seizures," "short stature") and compares them against the "criminal record" of every gene involved in the variants.
  • The Decision: It sorts the files into three piles:
    1. Relevant: "This gene's history matches the patient's symptoms. Keep investigating."
    2. Abstain: "This gene is weird, but I'm not sure. Let's keep it on the back burner."
    3. Irrelevant: "This gene has nothing to do with these symptoms. Throw it out."
  • The Result: The stack of 500 suspects is quickly whittled down to the top 32 most promising ones.
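
For readers who think in code, the triage round above can be sketched roughly as follows. This is a toy illustration, not the authors' implementation: the real system asks a large language model to make each judgment, while this sketch substitutes a simple phenotype-overlap check. The names `Variant`, `triage_label`, and `triage`, the rule for when to abstain, and the choice to fill the shortlist with "relevant" variants before "abstain" ones are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    name: str                   # hypothetical identifier for the CNV
    gene_phenotypes: frozenset  # symptoms linked to the genes this CNV overlaps


def triage_label(variant, patient_phenotypes):
    """Toy stand-in for the LLM judgment: label a variant by symptom overlap."""
    if not variant.gene_phenotypes:
        return "abstain"        # nothing known about the genes: stay undecided
    if variant.gene_phenotypes & patient_phenotypes:
        return "relevant"       # at least one shared symptom
    return "irrelevant"


def triage(variants, patient_phenotypes, shortlist_size=32):
    """Sort variants into piles and keep at most `shortlist_size` of them."""
    relevant, abstain = [], []
    for variant in variants:
        label = triage_label(variant, patient_phenotypes)
        if label == "relevant":
            relevant.append(variant)
        elif label == "abstain":
            abstain.append(variant)
        # "irrelevant" variants are dropped
    # Assumption: relevant variants fill the shortlist first, then abstentions.
    return (relevant + abstain)[:shortlist_size]


patient = frozenset({"seizures", "short stature"})
variants = [
    Variant("dup_A", frozenset({"seizures"})),
    Variant("del_B", frozenset({"hearing loss"})),
    Variant("del_C", frozenset()),
]
shortlist = triage(variants, patient)
```

In this toy run, `dup_A` is kept as relevant, `del_C` is kept on the back burner, and `del_B` is thrown out.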

Round 2: The "Tournament" (Pairwise Battles)

Now the detective has a shortlist of 32 suspects. Who is the real culprit?

  • The Action: CoNVict pits the suspects against each other in head-to-head battles. It asks the AI: "Between Suspect A and Suspect B, which one explains the patient's symptoms better?"
  • The Logic: The AI doesn't just look at the gene; it looks at the context. It considers:
    • Does the gene break in a way that hurts the body?
    • Does the gene control parts of the body that are sick?
    • Is the gene known to be fragile?
  • The Winner: The "winner" of the battle moves to the next round. This continues until the AI has ranked the top 4 suspects from most likely to least likely.
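
The head-to-head round above can be sketched in the same spirit. This is a minimal sketch under stated assumptions, not the paper's exact procedure: the summary does not say how the bracket is arranged, so the example simply runs a knockout to find a champion, removes it, and repeats until four suspects are ranked. The function names, the `better` comparison, and the made-up scores are hypothetical; in the real system a language model would answer the "A or B?" question.

```python
def knockout_winner(candidates, better):
    """Run single-elimination rounds until one candidate remains."""
    pool = list(candidates)
    while len(pool) > 1:
        next_round = []
        for i in range(0, len(pool) - 1, 2):
            a, b = pool[i], pool[i + 1]
            next_round.append(a if better(a, b) else b)
        if len(pool) % 2 == 1:
            next_round.append(pool[-1])  # the odd one out gets a bye
        pool = next_round
    return pool[0]


def rank_top(candidates, better, top_k=4):
    """Rank the top_k candidates by repeatedly crowning a champion."""
    remaining = list(candidates)
    ranking = []
    while remaining and len(ranking) < top_k:
        champion = knockout_winner(remaining, better)
        ranking.append(champion)
        remaining.remove(champion)
    return ranking


# Made-up scores standing in for "how well does this CNV explain the symptoms?"
scores = {"cnv_1": 0.2, "cnv_2": 0.9, "cnv_3": 0.5, "cnv_4": 0.7, "cnv_5": 0.1}


def better(a, b):
    """Toy comparison: True if suspect a explains the symptoms at least as well as b."""
    return scores[a] >= scores[b]


top_four = rank_top(list(scores), better)
# top_four == ["cnv_2", "cnv_4", "cnv_3", "cnv_1"]
```

Pairwise comparison is attractive here because a question like "which of these two fits better?" is often easier to answer consistently than "how good is this one on a scale of 1 to 10?".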

Why is this a Big Deal?

The paper tested CoNVict on hundreds of simulated patients and found it was much better than current tools at finding the "needle in the haystack."

  1. It Handles the "Noise": In Whole Genome Sequencing (looking at the entire library), there are thousands of background changes that aren't causing the disease. CoNVict is great at ignoring the noise and focusing on the signal.
  2. It Solves the "Mystery Cases": Many genetic variants are labeled "Variants of Uncertain Significance" (VUS), which basically means "we don't know if this is bad." CoNVict reasons that even a new, unknown variant is a likely culprit if it breaks a gene that matches the patient's symptoms.
  3. It Reads the Fine Print: It can even look at "non-coding" regions (the spaces between the books) which other tools often ignore, realizing that damaging the "margins" can sometimes be just as bad as damaging the "text."

The Bottom Line

CoNVict is like upgrading from a spell-checker to a senior editor. It doesn't just check if words are spelled right; it understands the story, the context, and the plot, helping doctors diagnose rare diseases faster and more accurately. It bridges the gap between raw data and human reasoning, acting as a super-powered assistant to the clinical geneticist.
