MosaicTR: tandem repeat somatic instability quantification from long-read sequencing

MosaicTR is a computational tool that uses haplotype-tagged long-read sequencing to quantify per-locus somatic instability of tandem repeats. It aims to overcome the limitations of short-read methods so that this instability can serve as a biomarker for mismatch repair deficiency and as a way to monitor disease progression in repeat expansion disorders.

Kim, J.

Published 2026-03-18

This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

Imagine your DNA is a massive library of instruction manuals. Most of these manuals are written perfectly, but some sections contain a strange quirk: a specific phrase or sentence is repeated over and over again, like a stutter.

In a healthy person, these repetitions are stable. But in certain diseases (like Huntington's) or in cancer, these repetitions start to glitch. They might get longer or shorter randomly as cells divide. This is called somatic instability. It's like a photocopier that keeps adding or deleting a few words every time it copies a page, eventually making the instructions unreadable.

The problem is that scientists have struggled to measure exactly how often these glitches happen, especially on specific pages (loci) of the library.

Enter MosaicTR, a new digital tool described in this paper. Think of it as a super-smart, noise-canceling microscope for reading these glitchy DNA sections.

Here is how it works, broken down into simple concepts:

1. The "Long-Read" Advantage: Reading the Whole Sentence

Older tools used "short-read" sequencing. Imagine trying to understand a long, repetitive sentence by reading only 3 words at a time. If the sentence is "The cat sat on the mat on the mat on the mat," and you only see "on the mat," you have no idea how many times it repeats. You might guess, or you might be misled by copying errors introduced when the DNA is amplified before sequencing (PCR stutter).

MosaicTR uses long-read sequencing (like PacBio and Oxford Nanopore). This is like reading the entire sentence in one go. You can see exactly how many times the phrase repeats, even if it's very long.
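
To make the "whole sentence" idea concrete, here is a minimal Python sketch. It is not MosaicTR's code: the function name `longest_motif_run` and the toy sequences are made up for illustration. It simply counts how many back-to-back copies of a motif appear in a read, and shows why a short fragment of the same region cannot give the full count.

```python
def longest_motif_run(read: str, motif: str) -> int:
    """Return the longest run of back-to-back copies of `motif` in `read`.

    Hypothetical helper for illustration only; real tools work on aligned
    reads and tolerate imperfect repeats.
    """
    if not motif:
        raise ValueError("motif must not be empty")
    m = len(motif)
    best = 0
    # Try every start position and count consecutive motif copies from there.
    for start in range(len(read) - m + 1):
        run = 0
        pos = start
        while read[pos:pos + m] == motif:
            run += 1
            pos += m
        best = max(best, run)
    return best


# A "long read" that spans the whole repeat plus flanking sequence.
long_read = "TTG" + "CAG" * 5 + "CCA"
# A "short read" that only covers part of the same repeat.
short_read = "CAGCAG"

print(longest_motif_run(long_read, "CAG"))   # 5
print(longest_motif_run(short_read, "CAG"))  # 2
```

The short fragment reports 2 copies even though the region really has 5, which is the counting problem long reads avoid.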

2. The "Haplotype" Trick: Sorting the Twins

Humans have two copies of most genes (one from mom, one from dad). Sometimes one copy is stable while the other is glitching.

  • The Problem: If you just mix all the DNA together, it's like having two different versions of a book in a pile. You can't tell which page belongs to which book.
  • The Solution: MosaicTR uses "haplotype tags" (HP tags). Think of these as color-coded sticky notes. It separates the "Mom copy" from the "Dad copy" so it can measure the glitching on each one individually. This is crucial because often, only one side is breaking down.
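
The sorting step can be sketched in a few lines of Python. This is an illustration under assumptions, not the tool's implementation: each read is represented here as a made-up `(hp, repeat_count)` pair, where `hp` is the haplotype tag (1 or 2, or `None` if the read could not be assigned). Dropping untagged reads is a choice made for this sketch; the summary does not say how MosaicTR treats them.

```python
from collections import defaultdict
from statistics import median


def split_by_haplotype(reads):
    """Group repeat counts by haplotype tag.

    `reads` is an iterable of (hp, repeat_count) pairs. Reads with hp=None
    are dropped here (an assumption of this sketch).
    """
    groups = defaultdict(list)
    for hp, repeat_count in reads:
        if hp is not None:
            groups[hp].append(repeat_count)
    return dict(groups)


# Toy data: copy 1 is short and tight, copy 2 is long and spread out.
reads = [(1, 17), (1, 17), (2, 42), (1, 18), (2, 45), (None, 30), (2, 43)]
by_hap = split_by_haplotype(reads)

for hp in sorted(by_hap):
    print(hp, by_hap[hp], "median:", median(by_hap[hp]))
```

Mixed together, these reads would look like one messy pile ranging from 17 to 45 repeats. Split by tag, it becomes clear that one copy sits near 17 and the other near 43.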

3. The "Noise Filter": Ignoring the Static

This is the paper's biggest innovation. Long-read machines are great, but they aren't perfect. They sometimes make tiny, random mistakes (like adding an extra letter or missing one) that look like a glitch but aren't.

  • The Analogy: Imagine trying to hear a whisper in a windy room. The wind (sequencing noise) makes it hard to know if the person is actually whispering a secret or if it's just the wind.
  • The Fix: MosaicTR knows the "rhythm" of the DNA. The DNA repeats come in specific blocks (motifs), like a 3-letter word.
    • If the machine makes a mistake that breaks the 3-letter rhythm (e.g., adding 1 letter), MosaicTR knows, "Ah, that's just wind noise," and ignores it.
    • If the change is a whole block (e.g., adding a full 3-letter word), MosaicTR knows, "That's a real glitch!" and counts it.
    This "rhythm filter" removes many of the false alarms that sequencing errors would otherwise cause.
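
The rhythm idea can be written as a small Python sketch. The function name `rhythm_filter` and the input format are assumptions for illustration: each number is how many letters (base pairs) longer or shorter a read is than the main allele on its copy. Changes that are not a whole number of motifs are skipped as noise; whether MosaicTR discards such reads or handles them another way is not stated in this summary.

```python
def rhythm_filter(deltas_bp, motif_len):
    """Keep only length changes that are whole multiples of the motif length.

    Returns the kept changes converted to motif units (e.g. +3 bp with a
    3-letter motif becomes +1). Off-rhythm changes are treated as noise.
    """
    if motif_len <= 0:
        raise ValueError("motif_len must be positive")
    units = []
    for delta in deltas_bp:
        if delta % motif_len == 0:
            units.append(delta // motif_len)
    return units


# Per-read length differences in letters, for a 3-letter motif.
deltas = [0, 1, 3, -1, -6, 2, 0]
print(rhythm_filter(deltas, 3))  # [0, 1, -2, 0]
```

Here the +1, -1 and +2 changes break the 3-letter rhythm and are dropped, while +3 and -6 are counted as gaining one block and losing two.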

4. The "Instability Score" (HII)

The tool gives you a score called the Haplotype Instability Index (HII).

  • Score near 0: The DNA is stable. It's a calm library.
  • High Score: The DNA is chaotic. The repetitions are growing or shrinking wildly. This can signal that a repeat expansion disease is progressing or that a tumor has a specific type of DNA repair failure (mismatch repair deficiency).
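
This summary does not give the formula behind the HII, so the Python sketch below uses a made-up stand-in to show the general idea of a per-copy score: the average distance, in repeat units, between each read and the most common repeat count on that copy. The name `toy_instability_score` and the formula are hypothetical and should not be read as the paper's definition.

```python
from statistics import mode


def toy_instability_score(repeat_counts):
    """Mean absolute deviation from the most common repeat count.

    Hypothetical stand-in for illustration; NOT the HII formula from the paper.
    """
    if not repeat_counts:
        raise ValueError("need at least one read")
    main_allele = mode(repeat_counts)
    total = sum(abs(count - main_allele) for count in repeat_counts)
    return total / len(repeat_counts)


stable_copy = [17, 17, 17, 17]
unstable_copy = [42, 42, 42, 45, 48, 42]

print(toy_instability_score(stable_copy))    # 0.0
print(toy_instability_score(unstable_copy))  # 1.5
```

A copy where every read agrees scores 0, and the score rises as more reads drift away from the main allele, which matches the "near 0 is calm, high is chaotic" reading described above.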

5. Time Travel and Map Making

The tool isn't just for a single snapshot. It can:

  • Compare Time: Look at DNA from a patient at age 20 and again at age 40 to see how the glitches grew over time (longitudinal studies).
  • Compare Places: Look at DNA from the brain vs. the liver to see if the glitches are happening everywhere or just in one specific organ (tissue-specific).

Why Does This Matter?

  • For Disease: In diseases like Huntington's, more glitching in the DNA is linked to worse symptoms. MosaicTR could help track how fast a disease is progressing.
  • For Cancer: Some cancers have broken DNA repair systems (mismatch repair deficiency). MosaicTR could help flag these broken systems, acting as a "check engine light" for cancer cells.
  • For Research: It allows scientists to see the "mosaic" nature of our bodies: how different cells in the same person can have slightly different DNA instructions.

In short: MosaicTR is a new, high-precision ruler that can measure the tiny, chaotic changes in our DNA's repetitive sections, filtering out the background noise to give doctors and scientists a clear picture of disease progression and cancer risks.
