LAML-Pro: Maximum Likelihood Inference of Cell Genotypes and Cell Lineage Trees

LAML-Pro is a novel maximum likelihood algorithm that jointly infers cell genotypes and lineage trees from noisy single-cell data, significantly reducing genotype errors and improving tree accuracy compared to existing two-step methods.

Chu, G., Schmidt, H., Raphael, B.

Published 2026-03-31

This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

Imagine you are trying to reconstruct the family history of a massive, sprawling family reunion that took place inside a petri dish. You have thousands of relatives (cells), and you want to draw a family tree showing who is related to whom and how they are connected.

Until now, scientists have done this in two separate steps, which is like trying to solve a mystery with a blurry photograph:

  1. Step 1 (The Blurry Photo): They looked at the cells and tried to guess their "genetic ID cards" (genotypes). Because the technology (like taking pictures of glowing cells) isn't perfect, they often made mistakes. Some IDs were blurry, some were missing, and some were just wrong.
  2. Step 2 (The Wrong Tree): They took these flawed ID cards and tried to build a family tree. But if you feed a computer bad data, it builds a bad family tree. It might think two cousins are actually twins, or that a stranger is part of the family.
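For readers who want to see the two-step pipeline concretely, here is a toy Python sketch. It assumes each site in a cell's "ID card" is either unedited (0) or edited (1), and that the imaging step reports a probability of "edited" for each site. The cells, numbers, 0.5 threshold, and function names are all invented for illustration and do not come from the paper. Watch cell B's second site: a near coin-flip (0.45) is forced to a confident 0 before the tree step ever sees it.

```python
from itertools import combinations

# Hypothetical imaging output: for each cell, the probability that each site is edited.
obs = {
    "A": [0.95, 0.90, 0.10],
    "B": [0.92, 0.45, 0.05],  # site 2 is ambiguous for cell B
    "C": [0.08, 0.10, 0.85],
}

def call_genotypes(probs, threshold=0.5):
    """Step 1: commit to a hard 0/1 call per site, one cell at a time."""
    return {cell: [int(p >= threshold) for p in ps] for cell, ps in probs.items()}

def hamming(a, b):
    """Number of sites where two called genotypes disagree."""
    return sum(x != y for x, y in zip(a, b))

def distance_matrix(calls):
    """Step 2 input: pairwise distances computed only from the hard calls."""
    return {(u, v): hamming(calls[u], calls[v]) for u, v in combinations(sorted(calls), 2)}

calls = call_genotypes(obs)
print(calls)                   # {'A': [1, 1, 0], 'B': [1, 0, 0], 'C': [0, 0, 1]}
print(distance_matrix(calls))  # {('A', 'B'): 1, ('A', 'C'): 3, ('B', 'C'): 2}
```

Whatever tree-building method runs next only sees these hard calls and distances, so the uncertainty in cell B's reading is already gone.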

Enter LAML-Pro: The "Super-Detective" Algorithm

The paper introduces a new tool called LAML-Pro. Instead of doing the two steps separately, LAML-Pro does them at the same time. It acts like a super-detective who doesn't just look at the blurry photo and guess the ID; it looks at the photo and the family tree together, adjusting both until it finds the combination of family tree and IDs that best explains the evidence. (In statistical terms, this is the maximum likelihood solution.)
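One standard way to let a tree and uncertain genotypes inform each other is Felsenstein's pruning algorithm, fed with soft per-cell probabilities at the leaves instead of hard calls. The Python sketch below scores two candidate trees this way. It assumes an irreversible two-state model (a site can go from unedited to edited but never back, with a made-up per-branch probability `P_EDIT`) and an unedited root. The data and names are hypothetical, and this illustrates the general idea only; it is not LAML-Pro's actual model or search procedure.

```python
import math

# Hypothetical imaging output: per-site probability that each cell's site is edited.
OBS = {
    "A": [0.95, 0.90, 0.10],
    "B": [0.92, 0.45, 0.05],
    "C": [0.08, 0.10, 0.85],
}
P_EDIT = 0.2  # assumed per-branch probability that an unedited site becomes edited

def trans(parent, child, p=P_EDIT):
    """Irreversible two-state model: 0 -> 1 with prob p; 1 never reverts to 0."""
    if parent == 0:
        return 1 - p if child == 0 else p
    return 1.0 if child == 1 else 0.0

def partial(node, site):
    """Pruning step: [P(data below | node=0), P(data below | node=1)]."""
    if isinstance(node, str):
        q = OBS[node][site]
        return [1 - q, q]  # soft evidence at the leaf instead of a hard call
    result = [1.0, 1.0]
    for child in node:
        lc = partial(child, site)
        for s in (0, 1):
            # Sum over the child's hidden state, weighted by the branch transition.
            result[s] *= sum(trans(s, t) * lc[t] for t in (0, 1))
    return result

def log_likelihood(tree):
    """Sum log-likelihoods over sites; the root is assumed unedited (state 0)."""
    n_sites = len(next(iter(OBS.values())))
    return sum(math.log(partial(tree, i)[0]) for i in range(n_sites))

t1 = (("A", "B"), "C")  # A and B are sisters
t2 = (("A", "C"), "B")  # A and C are sisters
print(log_likelihood(t1) > log_likelihood(t2))  # True
```

A full method would also search over many tree shapes and other model parameters; here two fixed trees are simply compared by their log-likelihood, with every hidden genotype summed over rather than guessed up front.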

Here is how it works, using some everyday analogies:

1. The "Broken Puzzle" Analogy

Imagine you have a giant jigsaw puzzle (the family tree), but many pieces are missing, and the ones you have are smudged or upside down.

  • Old Method: You try to clean the smudges first (guess the genotypes). If you clean them wrong, the puzzle pieces won't fit together later, and you end up with a broken picture.
  • LAML-Pro: It holds the puzzle pieces and the picture of the final image in its mind simultaneously. If a piece looks like it should be a blue sky but the smudge makes it look like a green tree, LAML-Pro says, "Wait, if I put this piece here, the whole picture makes more sense. Let's assume the smudge was just a trick of the light." It fixes the smudge while building the picture.

2. The "Noisy Classroom" Analogy

Imagine a teacher trying to figure out which students are sitting next to each other based on a noisy recording of their voices.

  • The Problem: The recording is full of static (noise). Sometimes a student's voice is too quiet to hear (missing data), and sometimes the static makes a "hello" sound like "hullo."
  • The Old Way: The teacher writes down what they think they heard, then tries to arrange the students. If they misheard a word, they put the wrong students together.
  • LAML-Pro: The teacher listens to the whole room at once. They realize, "If Student A is sitting next to Student B, they would likely say similar things. Even though the recording of Student A is fuzzy, the fact that Student B is clear helps me guess what Student A actually said." By using the context of the whole group, it cleans up the noise.
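The "use the neighbors" idea can be made concrete by asking how likely one cell's site is to be edited once its relatives are taken into account. The Python sketch below does this on a fixed three-cell toy tree by brute-force summing over every hidden state. It assumes a simple irreversible two-state edit model with a made-up per-branch edit probability `P_EDIT`; the readings, the tree, and all names are invented for illustration and are not the paper's model.

```python
from itertools import product

P_EDIT = 0.2  # assumed per-branch probability of an unedited site becoming edited

# Hypothetical readings for one site: probability each cell's site looks edited.
# Cell B is ambiguous; its sister A is clearly edited; the outgroup C is not.
READING = {"A": 0.95, "B": 0.45, "C": 0.10}

def trans(parent, child, p=P_EDIT):
    """Irreversible model: 0 -> 1 with prob p; an edited site never reverts."""
    if parent == 0:
        return 1 - p if child == 0 else p
    return 1.0 if child == 1 else 0.0

def evidence(cell, state):
    """Soft evidence from the image for the cell's true state."""
    q = READING[cell]
    return q if state == 1 else 1 - q

def posterior_edited(target):
    """P(target is edited | all readings) on the tree root -> (X -> (A, B), C).

    Brute-force sum over every hidden assignment; fine for a toy tree.
    """
    weight = {0: 0.0, 1: 0.0}
    for x, a, b, c in product((0, 1), repeat=4):
        states = {"A": a, "B": b, "C": c}
        # Root is unedited; X is the common ancestor of A and B.
        w = trans(0, x) * trans(x, a) * trans(x, b) * trans(0, c)
        for cell, s in states.items():
            w *= evidence(cell, s)
        weight[states[target]] += w
    return weight[1] / (weight[0] + weight[1])

print(READING["B"] >= 0.5)               # False: judged alone, B looks unedited
print(round(posterior_edited("B"), 2))   # 0.56: the edited sister tips the balance
```

Cell B's own reading would be called "unedited" in isolation, but because its sister A is clearly edited, the tree-aware estimate moves past 0.5, which is the classroom teacher using Student B's clear voice to decode Student A.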

3. The "Magic Eraser" for Mistakes

One of the biggest problems with imaging cells (taking pictures of them) is that the data is often "uncertain." It's like looking at a fingerprint in the rain; you see a shape, but you aren't 100% sure.

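  • Old Methods: They would either throw away the fuzzy fingerprints or commit to a single best guess for each one in isolation, without checking it against the rest of the family. This led to a lot of errors (up to 50% in some cases!).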
  • LAML-Pro: It uses a special mathematical model (called PMMO) that understands why the data is fuzzy. It knows that sometimes a cell just "forgot" to show its ID (a dropout) or that the camera was too dim. Instead of giving up, it uses the surrounding clues to fill in the blanks.
    • The Result: In the paper's tests, genotype error rates dropped from 25-50% down to just 0.03%, turning readouts that were often wrong into nearly error-free ones.
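To show what "understanding why the data is fuzzy" can mean in practice, here is a toy observation model with two failure modes: a dropout, where a site gives no signal at all, and a misread, where a visible site is read as the wrong state. It is a hypothetical Python sketch, not the paper's PMMO model, whose exact form is not described in this summary; `DROPOUT`, `MISREAD`, and `obs_likelihood` are made-up names and values.

```python
DROPOUT = 0.1   # assumed probability a site yields no usable signal
MISREAD = 0.05  # assumed probability a visible site is read as the wrong state

def obs_likelihood(reading, true_state, d=DROPOUT, e=MISREAD):
    """P(reading | true_state), where reading is 0, 1, or None (missing)."""
    if reading is None:
        return d  # a dropout is equally likely whichever state is true
    correct = reading == true_state
    return (1 - d) * ((1 - e) if correct else e)

for reading in (0, 1, None):
    row = [obs_likelihood(reading, s) for s in (0, 1)]
    print(reading, [round(v, 3) for v in row])
# 0 [0.855, 0.045]
# 1 [0.045, 0.855]
# None [0.1, 0.1]
```

Because a missing reading has the same likelihood under both states, it does not push the answer either way; the decision is left to the surrounding clues, such as the cell's relatives in the tree, instead of silently counting the gap as "unedited."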

Why Does This Matter?

In biology, knowing the family tree of cells is crucial for understanding how diseases like cancer grow or how a baby develops from a single cell.

  • Before: Scientists were building family trees on shaky ground, leading to wrong conclusions about how cells divide and move.
  • Now: With LAML-Pro, they can build a solid, accurate tree even when the data is messy. It's like upgrading from a sketch drawn in pencil to a high-definition 3D map.

In a Nutshell:
LAML-Pro is a smart computer program that stops trying to "clean the data" before "building the tree." Instead, it cleans the data while building the tree, using the logic of the whole family to fix the mistakes of the individual members. This allows scientists to see the true history of cell life, even when the evidence is fuzzy, missing, or confusing.
