SLAB: A Sweep Line Algorithm in PBWT for Finding Haplotype Block Cores

This paper introduces SLAB, an efficient sweep line algorithm based on PBWT that identifies width-maximal haplotype block cores to reveal population genetic insights and selection signals complementary to traditional IBD rate analyses.

Naseri, A., Sanaullah, A., Zhang, S., Zhi, D.

Published 2026-03-18

This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

The Big Picture: Finding the "Heart" of Shared DNA

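Imagine you have a massive library containing the genetic "cookbooks" of roughly half a million people (the UK Biobank). Because each person carries two copies of every chromosome, one from each parent, that comes to nearly one million copies, called haplotypes. Each cookbook is a long list of instructions (DNA) written in a code of four letters (A, C, G, T).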

Scientists know that people often share chunks of these instructions because they share ancestors. These shared chunks are called haplotype blocks. Think of them like stamps on a letter. If you and your cousin both have the same rare stamp on your letter, it means you likely got that letter from the same great-grandparent.

The Problem:
When you look at a million people, you don't just find simple pairs of cousins. You find huge, messy groups of people sharing different parts of the same stamp.

  • Group A shares a stamp from position 1 to 10.
  • Group B shares a stamp from position 5 to 15.
  • Group C shares a stamp from position 8 to 20.

These groups overlap in confusing ways. Some overlaps are small (just two people), while others are massive (thousands of people). The researchers wanted to find the "Block Cores."

The Analogy:
Imagine a crowded room where everyone is holding a long strip of colored tape.

  • Some strips are red, some are blue.
  • The strips overlap on the floor.
  • In some spots, only two strips cross.
  • In other spots, a huge pile of 50 strips is all stacked on top of each other.

The "Block Core" is that specific spot on the floor where the maximum number of strips are stacked directly on top of one another. It's the "heart" of the overlap. Finding this heart tells us where the most significant shared ancestry or evolutionary event happened.
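To make the tape picture concrete, here is a minimal Python sketch that finds the deepest stack among a handful of strips. It is only an illustration of the idea, not the method from the paper: the function name `deepest_stack` is made up for this example, each strip is reduced to a simple (start, end) pair, and the three strips reuse the position ranges of Groups A, B, and C from earlier.

```python
from itertools import accumulate


def deepest_stack(strips, length):
    """Return (depth, first, last) for the first deepest stack of strips.

    strips: list of (start, end) pairs, both ends inclusive,
            with 0 <= start <= end < length.
    """
    # Difference array: +1 where a strip begins, -1 just past where it ends.
    diff = [0] * (length + 1)
    for start, end in strips:
        diff[start] += 1
        diff[end + 1] -= 1

    # A running sum turns the difference array into the stack depth per position.
    depth = list(accumulate(diff[:length]))

    best = max(depth)
    first = depth.index(best)
    last = first
    while last + 1 < length and depth[last + 1] == best:
        last += 1
    return best, first, last


# Hypothetical strips covering positions 1-10, 5-15, and 8-20.
strips = [(1, 10), (5, 15), (8, 20)]
print(deepest_stack(strips, 21))  # (3, 8, 10)
```

All three strips overlap only on positions 8 through 10, so that stretch is the "heart" in this toy example.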


The Challenge: The Needle in a Haystack (But the Haystack is Moving)

Finding these "stacks" is incredibly hard. With about 1 million haplotypes (individual chromosome copies), the number of possible groupings is astronomical. It's like trying to find the exact moment in a 100,000-frame movie when the most people in the audience are holding up their hands at the same time, while the "hands" (DNA segments) go up and come down at unpredictable times.

A brute-force approach is too slow. Comparing every haplotype against every other one means the work grows with the square of the sample size, and blocks shared by larger groups multiply the possibilities further.
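To get a feel for the scale, the short Python sketch below counts how many distinct pairs exist for a few sample sizes. The helper name `pair_count` is just for this illustration, and pairs are only the simplest case, since blocks can be shared by groups of any size.

```python
from math import comb


def pair_count(n):
    """Number of distinct unordered pairs among n haplotypes: n * (n - 1) / 2."""
    return comb(n, 2)


for n in (1_000, 100_000, 1_000_000):
    print(f"{n:>9,} haplotypes -> {pair_count(n):,} pairs")

# The last line printed reports 499,999,500,000 pairs for one million haplotypes.
```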

The Solution: The "Sweep Line" Algorithm (SLAB)

The authors created a new tool called SLAB (Sweep Line Algorithm in PBWT). Here is how it works, using a metaphor:

The Metaphor: The Moving Spotlight
Imagine a giant vertical laser beam (the "Sweep Line") moving slowly from the left side of a stage to the right side.

  • On the stage, there are thousands of people (haplotypes) holding up signs (DNA blocks) that say "I am here from mile 10 to mile 20."
  • As the laser beam moves, it lights up the people currently standing in its path.
  • The Trick: Instead of looking at everyone, the algorithm uses a special sorting trick (called PBWT, short for positional Burrows-Wheeler transform) that organizes the people on stage so that those holding similar signs are standing right next to each other in a line.
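For readers who want to see the sorting trick in action, here is a small Python sketch of the standard PBWT update on a toy panel of five haplotypes. It follows the textbook prefix-and-divergence construction, not the SLAB code itself, and the names `pbwt` and `matching_groups` as well as the toy data are assumptions made for this example.

```python
def pbwt(haplotypes):
    """Yield (k, order, div) for k = 0 .. number of sites.

    order: haplotype indices sorted by their first k sites read backwards.
    div[i]: site where the match between order[i] and order[i - 1]
            that ends just before site k begins (k or more means no match).
    """
    m, n_sites = len(haplotypes), len(haplotypes[0])
    order, div = list(range(m)), [0] * m
    yield 0, order, div
    for k in range(n_sites):
        zeros, ones, div0, div1 = [], [], [], []
        p = q = k + 1
        for idx, start in zip(order, div):
            p, q = max(p, start), max(q, start)
            if haplotypes[idx][k] == 0:
                zeros.append(idx)
                div0.append(p)
                p = 0
            else:
                ones.append(idx)
                div1.append(q)
                q = 0
        # Stable split: everyone with a 0 at site k goes ahead of everyone with a 1.
        order, div = zeros + ones, div0 + div1
        yield k + 1, order, div


def matching_groups(order, div, k, min_len):
    """Runs of neighbours in `order` that agree on the last min_len sites before k."""
    groups, current = [], [order[0]]
    for idx, start in zip(order[1:], div[1:]):
        if start <= k - min_len:
            current.append(idx)
        else:
            if len(current) > 1:
                groups.append(current)
            current = [idx]
    if len(current) > 1:
        groups.append(current)
    return groups


# Toy panel: 5 haplotypes, 4 sites (0 and 1 stand for the two alleles).
haps = [
    [0, 1, 0, 1],
    [1, 1, 0, 1],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 0, 1, 0],
]
k, order, div = list(pbwt(haps))[-1]
print(order)                                 # [4, 3, 0, 2, 1]
print(matching_groups(order, div, k, 3))     # [[4, 3], [0, 2, 1]]
```

After the sort, haplotypes 0, 2, and 1 sit side by side because they agree on the last three sites, and so do haplotypes 4 and 3. Groups like these are what a scan can pick out without comparing every pair.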

How SLAB finds the "Core":

  1. The Start: When the laser hits the start of a sign, it adds that person to its "active list."
  2. The Scan: As the laser moves, it checks the "active list." Because the people are sorted by their DNA patterns, the algorithm can instantly see: "Hey, look at this group of 500 people standing right next to each other in the line. They all have a sign that covers this exact spot."
  3. The End: When the laser hits the end of a sign, it removes that person from the list.
  4. The Discovery: By doing this sweep, the computer instantly spots the "stacks" (the cores) where the most signs overlap, without having to check every single person against every other person.
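The four steps above can be sketched as a generic sweep over start and end events. This Python example is a simplified illustration under stated assumptions, not the authors' implementation: the function `sweep_core`, the people, and their mile markers are all made up, and each sign is treated as a half-open range that covers its start but not its end. The real algorithm works inside the PBWT ordering rather than on a plain list of signs.

```python
def sweep_core(signs):
    """Return (members, start, end) for the first largest stack of signs.

    signs: dict mapping a name to (start, end), covering start <= x < end.
    """
    events = []
    for name, (start, end) in signs.items():
        events.append((start, 1, name))  # 1 = the sign begins here
        events.append((end, 0, name))    # 0 = the sign ends here (sorts first on ties)
    events.sort()

    active = set()
    best = (frozenset(), None, None)
    for i, (pos, kind, name) in enumerate(events):
        if kind == 1:
            active.add(name)       # Step 1: the laser hits the start of a sign
        else:
            active.discard(name)   # Step 3: the laser hits the end of a sign
        # Step 2: the active list stays unchanged until the next event position.
        next_pos = events[i + 1][0] if i + 1 < len(events) else pos
        if next_pos > pos and len(active) > len(best[0]):
            # Step 4: remember the largest stack seen so far.
            best = (frozenset(active), pos, next_pos)
    return best


signs = {"ann": (10, 20), "bob": (12, 18), "cy": (15, 25), "dee": (30, 40)}
members, start, end = sweep_core(signs)
print(sorted(members), start, end)  # ['ann', 'bob', 'cy'] 15 18
```

Each sign is touched exactly twice, once when it starts and once when it ends, which is why a sweep avoids comparing every person against every other person.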

It's like using a metal detector that is so smart it only beeps when it finds a pile of gold coins, ignoring the single loose coin here and there.

What Did They Find? (The Results)

When the researchers ran this algorithm on the UK Biobank data, they found some fascinating "hearts" of shared DNA:

  1. The Immune System (Chromosome 6): They found the biggest "stack" of shared DNA in the part of the genome that controls our immune system (the MHC region). This makes sense because our immune systems need to be diverse, but certain ancient defenses are shared by huge groups of people.
  2. The Neanderthal Connection (Chromosome 3): They found a massive overlap at a gene called SLC6A20, in a region that has been linked to severe COVID-19. The "stack" here is made up of DNA that humans inherited from Neanderthals tens of thousands of years ago. The algorithm showed that while many people have this Neanderthal DNA, the specific "core" group is distinct.
  3. Lactose Tolerance (Chromosome 2): They found a spot where many people share DNA related to digesting milk (lactose). This is a classic sign of natural selection: humans who could drink milk survived better, so their DNA spread like wildfire, creating a huge "stack" of shared blocks.

Why Does This Matter?

1. It's Faster:
Before this, analyzing this much data would take a supercomputer weeks. SLAB does it in hours. It's like switching from counting grains of sand one by one to using a sieve.

2. It Finds Hidden Patterns:
Old methods looked at how many people share any DNA at a given spot (the identity-by-descent, or IBD, rate). But SLAB looks at the structure of the sharing.

  • Analogy: Imagine a party. Old methods count how many people are wearing red hats. SLAB looks for the specific group of people wearing red hats and blue shoes and holding a specific drink. It finds the "cliques" that matter most.

3. It Helps Find Disease Genes:
By pinpointing exactly where these "stacks" of shared DNA are, scientists may be able to narrow down the search for genes that contribute to diseases. If a "stack" of shared DNA turns up mostly in people with a specific disease, that spot on the DNA becomes a prime suspect worth investigating.

Summary

The paper introduces SLAB, a fast way to find the "center of gravity" in shared human DNA. By using a moving laser beam (Sweep Line) and a clever sorting trick (PBWT), it can sift through about a million haplotypes to find the most widely shared chunks. This helps us understand our evolutionary history (like Neanderthal DNA) and may help in the search for the genetic roots of modern diseases.
