Sassy2: Batch Searching of Short DNA Patterns

Sassy2 is a Rust-based tool that accelerates batch searching of short DNA patterns by distributing multiple patterns across SIMD lanes and employing a suffix-filtering strategy, achieving significant speedups over previous methods like Sassy1 and Edlib for tasks such as CRISPR guide RNA screening and barcode demultiplexing.

Original authors: Beeloo, R., Groot Koerkamp, R.

Published 2026-03-12

This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

Imagine you are a librarian in a massive, chaotic library (the human genome, or the output of a DNA sequencing machine). Your job is to find specific, short book titles (like barcodes, primers, or CRISPR guides) hidden inside millions of pages of text.

The problem is that the pages are messy. Some letters are smudged, some are missing, and some are extra. You aren't looking for a perfect match; you are looking for "close enough" matches (allowing for a few typos).
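
To make "close enough" concrete, here is a minimal Python sketch that counts the fewest typos (substitutions, missing letters, extra letters) needed to find a pattern anywhere in a text. It is the classic textbook dynamic-programming approach, not Sassy2's actual Rust code, and the function name best_match is made up for this illustration.

```python
def best_match(pattern: str, text: str) -> tuple[int, int]:
    """Return (fewest edits, end position) over all substrings of text."""
    m = len(pattern)
    # col[i] = fewest edits to match pattern[:i] against some stretch of
    # text that ends at the current position.
    col = list(range(m + 1))
    best = (col[m], 0)
    for j, ch in enumerate(text, start=1):
        prev_diag = col[0]  # col[i-1] as it was at the previous text position
        col[0] = 0          # a match may start anywhere in the text for free
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            new = min(prev_diag + cost,  # same letter, or one substitution
                      col[i] + 1,        # extra letter in the text
                      col[i - 1] + 1)    # letter missing from the text
            prev_diag = col[i]
            col[i] = new
        if col[m] < best[0]:
            best = (col[m], j)
    return best


print(best_match("ACGT", "TTACGTTT"))  # (0, 6): a perfect copy ends at letter 6
```

A smudged copy such as "AGGT" would come back with 1 edit instead of 0, which is exactly the kind of near-miss these tools are asked to report.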

This is the job of Sassy2. Here is how it works, explained through simple analogies.

The Old Way: The "One-by-One" Search

Before Sassy2, there was a tool called Sassy1. Imagine Sassy1 is a very fast librarian who can read a whole page in one second. However, if you give them 100 different book titles to find, they have to:

  1. Scan the whole library for Title A.
  2. Scan the whole library for Title B.
  3. Scan the whole library for Title C... and so on.

Even though they are fast, doing this 100 times takes a long time. Also, if the library is small (a short DNA read), Sassy1 loses much of its speed because it is built for huge libraries, not small ones.

The New Way: The "Super-Team" (Sassy2)

Sassy2 changes the game by using SIMD (Single Instruction, Multiple Data). Think of this not as one librarian, but as a super-team of 32 librarians standing in a row, all reading the same page at the exact same time.

Instead of searching for one title at a time, Sassy2 gives each librarian a different title to look for simultaneously.

  • Librarian 1 looks for Title A.
  • Librarian 2 looks for Title B.
  • Librarian 3 looks for Title C...

They all scan the text together. This is called pattern tiling. It's like having 32 pairs of eyes scanning the same wall for 32 different "Wanted" posters at once.
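
The idea of one librarian per title can be sketched in plain Python. Each "lane" below holds a different pattern, and every lane is updated in lockstep as the text is read once. This only imitates the layout: real SIMD hardware updates all lanes with a single instruction, and Sassy2's actual kernel is written in Rust and is not reproduced here. The name search_batch and the parameter k (allowed typos) are assumptions for this example.

```python
def search_batch(patterns: list[str], text: str, k: int) -> list[list[int]]:
    """For each pattern, list the text end positions matching with <= k edits."""
    m = len(patterns[0])
    assert all(len(p) == m for p in patterns), "lanes need equal-length patterns"
    lanes = len(patterns)
    # cols[i][lane] = fewest edits for the first i letters of that lane's pattern.
    cols = [[i] * lanes for i in range(m + 1)]
    hits = [[] for _ in patterns]
    for j, ch in enumerate(text, start=1):  # the text is read once, by all lanes
        prev_diag = cols[0]
        cols[0] = [0] * lanes
        for i in range(1, m + 1):
            old = cols[i]
            # The same update rule applied to every lane at this step.
            cols[i] = [
                min(prev_diag[lane] + (patterns[lane][i - 1] != ch),
                    old[lane] + 1,
                    cols[i - 1][lane] + 1)
                for lane in range(lanes)
            ]
            prev_diag = old
        for lane in range(lanes):
            if cols[m][lane] <= k:
                hits[lane].append(j)
    return hits


print(search_batch(["ACGT", "GGCC"], "TTACGTTTGGCCA", k=0))  # [[6], [12]]
```

Note the assertion at the top: the lockstep update only lines up if every lane's pattern has the same length.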

The Secret Sauce: The "Suffix Filter"

Here is the tricky part: checking whether a word matches, even with a few typos allowed, takes time. If you have 32 librarians checking 32 different words against a page, and they all have to check every single letter, it's still slow.

Sassy2 uses a clever trick called Suffix Filtering.

Imagine you are looking for the word "Elephant" on a page. You don't need to check the whole word to know it's not a match. You just need to check the end of the word.

  • If the word ends in "apple," you know immediately it's not "Elephant."
  • You only need to do the full, detailed check if the word ends in "phant."

Sassy2 does this with DNA:

  1. The Quick Scan: It first looks at just the last few letters (the "suffix") of all 32 patterns. It does this super fast because it's checking a tiny piece of the puzzle.
  2. The Filter: If a piece of text doesn't match the "suffix" (e.g., the last 16 letters don't look right), Sassy2 instantly says, "Nope, not a match!" and moves on.
  3. The Deep Dive: Only if the suffix does look promising does the team stop and do the full, detailed check on the whole word.

This saves a massive amount of time because most random text in the library won't match the suffix. Sassy2 filters out the noise before doing the heavy lifting.
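
The three steps above can be sketched as a filter followed by a verification. This is a simplified single-pattern illustration under stated assumptions, not the paper's implementation: the helper names end_positions and filtered_search, and the parameters k (allowed typos) and suffix_len, are invented here. The filter is safe because any match of the whole pattern with at most k typos also contains a match of its suffix with at most k typos, ending at the same place.

```python
def end_positions(pattern: str, text: str, k: int) -> list[int]:
    """End positions in text where pattern matches with at most k edits."""
    m = len(pattern)
    col = list(range(m + 1))
    ends = []
    for j, ch in enumerate(text, start=1):
        prev_diag, col[0] = col[0], 0
        for i in range(1, m + 1):
            new = min(prev_diag + (pattern[i - 1] != ch), col[i] + 1, col[i - 1] + 1)
            prev_diag, col[i] = col[i], new
        if col[m] <= k:
            ends.append(j)
    return ends


def filtered_search(pattern: str, text: str, k: int, suffix_len: int) -> list[int]:
    m = len(pattern)
    # Quick scan: look only for the last suffix_len letters of the pattern.
    candidates = end_positions(pattern[-suffix_len:], text, k)
    hits = []
    for j in candidates:
        # Deep dive: a match ending at j covers at most m + k letters of text,
        # so the full check only needs that small window.
        window = text[max(0, j - (m + k)):j]
        ends = end_positions(pattern, window, k)
        if ends and ends[-1] == len(window):
            hits.append(j)
    return hits


text = "TTTTACGTACCTACGGGG"
print(filtered_search("ACGTACGTAC", text, k=1, suffix_len=4))
```

The filtered search returns the same positions as running end_positions on the full pattern, but the expensive full check runs only where the short suffix already looked promising.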

Why is this a big deal?

The paper shows that Sassy2 is incredibly fast:

  • Speed: It is 20 to 45 times faster than the current standard tools (like Edlib) and 2 to 5 times faster than its predecessor (Sassy1).
  • Short Reads: It works amazingly well on short pieces of text (like short DNA reads from a machine), where older tools were slow and clumsy.
  • Real World:
    • CRISPR: It can scan the entire human genome for 312 different "cutting guides" in seconds, helping scientists edit genes safely.
    • Nanopore: It can sort through millions of DNA reads to find specific "barcodes" (like sorting mail) in the blink of an eye.

The Catch

There is one small rule: Sassy2 works best when all the "book titles" you are looking for are the same length. If you have a mix of 20-letter words and 50-letter words, you have to sort them into separate piles first. But for most modern DNA tasks, this is a small price to pay for the incredible speed.
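
Sorting the titles into piles is simple to do up front. Here is a hedged sketch: make_batches is a hypothetical helper, not part of Sassy2, and the default of 32 lanes simply mirrors the team size used in the analogy above.

```python
from collections import defaultdict


def make_batches(patterns: list[str], lanes: int = 32) -> list[list[str]]:
    """Group patterns by length, then cut each group into lane-sized batches."""
    by_length = defaultdict(list)
    for p in patterns:
        by_length[len(p)].append(p)
    batches = []
    for length in sorted(by_length):
        group = by_length[length]
        for start in range(0, len(group), lanes):
            batches.append(group[start:start + lanes])
    return batches


print(make_batches(["ACGT", "GG", "TTTT", "CC", "AAAA"], lanes=2))
# [['GG', 'CC'], ['ACGT', 'TTTT'], ['AAAA']]
```

Each batch then contains only same-length patterns, so it can be handed to a lockstep search in one go.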

Summary

Sassy2 is like upgrading from a single detective searching a city block to a swarm of 32 drones scanning the whole city at once, using a quick-look filter to ignore most of the buildings before they even land. It makes finding tiny, slightly messy DNA patterns in massive amounts of data dramatically faster.
