Structure from Noise: Confirmation Bias in Particle Picking in Structural Biology

This paper demonstrates that both template matching and deep neural network-based particle picking methods in cryo-EM and cryo-ET can generate persistent molecular structures from pure noise due to confirmation bias, thereby highlighting a critical vulnerability in current workflows and proposing mitigation strategies to ensure data integrity.

Balanov, A., Zabatani, A., Bendory, T.

Published 2026-04-12

This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

The Big Idea: "Seeing What You Want to See" in the Microscope

Imagine you are a detective trying to find a specific type of rare coin (a protein) hidden inside a massive, chaotic pile of sand and pebbles (the microscopic image). The coins are tiny, and the sand is very noisy.

To find the coins, you have a template: a perfect drawing of what the coin should look like. You scan the pile, looking for anything that matches your drawing.

The Problem: The paper argues that if the pile is actually just pure sand (noise) with no coins at all, your search method might still "find" coins. Worse, when you stack and average the "coins" you found, the result will resemble your drawing, even though each one is just a random arrangement of sand grains that happened to line up by chance.

This is Confirmation Bias: Your brain (or computer algorithm) is so eager to find the coin that it interprets random noise as the coin you were looking for.


The "Einstein from Noise" Analogy

The paper starts with a famous thought experiment called "Einstein from Noise."

Imagine you have a photo of Albert Einstein. Now take thousands of images of pure static (white noise), shift each one to the position where it best matches Einstein's face, and average the aligned images. The average will look like Einstein, even though none of the input images contained him.

  • Old Understanding: Scientists knew this could happen after they had already picked the particles. If you force-align noise to a template, you get a fake structure.
  • New Discovery (This Paper): The authors found that the bias happens much earlier, right at the very first step: Particle Picking.
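
The classic "Einstein from Noise" effect can be reproduced in a few lines. The following is a minimal sketch, not the paper's code: it assumes white Gaussian noise, circular shifts as the only alignment, and a small synthetic disc standing in for the photo. The function name `einstein_from_noise` and all parameters are hypothetical.

```python
import numpy as np

def einstein_from_noise(template, n_images=2000, seed=0):
    """Align pure-noise images to `template` by circular shift, then average."""
    rng = np.random.default_rng(seed)
    t_fft_conj = np.conj(np.fft.fft2(template))
    total = np.zeros(template.shape, dtype=float)
    for _ in range(n_images):
        noise = rng.standard_normal(template.shape)
        # Circular cross-correlation: cc[s] = sum_x template[x] * noise[x + s].
        cc = np.fft.ifft2(np.fft.fft2(noise) * t_fft_conj).real
        dy, dx = np.unravel_index(np.argmax(cc), cc.shape)
        # Shift the noise back by the best-matching offset before averaging.
        total += np.roll(noise, shift=(-dy, -dx), axis=(0, 1))
    return total / n_images

if __name__ == "__main__":
    # Assumed stand-in for the reference image: a disc on a 32x32 grid.
    yy, xx = np.mgrid[:32, :32]
    template = ((yy - 16) ** 2 + (xx - 16) ** 2 < 64).astype(float)
    avg = einstein_from_noise(template)
    corr = np.corrcoef(avg.ravel(), template.ravel())[0, 1]
    print(f"correlation between noise average and template: {corr:.2f}")
```

In this toy setting the average of aligned noise correlates strongly with the template, although no input image contains it.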

The "Gold Rush" Analogy

Think of the particle-picking stage as a gold rush.

  • The Prospectors (Algorithms): They are scanning the land (the microscope image) for gold (proteins).
  • The Map (The Template): They have a map showing where gold might be.
  • The Noise: The ground is actually just dirt. There is no gold.

How the Bias Happens:
The prospectors use a metal detector tuned to the shape of a gold nugget. They scan the dirt. Occasionally, a random clump of dirt happens to look a little bit like a nugget on the detector. Because the detector is set to be very sensitive (low threshold), the prospectors dig up that clump of dirt.

They do this thousands of times. They take all the "dirt nuggets" they found, wash them, and stack them up to see what they look like.

  • The Result: Because they only dug up dirt that looked slightly like gold, the stacked pile of dirt they are left with comes to resemble a gold nugget.
  • The Illusion: They conclude, "Look! We found gold!" But they actually just found a pile of dirt that was filtered to look like gold.
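
The gold-rush story corresponds to threshold-based template matching. Here is a minimal sketch under simplifying assumptions that are not taken from the paper: candidate locations are modeled as independent white-noise patches rather than a sliding window over a micrograph, and the score is a plain projection onto the mean-subtracted, unit-norm template. `pick_from_noise` and its parameters are hypothetical names.

```python
import numpy as np

def pick_from_noise(template, n_candidates=20000, threshold=2.0, seed=0):
    """Keep pure-noise patches whose template score exceeds `threshold`."""
    rng = np.random.default_rng(seed)
    t = template - template.mean()
    t_unit = t / np.linalg.norm(t)
    # Every candidate patch is pure noise: there is no particle anywhere.
    patches = rng.standard_normal((n_candidates,) + template.shape)
    # Projection onto the unit-norm template; distributed as N(0, 1) here.
    scores = np.tensordot(patches, t_unit, axes=([1, 2], [0, 1]))
    picked = patches[scores > threshold]
    return picked, scores

if __name__ == "__main__":
    # Assumed toy template: a disc on a 16x16 grid.
    yy, xx = np.mgrid[:16, :16]
    template = ((yy - 8) ** 2 + (xx - 8) ** 2 < 25).astype(float)
    picked, _ = pick_from_noise(template)
    avg = picked.mean(axis=0)
    corr = np.corrcoef(avg.ravel(), template.ravel())[0, 1]
    print(f"picked {len(picked)} noise patches, correlation with template: {corr:.2f}")
```

Averaging all candidate patches gives something close to zero. Averaging only the picked ones gives an image that tracks the template, which is the selection effect the analogy describes.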

The "Sieve" Metaphor

Imagine you have a sieve (a filter) with holes shaped like a star.

  1. You pour a bucket of random gravel (noise) through the sieve.
  2. Most gravel falls through.
  3. But some random pieces of gravel happen to be shaped just right to get stuck in the star-shaped holes.
  4. You collect the stuck gravel.
  5. If you look at the pile of stuck gravel, it doesn't look like random gravel anymore. It looks like a pile of stars.

The paper proves mathematically that the shape of the sieve (the template) dictates the shape of the final pile, even if the original bucket contained no stars at all.
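
A simplified version of this statement can be worked out by hand. The derivation below is an illustration under assumptions that are not taken from the paper: a single template $t$ with unit direction $u = t/\|t\|$, a white Gaussian noise patch $x$ with variance $\sigma^2$, no shifts or rotations, and a pick whenever the score $\langle x, u \rangle$ reaches a threshold $\tau$. The paper's own results may be stated differently and more generally.

```latex
% Noise patch and picking rule (assumed toy model)
x \sim \mathcal{N}(0, \sigma^2 I), \qquad
u = \frac{t}{\|t\|}, \qquad
\text{pick } x \iff \langle x, u \rangle \ge \tau .

% Split x into the part along u and the part orthogonal to it.
x = \langle x, u \rangle\, u + x_\perp, \qquad
x_\perp \text{ independent of } \langle x, u \rangle, \quad
\mathbb{E}[x_\perp] = 0 .

% The orthogonal part averages out; only the template direction survives.
\mathbb{E}\bigl[\, x \mid \langle x, u \rangle \ge \tau \,\bigr]
  = \mathbb{E}\bigl[\, \langle x, u \rangle \mid \langle x, u \rangle \ge \tau \,\bigr]\, u
  = \sigma\, \frac{\varphi(\tau/\sigma)}{1 - \Phi(\tau/\sigma)}\; \frac{t}{\|t\|} .
```

Here $\varphi$ and $\Phi$ are the standard normal density and cumulative distribution function. In this toy model the expected average of the picked noise patches is a positive multiple of the template itself, whatever the template is.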

Why This Matters for Science

In cryo-EM and cryo-ET (imaging techniques used to see tiny viruses and proteins), scientists often use these "templates" to find particles in very blurry, noisy images.

  • The Danger: If a scientist is looking for a specific virus shape, and they use a template of that virus to pick particles, the computer might "find" that virus even if the sample is empty or contains a different virus.
  • The Consequence: The final 3D model they build might look like the template they started with, not the actual virus in the sample. They might publish a picture of a virus that doesn't exist, simply because their computer was biased to find it.

The "Topaz" Twist (Deep Learning)

The paper also tested modern AI tools (like Topaz) that learn to find particles by looking at training data.

  • The Finding: Even AI is not immune. If you train an AI on pictures of ribosomes (the protein-building machines of the cell) and then ask it to look at pure noise, it will still "find" ribosomes in the noise.
  • The Lesson: The AI learns the shape of the training data so well that it hallucinates that shape even when it's not there.
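
The same effect can be illustrated with a learned picker. The sketch below does not use Topaz or any neural network: as a loudly simplified stand-in, it trains a plain logistic-regression classifier (written in NumPy) to separate "template plus noise" patches from pure-noise patches, then keeps the top-scoring patches from a fresh batch of pure noise. All names, the top-fraction selection rule, and the parameters are assumptions made for illustration.

```python
import numpy as np

def train_linear_picker(template, n_train=2000, steps=200, lr=0.1, seed=0):
    """Fit logistic regression: noisy template patches (1) vs pure noise (0)."""
    rng = np.random.default_rng(seed)
    d = template.size
    pos = template.ravel() + rng.standard_normal((n_train, d))
    neg = rng.standard_normal((n_train, d))
    X = np.vstack([pos, neg])
    y = np.concatenate([np.ones(n_train), np.zeros(n_train)])
    w, b = np.zeros(d), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        grad = p - y  # gradient of the mean log-loss w.r.t. the logits
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def picks_on_pure_noise(w, b, shape, n_candidates=20000, top_fraction=0.02, seed=1):
    """Score pure-noise patches with the trained picker and keep the top ones."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_candidates, w.size))
    scores = X @ w + b
    k = int(round(top_fraction * n_candidates))
    top = np.argsort(scores)[-k:]
    return X[top].reshape((-1,) + shape)

if __name__ == "__main__":
    # Assumed toy "particle": a faint disc on a 16x16 grid.
    yy, xx = np.mgrid[:16, :16]
    template = 0.3 * ((yy - 8) ** 2 + (xx - 8) ** 2 < 25).astype(float)
    w, b = train_linear_picker(template)
    picked = picks_on_pure_noise(w, b, template.shape)
    avg = picked.mean(axis=0)
    corr = np.corrcoef(avg.ravel(), template.ravel())[0, 1]
    print(f"kept {len(picked)} noise patches, correlation with training shape: {corr:.2f}")
```

Even though the test batch contains no particles, the average of the highest-scoring patches resembles the shape the classifier was trained on. A deep network is far more complex than this linear model, so this only illustrates the mechanism, not the paper's Topaz experiment.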

How to Fix It?

The paper suggests a few ways to stop this "seeing ghosts" problem:

  1. Raise the Bar: Don't be too eager to pick particles. Set a higher threshold so you only pick things that are definitely particles, not just things that look kind of like the template.
  2. Use a Generic Net: Instead of using a specific shape (like a Ribosome), use a generic "blob" detector first. This avoids imposing a specific shape on the data too early.
  3. Check the Noise: Run the process on pure noise as a control. If your computer finds a structure in the noise, the picking step itself is imposing that structure.
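
The first and third suggestions can be combined into a simple noise control: run the picker on pure noise at several thresholds and count what it picks. The sketch below assumes the same toy model as a plain template matcher on independent white-noise patches, where the score follows a standard normal distribution, so the expected number of false picks is the Gaussian tail probability times the number of candidates. `noise_control` and its parameters are hypothetical names, not part of any picking package.

```python
import math
import numpy as np

def noise_control(template, thresholds=(1.0, 2.0, 3.0), n_candidates=20000, seed=0):
    """Count pure-noise picks per threshold, next to the Gaussian-tail expectation."""
    rng = np.random.default_rng(seed)
    t = template - template.mean()
    t_unit = (t / np.linalg.norm(t)).ravel()
    noise = rng.standard_normal((n_candidates, t_unit.size))
    scores = noise @ t_unit  # N(0, 1) for white noise and a unit-norm template
    report = []
    for tau in thresholds:
        n_picked = int((scores > tau).sum())
        # P(score > tau) for a standard normal, times the number of candidates.
        expected = n_candidates * 0.5 * math.erfc(tau / math.sqrt(2.0))
        report.append((tau, n_picked, expected))
    return report

if __name__ == "__main__":
    # Assumed toy template: a disc on a 16x16 grid.
    yy, xx = np.mgrid[:16, :16]
    template = ((yy - 8) ** 2 + (xx - 8) ** 2 < 25).astype(float)
    for tau, n_picked, expected in noise_control(template):
        print(f"threshold {tau:.1f}: {n_picked} noise picks (expected about {expected:.1f})")
```

In this toy model, raising the threshold sharply reduces how many noise patches get picked. The few that still pass are, on average, even more template-like, so a noise control remains worth running at whatever threshold is chosen.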

Summary

This paper is a warning label for structural biology. It says: "Be careful what you look for, because you might find it even if it isn't there."

The algorithms used to find tiny biological structures match patterns so readily that they can turn pure random noise into a convincing-looking 3D structure. The paper shows mathematically that confirmation bias isn't just a human flaw; it is built into how these methods search for patterns.
