CHAMMI-75: Pre-training multi-channel models with heterogeneous microscopy images

The paper introduces CHAMMI-75, a diverse open-access dataset of 75 heterogeneous multi-channel microscopy studies, which enables the training of channel-adaptive machine learning models to overcome the limitations of specialized, single-modality approaches in quantifying cellular morphology.

Vidit Agrawal, John Peters, Tyler N. Thompson, Mohammad Vali Sanian, Chau Pham, Nikita Moshkov, Arshad Kazi, Aditya Pillai, Jack Freeman, Byunguk Kang, Samouil L. Farhi, Ernest Fraenkel, Ron Stewart, Lassi Paavolainen, Bryan A. Plummer, Juan C. Caicedo

Published 2026-03-04

The Big Problem: The "One-Size-Fits-None" Dilemma

Imagine you are a detective trying to solve crimes by looking at photos of suspects.

  • The Old Way: In the past, if you wanted to identify a suspect, you had to hire a different detective for every type of photo. One detective only looked at black-and-white photos. Another only looked at color photos. A third only looked at photos taken at night. If a new case came in with a photo that was a mix of night-vision and color, none of your detectives could help you. You'd have to fire them all and hire a new specialist.
  • The Reality of Microscopy: In biology, scientists use microscopes to take pictures of cells. But unlike your phone camera (which always takes 3-channel color photos), microscopes are weird. Some capture 1 channel, some 5, some 7, and some as many as 14. Each channel is like a different "lens" showing a different part of the cell (like the nucleus, the skeleton, or the energy factories).
  • The Result: Because the number of "lenses" (channels) changes so much, scientists had to build a new AI model for every single experiment. These models were like the specialized detectives: great at one thing, but useless at everything else. They couldn't learn from each other.

The Solution: CHAMMI-75 (The "Universal Library")

The authors of this paper decided to build a massive, universal training ground for AI. They call it CHAMMI-75.

Think of CHAMMI-75 as a giant, chaotic, but incredibly rich library of cell photos.

  • The Collection: They didn't just grab photos from one lab. They went out and collected 2.8 million images from 75 different scientific studies all over the world.
  • The Variety: This library is messy in a good way. It has photos of cells from humans, mice, and plants. It has photos taken with 100 different types of microscopes. It has photos with anywhere from 1 channel up to 14 channels.
  • The Goal: They wanted to feed this "chaos" to an AI so the AI could learn the true language of cells, regardless of how the photo was taken. They wanted an AI that says, "I don't care if you give me a 3-channel photo or a 14-channel photo; I know what a cell looks like."

The Training: Teaching the AI to be a "Polyglot"

To train this AI, they used a method called Self-Supervised Learning.

  • The Analogy: Imagine teaching a child to recognize a dog. Instead of showing them a picture and saying, "This is a dog," you just show them thousands of pictures of dogs, cats, and birds and let them figure out the patterns themselves. They learn that "ears" and "fur" usually go together, even if they don't know the word "dog" yet.
  • The Process: The AI looked at millions of these diverse cell images. It learned to ignore the "noise" (like the specific microscope used or the lighting) and focus on the "signal" (the actual shape and structure of the cell). A rough sketch of this idea in code follows below.
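
To make this concrete, here is a minimal sketch in PyTorch. It is not the paper's actual architecture or training code; the names ChannelAdaptiveEncoder and contrastive_loss are made up for illustration. It shows two assumed ideas: an encoder that accepts any number of channels by running a shared single-channel stem over each channel and averaging, and a simple contrastive objective that provides a training signal without any labels.

```python
# Hedged sketch: a channel-adaptive encoder + label-free contrastive training.
# Not the paper's code; all names here are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelAdaptiveEncoder(nn.Module):
    """Maps an image with ANY number of channels to a fixed-size embedding."""
    def __init__(self, dim: int = 128):
        super().__init__()
        # The same single-channel stem is applied to every channel (shared weights).
        self.stem = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape                                  # c can be 1, 5, 14, ...
        per_channel = self.stem(x.reshape(b * c, 1, h, w))    # (b*c, 64)
        pooled = per_channel.reshape(b, c, -1).mean(dim=1)    # average over channels
        return F.normalize(self.head(pooled), dim=-1)

def contrastive_loss(z1, z2, temperature: float = 0.1):
    """SimCLR-style objective: two views of the same image should match."""
    logits = z1 @ z2.T / temperature
    targets = torch.arange(z1.size(0))
    return F.cross_entropy(logits, targets)

encoder = ChannelAdaptiveEncoder()
# Two random "augmented views" of a batch of hypothetical 5-channel images.
view1, view2 = torch.rand(8, 5, 64, 64), torch.rand(8, 5, 64, 64)
loss = contrastive_loss(encoder(view1), encoder(view2))
loss.backward()  # no labels anywhere: the images themselves provide the signal
```

The key point the sketch tries to capture is that nothing in the forward pass depends on a fixed channel count, so 1-, 5-, and 14-channel studies can all flow through the same weights.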

The Star Player: MorphEm

The result of this training is a new AI model they named MorphEm (short for Morphology Embeddings).

  • What it does: MorphEm is like a universal translator. If you give it a photo with 3 channels, it understands it. If you give it a photo with 14 channels (which most specialized models cannot handle), it understands that too (see the sketch after this list).
  • The Test: They put MorphEm through a series of "final exams" (benchmarks) using real-world biological problems, like:
    • Identifying which drugs kill cancer cells.
    • Detecting genetic diseases.
    • Classifying blood cells from different countries.
  • The Result: MorphEm didn't just pass; it crushed the competition. It performed better than models that were specifically trained for just one type of image. It proved that diversity is strength. By seeing more types of data, the AI became smarter and more adaptable.
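
To see what the "universal translator" behavior looks like in code, here is a small, self-contained sketch. It is not MorphEm's real code or API; AnyChannelEmbedder is a hypothetical stand-in for a pre-trained channel-adaptive model. The point is only that the same weights turn 3-, 5-, and 14-channel images into embeddings of the same size.

```python
# Hedged sketch: one model, any channel count, same-size embeddings.
# AnyChannelEmbedder is hypothetical, not MorphEm's actual interface.
import torch
import torch.nn as nn

class AnyChannelEmbedder(nn.Module):
    """Stand-in for a pre-trained channel-adaptive encoder."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.per_channel = nn.Sequential(
            nn.Conv2d(1, dim, 7, stride=4),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        feats = self.per_channel(x.reshape(b * c, 1, h, w)).reshape(b, c, -1)
        return feats.mean(dim=1)  # average over however many channels exist

model = AnyChannelEmbedder().eval()
batches = {
    3: torch.rand(4, 3, 96, 96),    # e.g. a 3-channel fluorescence study
    5: torch.rand(4, 5, 96, 96),    # e.g. a 5-channel Cell Painting study
    14: torch.rand(4, 14, 96, 96),  # e.g. a 14-channel multiplexed study
}
with torch.no_grad():
    for n_channels, batch in batches.items():
        # Same embedding size every time, no retraining required.
        print(n_channels, "channels ->", tuple(model(batch).shape))
```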

Why This Matters (The "So What?")

Before this paper, if a scientist wanted to study a new type of cell with a weird new microscope setup, they had to start from scratch: collect data, label it, and train a new model from zero.

With CHAMMI-75 and MorphEm:

  1. Reusability: Scientists can now use this pre-trained "brain" for almost any new experiment.
  2. Speed: They don't need to wait years to train a new model. They just plug in their new data, and the AI is ready to go (see the sketch after this list).
  3. Discovery: Because the AI is so good at spotting subtle differences, it can help scientists find new drugs or understand diseases faster than ever before.
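
As an illustration of points 1 and 2 (a sketch under assumptions, not the paper's evaluation code): once a pre-trained encoder has turned a new experiment's images into fixed-length embeddings, the task-specific part can be as small as a logistic regression trained in seconds. The embeddings below are random placeholders standing in for what a model like MorphEm would produce.

```python
# Hedged sketch of "plug in your data and go": a tiny classifier on top of
# frozen embeddings. The embeddings are random stand-ins, not real MorphEm output.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(500, 128))   # one 128-d vector per cell image
labels = rng.integers(0, 2, size=500)      # e.g. treated vs. control wells

X_train, X_test, y_train, y_test = train_test_split(
    embeddings, labels, test_size=0.2, random_state=0
)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)  # no GPU, no long training run
print("held-out accuracy:", clf.score(X_test, y_test))  # ~0.5 here because the data is random
```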

In a Nutshell

Imagine trying to learn a language.

  • The Old Way: You learned French, then you learned Spanish, then you learned German, but you had to forget one to learn the next.
  • The New Way (CHAMMI-75): You were thrown into a room with speakers of 75 different languages speaking at once. You learned the structure of language itself. Now, when someone speaks a language you've never heard before, you can still understand the grammar and meaning.

This paper gives the scientific community that "universal language" for cell biology, allowing them to solve problems faster and more accurately than ever before.