SHOT-CCR: Biologically guided adversarial training for test-time adaptation in cellular morphology

This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

Imagine you are trying to teach a computer to recognize different types of fruit just by looking at photos. You show it thousands of pictures of apples, oranges, and bananas. But here's the catch: every time you take a new batch of photos, you do it in a different kitchen, with different lighting, and maybe even a slightly different camera angle.

To the computer, an apple taken in "Kitchen A" looks completely different from an apple taken in "Kitchen B," even though they are the same fruit. The computer gets confused and starts thinking the lighting or the table color is the most important thing, rather than the fruit itself. This is exactly the problem scientists face with Cell Painting, a technique used in drug discovery where they take high-tech microscope photos of cells to see how they react to new medicines.

The Problem: The "Kitchen" Effect

In the world of biology, these "kitchens" are called batches. When scientists run experiments, they do them in groups (batches). Sometimes, one batch is done on a Monday, another on a Friday. Maybe the temperature was slightly different, or the chemicals were mixed by a different person.

These tiny technical differences create "batch effects." They act like a fog that hides the real biological signal. A computer model might learn to predict the day of the week the photo was taken instead of the drug effect on the cell. This is a huge problem because if a model only works on the specific batch it was trained on, it's useless for discovering new drugs in the real world.

The Solution: SHOT-CCR (The "Smart Filter")

The authors of this paper created a new method called SHOT-CCR. Think of it as a super-smart filter that helps the computer ignore the "kitchen noise" and focus on the "fruit."

Here is how it works, using a simple analogy:

1. The "Cell Count" Clue

In these microscope photos, one of the easiest things for a computer to count is how many cells are in the picture.

The Problem: Sometimes, Batch A has crowded photos (lots of cells), and Batch B has sparse photos (few cells). The computer gets lazy and starts guessing the drug type based on how crowded the picture is, rather than looking at the actual shape of the cells.
The Fix (Cell Count Reversal): The authors taught the computer a trick. They said, "Hey, I know you can easily count the cells, but I'm going to punish you if you use that number to guess the drug." They used a technique called Adversarial Training. Imagine a game where the computer tries to guess the drug, but a "referee" (the adversarial part) yells "Wrong!" every time the computer relies too much on the cell count. This forces the computer to look deeper and find the real biological clues.

2. The "Test-Time Adaptation" (The "Practice Run")

Usually, you train a model once and then lock it. But in the real world, new data keeps coming in.

The Fix (SHOT): The authors let the model take a "practice run" right before it makes a final decision on new data. It looks at the new batch of images, adjusts its internal settings slightly (like tuning a radio to get a clearer signal), and then makes its prediction. It doesn't need to be retrained from scratch; it just adapts on the fly to the new "kitchen" conditions.

Why This Matters: The Results

The team tested this on two large Cell Painting datasets: RxRx1 and a subset of JUMP-CP.

The Old Way: The previous best method (called AdaBN) got about 87% of the answers right.
The New Way (SHOT-CCR): Their method got 91.6% right.

That might sound like a small gain, but it is substantial: the share of wrong answers falls from about 13% to about 8%. It means the model identifies the effects of genetic changes in cells much more reliably.

The "U2OS" Surprise:
One specific type of cell (called U2OS) was notoriously hard for computers to learn. The old method only got 68% right on these. The new method boosted this to 76%. This is huge because it means the AI is finally getting good at the "hard" cases, not just the easy ones.

The Big Picture

Think of this research as teaching a student to ignore the noise of the classroom (the batch effects) and focus entirely on the lesson (the biology).

By specifically targeting cell count as a distraction and letting the model adapt on the fly, the authors have created a more robust tool. This means:

Better Drug Discovery: Scientists can trust the AI more when it says a drug might work.
Mixing Data: They can now combine data from different labs and different times without the results getting messy.
Real-World Application: It moves us closer to a future where AI can reliably help find cures for diseases, regardless of where or when the data was collected.

In short, SHOT-CCR is like giving the computer a pair of noise-canceling headphones so it can finally hear the true voice of the cell.

1. Problem Statement

Batch Effects in High-Content Screening:
In drug discovery, High-Content Screening (HCS) datasets (specifically "Cell Painting" microscopy data) are plagued by pervasive batch effects. These are technical artifacts arising from differences in experimental batches (e.g., different days, labs, or equipment) that obscure the underlying biological signals.

The Challenge: Models trained on one set of experimental batches often fail to generalize to unseen batches, even if they perform well "within-batch."
Specific Confounder: The authors identify cell count (the number of cells per image) as a fundamental biological feature that varies significantly across batches and cell types. Models often over-rely on cell count as a proxy for batch identity rather than learning the true morphological response to genetic perturbations.
Limitation of Previous Work: Prior approaches, such as Adaptive Batch Normalization (AdaBN) and generic gradient reversal (attempting to remove all batch identity), have shown mixed results. Specifically, generic batch reversal can strip away task-relevant biological signal, and AdaBN struggles with cell types that have limited training data (e.g., U2OS).
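
To make the AdaBN baseline mentioned above concrete, here is a minimal sketch of its core idea: at test time, features are normalized with statistics computed on the new (target) batch instead of the statistics stored from training. This is a toy stdlib-only illustration, not the implementation used in the paper; the function names and the tiny feature lists are hypothetical.

```python
import math
import statistics


def batch_stats(features):
    """Per-feature mean and population variance over a list of feature vectors."""
    columns = list(zip(*features))
    means = [statistics.fmean(col) for col in columns]
    variances = [statistics.pvariance(col) for col in columns]
    return means, variances


def normalize(features, means, variances, eps=1e-5):
    """Batch-norm style normalization: (x - mean) / sqrt(var + eps)."""
    return [
        [(x - m) / math.sqrt(v + eps) for x, m, v in zip(row, means, variances)]
        for row in features
    ]


def adabn_normalize(target_features, eps=1e-5):
    """AdaBN idea: recompute the normalization statistics on the target batch."""
    means, variances = batch_stats(target_features)
    return normalize(target_features, means, variances, eps)


if __name__ == "__main__":
    # Hypothetical features: the target batch is shifted relative to the source batch.
    source = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
    target = [[10.0, 11.0], [12.0, 13.0], [14.0, 15.0]]
    src_means, src_vars = batch_stats(source)
    print("source stats:", normalize(target, src_means, src_vars)[0])
    print("target stats:", adabn_normalize(target)[0])
```

Normalizing with source statistics leaves the batch shift in the features, while recomputing them on the target batch re-centers each feature. This removes first- and second-order shifts only, which is one plausible reason it can fall short on harder cases.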

2. Methodology: SHOT-CCR

The authors propose SHOT-CCR (Source Hypothesis Transfer with Cell Count Reversal), a framework combining Test-Time Adaptation (TTA) with biologically guided adversarial training.

A. Model Architecture

Backbone: Uses a pre-trained DenseNet-161 (modified to accept 5 or 6-channel Cell Painting images).
Heads:
1. Perturbation Classifier: Predicts the genetic perturbation class (siRNA or CRISPR).
2. Cell Count Regression Head: Predicts the normalized cell count of the image.
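3. Gradient Reversal Layer (GRL): Not a prediction head itself, but a layer placed between the feature extractor and the cell count head. It passes features through unchanged in the forward pass and multiplies the gradients flowing back from the cell count loss by a negative factor ( $-\alpha$ ).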
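
The effect of a gradient reversal layer can be illustrated with a toy scalar model where the gradients are written out by hand. This is a sketch under simplifying assumptions (one weight per component, squared-error loss); the variable names are hypothetical and nothing here is taken from the authors' code.

```python
# Toy scalar model, for illustration only:
#   feature  f = w * x          (stand-in for the feature extractor)
#   count    c = v * f          (stand-in for the cell count head)
#   loss     L = (c - y) ** 2   (squared error on the normalized cell count)


def grl_gradients(w, v, x, y, alpha):
    """Return (grad_v, grad_w) with a gradient reversal layer between f and the head."""
    f = w * x                  # forward pass: the GRL is the identity, f is unchanged
    c = v * f
    dL_dc = 2.0 * (c - y)

    grad_v = dL_dc * f         # head gradient is untouched: the head still learns the count
    dL_df = dL_dc * v          # gradient arriving at the GRL from the head
    grad_w = (-alpha * dL_df) * x   # GRL backward pass scales by -alpha before the extractor
    return grad_v, grad_w


def plain_gradients(w, v, x, y):
    """Same model without the GRL, for comparison."""
    f = w * x
    dL_dc = 2.0 * (v * f - y)
    return dL_dc * f, dL_dc * v * x


if __name__ == "__main__":
    print(plain_gradients(0.5, 1.0, 2.0, 3.0))
    print(grl_gradients(0.5, 1.0, 2.0, 3.0, alpha=0.1))
```

The head's gradient is identical in both cases, while the extractor's gradient changes sign and is scaled by $\alpha$, so a gradient descent step on the extractor increases the cell count loss instead of decreasing it.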

B. Training Phase (Biologically Guided Adversarial Training)

Goal: To make the feature extractor's representations less dependent on batch-driven cell count differences without making them completely agnostic to cell count (which would hurt performance).
Mechanism: The classifier and feature extractor minimize the perturbation classification loss. The cell count head minimizes its own regression loss, while the GRL flips that gradient so the feature extractor is pushed to maximize it. This encourages the encoder to discard cell-count-specific batch noise while retaining biological signal.
Hyperparameters: Crucially, the authors use separate learning rates and a specific reversal strength ( $\alpha$ ) for the cell count head. They found that "partial invariance" (stripping noise but keeping some signal) is optimal; total removal of cell count information degrades classification.
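
In the standard gradient reversal formulation, one training step of this scheme can be written as below. The notation is assumed here for illustration and is not taken from the paper: $\theta_f$, $\theta_y$, and $\theta_c$ are the parameters of the feature extractor, perturbation classifier, and cell count head; $L_{cls}$ and $L_{cc}$ are the classification and cell count losses; $\eta$ and $\eta_c$ are the two learning rates.

$$
\begin{aligned}
\theta_y &\leftarrow \theta_y - \eta \, \frac{\partial L_{cls}}{\partial \theta_y} \\
\theta_c &\leftarrow \theta_c - \eta_c \, \frac{\partial L_{cc}}{\partial \theta_c} \\
\theta_f &\leftarrow \theta_f - \eta \left( \frac{\partial L_{cls}}{\partial \theta_f} - \alpha \, \frac{\partial L_{cc}}{\partial \theta_f} \right)
\end{aligned}
$$

With $\alpha = 0$ the feature extractor ignores the cell count head entirely. Larger $\alpha$ pushes it harder toward features from which cell count cannot be predicted, which is why the reported preference for partial invariance corresponds to a moderate reversal strength.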

C. Test-Time Adaptation (TTA)

During inference on unseen batches, the model adapts without access to ground-truth labels:

Freezing: The perturbation classifier (hypothesis) is frozen.
Unsupervised Optimization: The feature extractor is updated using SHOT loss, which consists of:
- Entropy Minimization ( $L_{ent}$ ): Reduces prediction uncertainty.
- Diversity Loss ( $L_{div}$ ): Prevents the model from collapsing predictions into a single class.
- Pseudo-Labeling ( $L_{pc}$ ): Uses high-confidence predictions (threshold $\beta=0.95$ ) as pseudo-labels to refine the feature extractor.
Process: The model iterates over the target test batch, updating feature extractor weights to align with the new batch distribution while maintaining the biological signal learned during training.
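
The three loss terms listed above can be sketched for a single unlabeled test batch as follows. This is a stdlib-only illustration under stated assumptions: the confidence-thresholded pseudo-labeling follows the description in this summary, the function names are hypothetical, and how the terms are weighted when combined is not specified here, so they are returned separately.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one row of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def shot_style_losses(batch_logits, beta=0.95, eps=1e-8):
    """Return (L_ent, L_div, L_pc) for a batch of classifier logits, without labels."""
    probs = [softmax(row) for row in batch_logits]
    n, k = len(probs), len(probs[0])

    # L_ent: mean per-sample entropy; low when each prediction is confident.
    l_ent = -sum(p * math.log(p + eps) for row in probs for p in row) / n

    # L_div: negative entropy of the batch-mean prediction; low when the
    # predictions are spread over many classes, high when they collapse to one.
    mean_p = [sum(row[j] for row in probs) / n for j in range(k)]
    l_div = sum(p * math.log(p + eps) for p in mean_p)

    # L_pc: cross-entropy against the model's own argmax label, computed only
    # on samples whose top probability reaches the confidence threshold beta.
    confident = [row for row in probs if max(row) >= beta]
    l_pc = -sum(math.log(max(row) + eps) for row in confident) / len(confident) if confident else 0.0

    return l_ent, l_div, l_pc


if __name__ == "__main__":
    print(shot_style_losses([[10.0, 0.0], [0.0, 10.0]]))  # confident and diverse
    print(shot_style_losses([[10.0, 0.0], [10.0, 0.0]]))  # confident but collapsed
```

In an actual adaptation loop these values would be computed from the frozen classifier's outputs and backpropagated into the feature extractor only; that part requires an autodiff framework and is omitted here.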

3. Key Contributions

Biologically Informed TTA: Extends computer vision TTA techniques to Cell Painting by incorporating a domain-specific biological prior (cell count) into the adaptation process.
Cell Count Adversarial Training (CCR): Introduces a novel mechanism that selectively down-weights cell-count-driven batch differences. Unlike generic batch reversal, CCR targets a specific, biologically relevant confounder, proving more effective than attempting to remove all batch identity.
Comprehensive Benchmarking: Establishes a new state-of-the-art benchmark across two major datasets (RxRx1 and JUMP-CP) and four distinct cell types, demonstrating robustness where previous methods failed.

4. Results

The method was evaluated on RxRx1 (1,139 siRNA perturbations, 4 cell types) and a subset of JUMP-CP (484 CRISPR perturbations, U2OS cells).

Performance Metrics

RxRx1:
- Baseline (AdaBN): 87.1% accuracy.
- SHOT-CCR: 91.6% accuracy.
- Improvement: +4.5 percentage points over the previous state-of-the-art (statistically significant, $p < 0.0001$ ).
- Cell Type Specifics: The largest gain was in the U2OS cell type (+8.0 percentage points), which previously had the lowest performance (68.2%) due to limited training data and high batch variance.
JUMP-CP (Subset):
- Baseline: 10.4% accuracy.
- SHOT-CCR: 43.7% accuracy.
- Improvement: +15.7% over the baseline.
- Note: In JUMP-CP, cell count distributions were more uniform across batches, so the specific gain from CCR was marginal compared to RxRx1, confirming that CCR's benefit scales with cell count heterogeneity.
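
For the RxRx1 figures above, the absolute gain can also be read as a reduction in error rate, which gives a better sense of its size. The short calculation below uses only the accuracies quoted in this summary; the helper name is hypothetical.

```python
def summarize_gain(acc_before, acc_after):
    """Return (absolute gain in points, relative error reduction in percent)."""
    abs_gain = acc_after - acc_before
    err_before = 100.0 - acc_before
    err_after = 100.0 - acc_after
    rel_err_reduction = 100.0 * (err_before - err_after) / err_before
    return abs_gain, rel_err_reduction


if __name__ == "__main__":
    gain, reduction = summarize_gain(87.1, 91.6)  # AdaBN vs. SHOT-CCR on RxRx1
    print(f"absolute gain: {gain:.1f} points")
    print(f"relative error reduction: {reduction:.1f}%")
    print(f"U2OS accuracy after the reported gain: {68.2 + 8.0:.1f}%")
```

On these numbers the error rate falls from 12.9% to 8.4%, roughly a third fewer misclassifications.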

Ablation Studies & Insights

Gradient Reversal Specificity: Generic "Batch Identity" gradient reversal decreased performance (dropping accuracy by ~4%), confirming that removing all batch info destroys biological signal. Targeting only cell count (CCR) was effective.
Data Distribution: The authors found that batches with highly divergent cell count distributions (e.g., specific U2OS and HepG2 batches) were the primary drivers of failure. SHOT-CCR specifically improved performance on these difficult, distribution-shifted batches.
Biological Validity: Gene set enrichment analysis confirmed that the accuracy improvements were concentrated in genes related to nucleolar morphology, RNA helicases, and endomembrane systems—biological pathways known to produce subtle phenotypes often obscured by batch effects.

5. Significance

Robust Drug Discovery: By enabling models to generalize across unseen experimental batches and cell types, SHOT-CCR facilitates the creation of more scalable and reliable AI models for drug discovery.
Methodological Shift: The paper argues against "blind" batch correction. Instead, it advocates for biologically guided adversarial training, where specific, known confounders (like cell count) are targeted while preserving the complex biological signal necessary for classification.
Dataset Composition: The study highlights that the distribution of biological indicators (like cell count) across train/test splits is critical. Researchers are advised to inspect these distributions to ensure models are not learning technical artifacts.
New Benchmark: The authors provide a refined subset of the JUMP-CP dataset as a new benchmark for future batch correction research.
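
The advice above to inspect cell count distributions across train/test splits could be turned into a simple check like the one below. This is a sketch, not a procedure from the paper: the function name, the z-score style threshold, and the example counts are all hypothetical.

```python
import statistics


def flag_divergent_batches(train_counts, test_counts_by_batch, z_threshold=2.0):
    """Flag test batches whose mean cell count lies far from the training distribution.

    Returns a dict mapping batch id to its shift, measured in training standard deviations.
    """
    mu = statistics.fmean(train_counts)
    sigma = statistics.pstdev(train_counts)
    if sigma == 0:
        raise ValueError("training cell counts have zero variance")

    flagged = {}
    for batch_id, counts in test_counts_by_batch.items():
        shift = (statistics.fmean(counts) - mu) / sigma
        if abs(shift) > z_threshold:
            flagged[batch_id] = shift
    return flagged


if __name__ == "__main__":
    # Hypothetical per-image cell counts.
    train = [90, 100, 110, 95, 105]
    test = {"batch_A": [98, 102, 101], "batch_B": [40, 45, 50]}
    print(flag_divergent_batches(train, test))
```

Comparing means is the crudest possible check; comparing full histograms or quantiles per batch would catch shifts in spread or shape that a mean-based test misses.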

In summary, SHOT-CCR demonstrates that combining test-time adaptation with a targeted adversarial mechanism to suppress specific biological confounders (cell count) significantly outperforms existing batch correction methods, particularly in scenarios with limited data or high technical variability.