AnnotateAnyCell: Open-Source AI Framework for Efficient Annotation in Digital Pathology

The paper introduces AnnotateAnyCell, an open-source semi-supervised framework that integrates active contrastive learning and human-in-the-loop feedback to reduce annotation time by 25% while achieving expert-level accuracy for cellular features in digital pathology.

Original authors: Verma, S., Malusare, A., Wang, M., Wang, L., Mahapatra, A., English, A. L., Cox, A. D., Broman, M., de Brot, S., Burcham, G., Knapp, D., Dhawan, D., Sola, M., Aggarwal, V., Grama, A., Lanman, N. A.

Published 2026-04-13

This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

Imagine you are a master librarian trying to sort through a library containing hundreds of thousands of tiny photographs of cells. Your job is to look at each one and decide: "Is this cell dividing? Does it have a weird shape? Is the nucleus (the cell's control center) healthy or sick?"

Doing this manually for a whole slide of tissue is like trying to read every single book in a library to find a few specific sentences. It takes experts (pathologists) hundreds of hours, it's exhausting, and it's one of the biggest bottlenecks stopping AI from helping doctors diagnose cancer faster.

Enter "AnnotateAnyCell."

Think of this new tool not as a robot that does the work for you, but as a super-smart, interactive librarian that helps you find the most important books to read first.

Here is how it works, broken down into simple concepts:

1. The Problem: The "Needle in a Haystack"

In digital pathology, a single whole slide image (WSI) is so large that it can contain hundreds of thousands of cells. Having a human label every single one to train an AI is not practical.

  • The Old Way: You read the books in order, from the first shelf to the last. It takes forever.
  • The New Way (AnnotateAnyCell): You ask the librarian, "Show me the books that look most different from the ones I've already read."
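The librarian's "show me what looks most different" request can be sketched in a few lines. This is a minimal illustration, not the paper's actual selection rule: it assumes each cell has already been turned into a list of numbers (a feature vector), and the function name `pick_most_different` is hypothetical. It greedily picks the cell that is farthest from everything already seen.

```python
import numpy as np

def pick_most_different(features, already_seen, k):
    """Greedy farthest-point selection (illustrative, not the paper's method).

    features:     (n_cells, n_features) array, one row per cell
    already_seen: indices of cells the expert has already looked at
    k:            how many new cells to suggest
    """
    features = np.asarray(features, dtype=float)
    seen = features[list(already_seen)]
    # Distance from every cell to its nearest already-seen cell.
    dists = np.linalg.norm(
        features[:, None, :] - seen[None, :, :], axis=2
    ).min(axis=1)
    chosen = []
    for _ in range(k):
        idx = int(np.argmax(dists))  # the cell least like anything seen so far
        chosen.append(idx)
        # The new pick now counts as seen, so update the nearest distances.
        dists = np.minimum(dists, np.linalg.norm(features - features[idx], axis=1))
    return chosen

# Toy example: two tight pairs of look-alike cells and one outlier.
cells = [[0, 0], [0.1, 0], [5, 5], [5.1, 5], [10, 0]]
print(pick_most_different(cells, already_seen=[0], k=2))  # [4, 2]
```

In the toy example the outlier is suggested first, then one representative of the unseen pair, while the near-duplicate of the cell already read is skipped.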

2. The Magic Trick: The "Morphological Map" (UMAP)

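The tool uses a dimensionality-reduction technique called UMAP (Uniform Manifold Approximation and Projection) to create a 2D map of all the cells, placing cells that look alike close together.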

  • Imagine you have a giant pile of mixed-up LEGO bricks.
  • The tool sorts them on a table so that all the red bricks are in one pile, blue bricks in another, and weird-shaped bricks in a third.
  • In the real world, this means cells that look similar (e.g., cells with "bubbly" nuclei) naturally group together on the screen.
  • Why this helps: Instead of scanning the whole library, the pathologist just looks at one "pile" of similar cells and says, "Yes, these are all sick cells." The AI then learns that everything in that pile is likely sick too.
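To make the "label one pile" idea concrete, here is a rough sketch under stated assumptions. It assumes the 2D map coordinates have already been computed (for example by a UMAP library) and simply suggests, for each cell, the label of the nearest expert-labeled cell if it is close enough on the map. The function name `propagate_labels` and the `radius` parameter are hypothetical, and the real tool may spread labels quite differently.

```python
import numpy as np

def propagate_labels(coords, labeled, radius):
    """Suggest labels for cells that sit near an expert-labeled cell on the map.

    coords:  (n_cells, 2) array of map positions (assumed precomputed)
    labeled: dict {cell_index: label} given by the pathologist
    radius:  how close (in map units) a cell must be to inherit a label
    Returns a list with one suggestion per cell (None = no suggestion).
    """
    coords = np.asarray(coords, dtype=float)
    keys = list(labeled)
    anchors = coords[keys]            # positions of the labeled cells
    names = [labeled[k] for k in keys]
    suggestions = []
    for point in coords:
        d = np.linalg.norm(anchors - point, axis=1)
        nearest = int(np.argmin(d))
        # Only suggest a label when the cell is inside the labeled "pile".
        suggestions.append(names[nearest] if d[nearest] <= radius else None)
    return suggestions

# Toy map: two piles and one stray cell far from both.
coords = [[0, 0], [0.2, 0.1], [5, 5], [5.1, 4.9], [20, 20]]
print(propagate_labels(coords, {0: "abnormal", 2: "normal"}, radius=1.0))
```

The stray cell gets no suggestion, which is the point: cells that do not clearly belong to a labeled pile are left for the human to decide.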

3. The "Guessing Game" (Active Learning)

This is the "Human-in-the-Loop" part.

  1. The AI guesses: It looks at the unlabeled cells and makes a guess: "I think this cell is dividing."
  2. The Human checks: The pathologist looks at the guess. If the AI is right, great! If the AI is wrong, the human corrects it.
  3. The Smart Selection: The AI doesn't just ask for random corrections. It specifically asks for the cells it is unsure about or the ones that are rare.
    • Analogy: Imagine you are teaching a child to identify dogs. You don't show them 100 Golden Retrievers. You show them a Golden Retriever, then a Chihuahua, then a weird-looking mutt. You focus on the "hard" examples to teach them faster.
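The "ask about the cells it is unsure about" step is commonly implemented as uncertainty sampling. The sketch below is one standard way to do it, using entropy as the uncertainty score; it is an assumption for illustration rather than the exact rule used by AnnotateAnyCell, and it does not cover the rare-cell part of the selection. The function name `most_uncertain` is hypothetical.

```python
import numpy as np

def most_uncertain(probabilities, k):
    """Return the indices of the k cells the model is least sure about.

    probabilities: (n_cells, n_classes) array of predicted class
                   probabilities, each row summing to 1
    """
    p = np.clip(np.asarray(probabilities, dtype=float), 1e-12, 1.0)
    # Entropy is highest when the model splits its guess evenly.
    entropy = -(p * np.log(p)).sum(axis=1)
    return np.argsort(-entropy)[:k].tolist()

# Toy predictions for "dividing" vs. "not dividing".
predictions = [
    [0.99, 0.01],  # confident
    [0.50, 0.50],  # coin flip
    [0.80, 0.20],  # fairly confident
    [0.55, 0.45],  # nearly a coin flip
]
print(most_uncertain(predictions, k=2))  # [1, 3]
```

The confident guesses are never sent to the pathologist, so expert time goes to the "Chihuahuas and mutts" rather than the hundredth Golden Retriever.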

4. The Results: Speed and Accuracy

The researchers tested this on bladder cancer samples from dogs (canine bladder cancer closely resembles the human disease).

  • Time Saved: Using this tool, experts finished labeling 300 cells in 47 minutes. Doing it the old, sequential way took 63 minutes. That's a 25% time savings.
  • Accuracy:
    • For spotting nucleoli (tiny structures inside the nucleus), the AI was 98% accurate.
    • For spotting mitotic figures (cells dividing), it was 96% accurate.
    • For cell shape, it was lower (around 60%), but that's because "shape" is very subjective even for humans.
  • The "Small Sample" Miracle: The AI learned to recognize nucleoli with 95% accuracy after seeing only 215 examples. Usually, AI needs thousands of examples to learn this well.
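The time-savings figure can be checked directly from the two timings reported above. The helper name `time_savings` is just for illustration.

```python
def time_savings(baseline_minutes, assisted_minutes):
    """Return (fraction of time saved, throughput multiplier)."""
    saved = (baseline_minutes - assisted_minutes) / baseline_minutes
    speedup = baseline_minutes / assisted_minutes
    return saved, speedup

# 300 cells: 63 minutes sequentially vs. 47 minutes with the tool.
saved, speedup = time_savings(63, 47)
print(f"{saved:.1%} less time, {speedup:.2f}x throughput")
# prints "25.4% less time, 1.34x throughput"
```

In other words, 16 of the 63 minutes are saved, which works out to roughly a third more cells labeled per hour.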

5. Why This Matters

Currently, AI in medicine is stuck because we don't have enough "labeled" data (data where a human has already said, "This is cancer").

  • AnnotateAnyCell is like a force multiplier. Based on the timings above, it lets a pathologist label about a third more cells in the same amount of time.
  • It is open source, meaning any hospital or university can download it for free, unlike expensive commercial software.
  • It builds trust. Because the human is always in the loop checking the AI's work, doctors are more likely to trust the final diagnosis.

The Bottom Line

AnnotateAnyCell is a smart assistant that organizes the chaos of hundreds of thousands of cells into neat, understandable groups. It asks the human expert only the questions whose answers teach the AI the most, which cut annotation time by about a quarter in the authors' test, and it paves the way for AI to help diagnose cancer in hospitals that don't have huge budgets or armies of data scientists.
