PlanktonFlow: hands-on deep-learning classification of plankton images for biologists

PlanktonFlow is an open-source, user-friendly Python pipeline that lets biologists with limited deep-learning expertise automate the pre-processing, training, optimization, and inference of convolutional neural networks for plankton image classification. In the authors' tests, it was more accurate than existing tools such as EcoTaxa.

Walter, H., Gorzerino, C., Collinet, M., Porcon, B., Martignac, F., Edeline, E.

Published 2026-03-25

This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

Imagine you are a biologist trying to count and identify tiny, floating creatures called plankton in a river. In the past, you would have to examine thousands of specimens under a microscope, one by one, like a librarian sorting through a massive pile of books. This is slow, tiring, and prone to human error.

Now, imagine you have a high-speed imaging instrument (like a FlowCAM) that photographs these plankton far faster than any person could examine them. Suddenly, you have a library with millions of books, but no time to read them all. You need a robot assistant to sort them for you.

This is where PlanktonFlow comes in.

The Problem: The "Black Box" of AI

Scientists have developed powerful "robot brains" (Deep Learning models) that are incredibly good at recognizing images. However, using them is like trying to build a rocket ship with a manual written in a foreign language. It requires complex coding skills, expensive computer parts, and a lot of trial and error. Most biologists just want to study nature, not become computer engineers.

The Solution: PlanktonFlow (The "Plug-and-Play" Toolkit)

The authors of this paper built PlanktonFlow, which is like a Swiss Army Knife for plankton photos. It's a free, open-source tool that guides a biologist through the entire process of teaching a computer to recognize plankton, with no need to be a coding wizard.

Here is how it works, step-by-step, using simple analogies:

1. Cleaning the Mess (Pre-processing)

Imagine your photos are a bit messy. Some have little rulers (scale bars) drawn on them that distract the robot, and some types of plankton are very rare (like finding a needle in a haystack), while others are common (like finding hay).

  • What PlanktonFlow does: It automatically erases the distracting rulers. It also uses a clever trick called "data augmentation." If it only sees 10 photos of a rare plankton, it creates 1,500 slightly different versions of those photos (flipping them, changing the brightness, rotating them) so the robot gets enough practice to recognize them. This evens out the classes so the robot doesn't just learn to recognize the common ones.
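To make the balancing trick concrete, here is a minimal sketch in plain Python. It is not PlanktonFlow's actual code: the function names, the brightness range, the tiny 2x2 "image," and the class labels are all assumptions chosen for illustration. Images are represented as nested lists of grayscale pixel values so that the example runs without any extra libraries.

```python
import random
from collections import defaultdict

def flip_horizontal(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]

def rotate_90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def adjust_brightness(img, factor):
    """Scale pixel values, clipping them to the 0-255 range."""
    return [[max(0, min(255, round(p * factor))) for p in row] for row in img]

def random_augment(img, rng):
    """Apply a random mix of flip, rotation, and brightness change."""
    if rng.random() < 0.5:
        img = flip_horizontal(img)
    for _ in range(rng.randrange(4)):  # 0 to 3 quarter turns
        img = rotate_90(img)
    return adjust_brightness(img, rng.uniform(0.8, 1.2))  # assumed range

def balance_classes(samples, target_per_class, seed=0):
    """Pad every class with augmented copies up to target_per_class images."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for img, label in samples:
        by_class[label].append(img)
    balanced = {}
    for label, imgs in by_class.items():
        out = list(imgs[:target_per_class])
        while len(out) < target_per_class:
            out.append(random_augment(rng.choice(imgs), rng))
        balanced[label] = out
    return balanced

# Hypothetical toy data: 2 photos of a rare class, 5 of a common one.
tiny = [[10, 200], [30, 40]]
samples = [(tiny, "rare_rotifer")] * 2 + [(tiny, "common_copepod")] * 5
balanced = balance_classes(samples, target_per_class=5)
```

After this runs, both classes hold five images each: the common class is untouched, and the rare class has been topped up with three altered copies. A real pipeline would work on actual image files and would likely use an image library for the transforms.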

2. The Cooking Class (Training)

Now you need to teach the robot. PlanktonFlow offers four different "chefs" (AI models) to try:

  • ResNet, DenseNet, EfficientNet, and YOLO.
    Think of these as four chefs with different styles. One may be fast but less accurate; another may be slower but more thorough.
  • The Magic: Instead of you guessing which chef is best, PlanktonFlow runs a "cooking competition." It trains all four chefs on your specific photos and automatically tweaks their settings (like the temperature of the oven or the amount of salt) to find the best recipe for your data.
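The "cooking competition" can be pictured as a loop over models and settings. The sketch below uses a simple grid search, which is an assumption: this summary does not say which search strategy PlanktonFlow actually uses. The setting names (`learning_rate`, `batch_size`), their values, and the `placeholder_evaluate` function are hypothetical. In particular, the placeholder trains nothing and returns an arbitrary number, so its output says nothing about how the real models compare.

```python
from itertools import product

def grid_search(model_names, search_space, evaluate):
    """Try every model with every settings combination; keep the top scorer."""
    keys = list(search_space)
    best = None
    for name in model_names:
        for values in product(*(search_space[k] for k in keys)):
            config = dict(zip(keys, values))
            score = evaluate(name, config)  # higher is better
            if best is None or score > best["score"]:
                best = {"model": name, "config": config, "score": score}
    return best

def placeholder_evaluate(name, config):
    # Placeholder only: no model is trained here. A real version would train
    # the named model with these settings and return its validation score.
    # This stand-in ignores the model name and simply prefers one arbitrary
    # combination of settings so that the loop has something to optimize.
    return -(abs(config["learning_rate"] - 1e-3)
             + abs(config["batch_size"] - 32) / 1000)

models = ["ResNet", "DenseNet", "EfficientNet", "YOLO"]
space = {"learning_rate": [1e-2, 1e-3, 1e-4], "batch_size": [16, 32, 64]}
best = grid_search(models, space, placeholder_evaluate)
```

Swapping `placeholder_evaluate` for a function that really trains and validates a model is where all the computing time would go. The surrounding loop stays the same.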

3. The Taste Test (Evaluation)

Once the chefs are trained, PlanktonFlow puts them to the test with a new set of photos they've never seen before.

  • It measures who got the most answers right.
  • Crucially, it checks if they are good at identifying the rare plankton too, not just the common ones.
  • The Winner: In this study, a chef named EfficientNet-B5 won the competition. It was the most accurate at identifying everything, from the common to the rare.
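Why check the rare plankton separately? The small sketch below shows how a model can look good on overall accuracy while failing every rare specimen. The two metrics shown, overall accuracy and macro-averaged recall (the average of the per-class hit rates), are assumptions chosen for illustration. This summary does not say which metrics PlanktonFlow reports, and the class labels are made up.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of all photos labeled correctly."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def macro_recall(y_true, y_pred):
    """Average of per-class hit rates, so each class counts equally."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)

# Hypothetical test set: 8 common specimens and 2 rare ones.
y_true = ["common"] * 8 + ["rare"] * 2
# A lazy model that always answers "common".
y_pred = ["common"] * 10

overall = accuracy(y_true, y_pred)       # 0.8
per_class = macro_recall(y_true, y_pred)  # 0.5
```

The lazy model gets 80% of the answers right, yet its macro-averaged recall is only 50% because it never finds a single rare specimen. A class-aware score like this makes that failure visible.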

4. The Comparison (Beating the Old Standard)

The researchers also tested PlanktonFlow against EcoTaxa, the current standard tool used by most biologists.

  • The Result: EcoTaxa is a decent tool, like a reliable but old-fashioned calculator. But PlanktonFlow's AI models were like a supercomputer. The best AI model (EfficientNet-B5) was significantly more accurate than EcoTaxa, especially when identifying tricky or rare species.

Why This Matters

Before PlanktonFlow, if a biologist wanted to use the best AI, they had to hire a data scientist or spend months learning to code. PlanktonFlow removes that barrier.

  • It's Modular: You can swap out the "chefs" or the "ingredients" easily.
  • It's Transparent: You can see exactly how the robot learned.
  • It's Scalable: Whether you have 1,000 photos or 1,000,000, the system handles it.

The Bottom Line

PlanktonFlow is like giving every biologist a self-driving car for their data. Instead of struggling to steer the complex machinery of Artificial Intelligence, they can just hop in, press "Start," and let the system handle the heavy lifting of sorting, learning, and identifying plankton. This frees them up to do what they do best: understanding the health of our oceans and rivers.
