Descriptor: Dataset of Parasitoid Wasps and Associated Hymenoptera (DAPWH)

This paper introduces the Dataset of Parasitoid Wasps and Associated Hymenoptera (DAPWH), a curated collection of 3,556 high-resolution images of parasitoid wasps and associated Hymenoptera, including 1,739 COCO-annotated images, to facilitate the development of automated identification systems for these taxonomically challenging groups.

Joao Manoel Herrera Pinheiro, Gabriela Do Nascimento Herrera, Luciana Bueno Dos Reis Fernandes, Alvaro Doria Dos Santos, Ricardo V. Godoy, Eduardo A. B. Almeida, Helena Carolina Onody, Marcelo Andrade Da Costa Vieira, Angelica Maria Penteado-Dias, Marcelo Becker

Published 2026-02-24

Imagine you are trying to teach a robot to recognize different types of bees and wasps. You might think, "Easy! Just show it pictures of bees." But here's the problem: the world of wasps is like a massive, crowded library where most of the books have no titles, and the few that do have titles are written in a language only a handful of experts can read.

This paper introduces a new, massive digital library called DAPWH (Dataset of Parasitoid Wasps and Associated Hymenoptera) designed to solve this problem. Here is a simple breakdown of what the researchers did and why it matters.

1. The Problem: The "Needle in a Haystack" of Nature

There are millions of insect species, but parasitoid wasps, which help farmers because their larvae develop in or on crop pests and kill them, are incredibly hard to identify.

  • The Challenge: They look almost identical to the naked eye. It's like trying to tell apart 50 different models of the same car just by looking at a blurry photo.
  • The Bottleneck: To identify them, you usually need a human expert with a microscope and a lot of time. But there aren't enough experts, and they are getting older. We need a faster way.

2. The Solution: A "Training Gym" for AI

The researchers built a massive dataset of 3,556 high-resolution photos of these wasps. Think of this dataset as a gym where a computer program (Artificial Intelligence) goes to exercise and get strong.

  • The Workout: The dataset includes photos of the main "suspects" (Ichneumonidae and Braconidae wasps) but also includes photos of other wasps and bees (like honeybees and yellow jackets) to teach the AI what not to confuse them with.
  • The Views: Just like a car needs to be seen from the front, side, and back to be fully understood, these wasps were photographed from three angles: Lateral (side), Frontal (face), and Dorsal (top). Together, the three views give the AI a more complete picture of the insect than any single photo could.

3. The Secret Sauce: "Highlighting" the Details

The most special part of this dataset is a subset of 1,739 images that have been "highlighted" by humans.

  • The Highlighter: Using a tool called CVAT, experts drew precise boxes around the wasp's body, its wings, and even the tiny ruler (scale bar) next to it.
  • Why it matters: This is like giving a student a textbook where the teacher has already underlined the most important words. Instead of the AI guessing where the wasp is, it learns exactly where the wings end and the body begins. This helps the AI learn to spot tiny details, like the veins in a wing, which are crucial for identification.
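
To make the "highlighting" concrete, here is a minimal sketch of what a COCO-style annotation record looks like and how a program might read it. The COCO format stores each box as [x, y, width, height]. Everything else in the example (the file name, the category names "body", "wing", and "scale_bar", and the coordinates) is an illustrative assumption, not taken from the DAPWH files.

```python
import json
from collections import defaultdict

# Toy COCO-style document. The file name, category names, and coordinates are
# illustrative assumptions, not values taken from DAPWH.
coco_text = json.dumps({
    "images": [
        {"id": 1, "file_name": "specimen_lateral.jpg", "width": 4000, "height": 3000}
    ],
    "categories": [
        {"id": 1, "name": "body"},
        {"id": 2, "name": "wing"},
        {"id": 3, "name": "scale_bar"},
    ],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1, "bbox": [800, 900, 2000, 1100]},
        {"id": 11, "image_id": 1, "category_id": 2, "bbox": [1200, 300, 1500, 700]},
        {"id": 12, "image_id": 1, "category_id": 3, "bbox": [3200, 2700, 500, 60]},
    ],
})


def boxes_by_category(coco):
    """Group boxes by category name, converting [x, y, w, h] to corner form."""
    names = {c["id"]: c["name"] for c in coco["categories"]}
    grouped = defaultdict(list)
    for ann in coco["annotations"]:
        x, y, w, h = ann["bbox"]
        # COCO stores the top-left corner plus width and height.
        grouped[names[ann["category_id"]]].append((x, y, x + w, y + h))
    return dict(grouped)


boxes = boxes_by_category(json.loads(coco_text))
for name, corners in boxes.items():
    print(name, corners)
```

A detection model is typically trained on records like these, so it sees both the photo and the exact region each label refers to.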

4. The Test Drive: Can the Robot Do It?

The researchers didn't just build the dataset; they tested it to see if it actually works. They taught several different AI models (like YOLO and EfficientNet) using this data.

  • The Results: The AI models got really good at the job. Some achieved over 92% accuracy in telling the families of wasps apart.
  • The Weakness: The AI sometimes got confused between two very similar-looking bee families (Andrenidae and Colletidae). Why? Because there were very few examples of those specific bees in the "gym." It's like trying to learn to recognize a rare bird when you've only seen three photos of it.
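
A small sketch shows why a high overall score can hide trouble with rare families. The labels and predictions below are made-up toy data chosen only to illustrate the effect; they are not results from the paper, and the helper name per_class_recall is hypothetical.

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Fraction of each true class that the model labeled correctly."""
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / total[label] for label in total}


# Toy data (an assumption for illustration): two well-represented families
# and two rare ones that the model swaps half of the time.
y_true = (["Ichneumonidae"] * 8 + ["Braconidae"] * 8
          + ["Andrenidae"] * 2 + ["Colletidae"] * 2)
y_pred = (["Ichneumonidae"] * 8 + ["Braconidae"] * 8
          + ["Colletidae", "Andrenidae"] + ["Andrenidae", "Colletidae"])

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recall = per_class_recall(y_true, y_pred)

print(f"overall accuracy: {accuracy:.2f}")
for family, value in recall.items():
    print(f"{family}: {value:.2f}")
```

In this toy case the overall accuracy looks strong because the common families dominate the count, while the two rare families are right only half the time. That is why per-family scores are worth checking alongside the headline number.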

5. Why Should You Care?

This isn't just about wasps; it's about biodiversity and food security.

  • The Guardians: These wasps are nature's pest control. Their larvae feed on and kill the bugs that destroy our crops.
  • The Future: By giving computers a good "textbook" (this dataset), we can build tools that automatically count and identify these wasps in the wild. This helps scientists monitor ecosystems and helps farmers know if their natural pest control is working, without needing to hire a team of experts to look through microscopes for days.

In a Nutshell

The authors took a chaotic, hard-to-understand group of insects, organized them into a beautiful, high-quality photo album with detailed labels, and used it to teach computers how to recognize them. It's a digital toolkit that turns a super-hard biology problem into a solvable math problem, helping us protect our planet's delicate balance.
