Adaptive Enhancement and Dual-Pooling Sequential Attention for Lightweight Underwater Object Detection with YOLOv10

This paper proposes a lightweight underwater object detection framework based on YOLOv10 that integrates a Multi-Stage Adaptive Enhancement module, a Dual-Pooling Sequential Attention mechanism, and a Focal Generalized IoU (FGIoU) loss. Together, these components significantly improve accuracy and robustness on benchmark datasets while keeping the model compact enough for resource-constrained environments.

Md. Mushibur Rahman, Umme Fawzia Rahim, Enam Ahmed Taufik

Published 2026-03-05

Imagine you are trying to spot a rare, colorful fish swimming in a murky, deep ocean. The water is cloudy, the light is dim, and everything looks greenish-blue or blurry. Now, imagine you are a robot (an underwater drone) trying to find that fish using a camera. It's incredibly hard because the camera sees a mess of colors and shadows, not a clear picture.

This paper is about teaching that robot a new, super-smart way to see clearly in that messy water, without needing a giant, expensive computer to do the thinking.

Here is the breakdown of their solution, using simple analogies:

1. The Problem: The "Murky Water" Effect

Underwater cameras struggle because water absorbs light and scatters what remains. It's like trying to read a book through a dirty, foggy window.

  • The Issue: Standard AI models (the "brains" of the robot) are usually trained on clear, sunny land photos. When they look underwater, they get confused. They can't tell the difference between a fish and a rock because the colors are wrong and the edges are fuzzy.
  • The Consequence: The robot misses the fish or mistakes a bubble for a fish.

2. The Solution: A Three-Step "Super-Vision" Kit

The authors built a new system based on a popular AI model called YOLOv10 (which is like a very fast, efficient detective). They gave this detective three special tools to handle the underwater mess:

Tool A: The "Digital Photo Editor" (Multi-Stage Adaptive Enhancement)

Before the detective even looks at the fish, they first clean up the photo.

  • The Analogy: Imagine you have a photo that looks too blue and dark. You use a filter to add red back in, brighten the shadows, and sharpen the edges.
  • What they did: They created a step-by-step process that automatically fixes the color (removes the blue tint), boosts the contrast (makes dark things lighter), and removes the "haze" (like fog). Crucially, this is a fixed rulebook, not a learning process, so it happens instantly without slowing the robot down.
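For the curious, a rule-based cleanup chain like this can be sketched in a few lines. The paper's actual enhancement stages are not reproduced here, and the function below is purely illustrative, but it captures the key idea: fixed recipes applied in sequence, with no learning involved.

```python
import numpy as np

def enhance(img):
    """Sketch of a fixed (non-learned) underwater enhancement chain.

    `img` is an H x W x 3 float array in [0, 1]. The paper's
    Multi-Stage Adaptive Enhancement module uses its own stages;
    this just illustrates a rule-based pre-processing pipeline.
    """
    # Stage 1: gray-world color correction -- scale each channel so its
    # mean matches the overall mean, countering the blue-green cast.
    means = img.mean(axis=(0, 1))
    img = np.clip(img * (means.mean() / (means + 1e-6)), 0.0, 1.0)

    # Stage 2: contrast stretch -- map the 1st/99th percentiles to
    # [0, 1], brightening shadows and recovering washed-out detail.
    lo, hi = np.percentile(img, (1, 99))
    img = np.clip((img - lo) / (hi - lo + 1e-6), 0.0, 1.0)
    return img
```

Because every step is a closed-form formula, this runs in a fixed, tiny amount of time per frame, which is why the authors can afford to do it before detection without slowing the robot down.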

Tool B: The "Binoculars with a Spotlight" (Dual-Pooling Sequential Attention)

Once the photo is cleaned, the detective needs to know where to look.

  • The Analogy: Imagine you are in a crowded room. Instead of staring at the whole room, you put on binoculars that zoom in on specific spots (spatial attention) and then filter out the noise so you only see the person you are looking for (channel attention).
  • What they did: They added a "Dual-Pooling Sequential Attention" mechanism. It acts like a spotlight that ignores the boring background (sand, bubbles, seaweed) and focuses intensely on the small, important objects (fish, turtles). It helps the robot see tiny details that usually get lost in the noise.
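The two-stage spotlight can also be sketched. The real module's layout and weights are learned during training and are not reproduced here; this is a CBAM-style illustration (with placeholder weights `w1` and `w2` standing in for a tiny shared network) of how average and max pooling feed a channel gate, followed by a spatial gate:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dual_pool_attention(feat, w1, w2):
    """Illustrative sketch of dual-pooling sequential attention.

    `feat` is a C x H x W feature map; `w1` and `w2` are placeholder
    weight matrices for a small shared two-layer network.
    """
    # Channel attention: squeeze the spatial dims with BOTH average and
    # max pooling, pass each descriptor through the shared network, and
    # gate the channels -- "filter out the noise".
    avg = feat.mean(axis=(1, 2))
    mx = feat.max(axis=(1, 2))
    gate = sigmoid(w2 @ np.tanh(w1 @ avg) + w2 @ np.tanh(w1 @ mx))
    feat = feat * gate[:, None, None]

    # Spatial attention: pool across channels (avg and max), combine,
    # and gate each pixel -- the "spotlight" on object regions.
    s = sigmoid(feat.mean(axis=0) + feat.max(axis=0))
    return feat * s[None, :, :]
```

Using both average and max pooling is the "dual" part: averaging summarizes the whole scene, while the max picks up the single strongest response, which is exactly what helps small objects survive the noise.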

Tool C: The "Strict Coach" (Focal Generalized IoU Loss, or FGIoU)

When the robot makes a guess ("That's a fish!"), it needs to be graded on how good the guess was.

  • The Analogy: Imagine a coach grading a student. If the student says "It's a fish" but draws the box around the fish too loosely, the coach says, "No, that's not precise enough." If the student misses a fish entirely, the coach says, "You missed it!"
  • What they did: They created a new scoring system (a "Loss Function") that punishes the robot if it's not precise with the box around the fish or if it gets confused about whether an object is actually there. It forces the robot to be both accurate and confident.
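To see how the grading works, here is a plain-Python sketch of Generalized IoU plus a focal-style weighting. The paper's exact FGIoU formula is not reproduced here, so treat `fgiou_loss` and its `gamma` knob as assumptions used only to illustrate the idea:

```python
def giou(box_a, box_b):
    """Generalized IoU between two boxes given as (x1, y1, x2, y2).

    GIoU extends plain IoU: boxes that don't overlap at all still get
    a graded (negative) score based on how far apart they are, so the
    "coach" can always say how wrong a loose or missed box is.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection and union areas.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1) - inter)
    iou = inter / union
    # Smallest box enclosing both -- the penalty term's yardstick.
    c_area = ((max(ax2, bx2) - min(ax1, bx1))
              * (max(ay2, by2) - min(ay1, by1)))
    return iou - (c_area - union) / c_area

def fgiou_loss(box_a, box_b, gamma=0.5):
    # Focal-style weighting (an assumed form, not the paper's exact
    # one): down-weight easy, well-aligned boxes so training focuses
    # on the hard, sloppy ones.
    loss = 1.0 - giou(box_a, box_b)
    return (loss ** gamma) * loss
```

A perfectly aligned box scores a loss of zero, a sloppy box gets a moderate penalty, and a completely missed box gets the largest one, which is exactly the "strict coach" behavior described above.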

3. The Result: Fast, Small, and Super Accurate

The best part? They didn't need a supercomputer to run this.

  • The "Lightweight" Magic: Usually, making AI smarter makes it heavier and slower. But this team managed to make the AI smarter and keep it tiny (only 2.8 million "parameters," which is like the size of a small app on your phone).
  • The Score: When they tested it on real underwater datasets (RUOD and DUO), their new system was 6 to 7 percentage points more accurate than the standard version.
    • Think of it this way: If the old robot found 82 out of 100 fish, the new robot finds 89 out of 100. That's a huge difference when you are looking for rare species or navigating safely.

Why Does This Matter?

This technology is perfect for Autonomous Underwater Vehicles (AUVs)—robots that explore the ocean without a human pilot.

  • These robots have limited battery and small computers.
  • They can't carry a massive supercomputer.
  • This new method allows them to see clearly, find objects quickly, and make decisions in real-time, even in the darkest, murkiest parts of the ocean.

In a nutshell: The authors took a standard AI detective, gave it a photo editor to clean the view, a spotlight to focus on the target, and a strict coach to improve its accuracy, all while keeping the detective small enough to fit in a backpack.