SPMamba-YOLO: An Underwater Object Detection Network Based on Multi-Scale Feature Enhancement and Global Context Modeling

This paper proposes SPMamba-YOLO, a novel underwater object detection network that integrates a Spatial Pyramid Pooling Enhanced Layer Aggregation Network (SPPELAN), a Pyramid Split Attention (PSA) mechanism, and a Mamba-based state space modeling module to effectively address challenges like light attenuation and small targets, achieving a 4.9% mAP@0.5 improvement over YOLOv8n on the URPC2022 dataset.

Guanghao Liao, Zhen Liu, Liyuan Cao, Yonghui Yang, Qi Li

Published 2026-02-27

Imagine trying to find a tiny, colorful seashell on the ocean floor while wearing a pair of foggy, blue-tinted goggles. The water is murky, the light is dim, and the sand is full of other rocks that look just like your target. This is the daily reality for underwater robots trying to spot sea creatures like sea cucumbers, starfish, and scallops.

The paper introduces a new "brain" for these robots called SPMamba-YOLO. Think of it as upgrading a robot's vision system from a standard pair of glasses to a super-powered, smart-augmented reality headset.

Here is how this new system works, broken down into simple concepts:

1. The Problem: The "Murky Water" Effect

Underwater, cameras struggle because:

  • Colors get washed out: Red disappears first, making everything look blue or green.
  • Things look blurry: Particles in the water scatter light, creating a "fog."
  • Targets are tiny: A small starfish might look like a speck of dust on a giant screen.

Old detection systems often get confused, missing small targets or mistaking a rock for a starfish.

2. The Solution: The Three Super-Powers

The researchers built a new network that combines three specific "super-powers" to fix these problems.

Power #1: The "Zoom Lens" (SPPELAN Module)

  • The Analogy: Imagine trying to find a specific person in a crowd. If you only look at them from far away, they look like a dot. If you only look at them from inches away, you can't see who they are standing next to. You need to see them at many distances at once.
  • How it works: This module acts like a camera that instantly takes photos at different zoom levels and stacks them together. It helps the robot see both the tiny details of a small scallop and the big picture of the surrounding seabed, ensuring nothing gets missed because it's too small or too far away.
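To make the "photos at different zoom levels" idea concrete, here is a minimal NumPy sketch of the multi-scale pooling at the heart of SPPELAN-style modules. It is an illustration of the general technique (parallel max-pooling at several kernel sizes, stacked together), not the paper's exact implementation; the kernel sizes 5, 9, and 13 are common defaults and assumed here.

```python
import numpy as np

def max_pool_same(x, k):
    """Max-pool a 2D feature map with kernel size k and 'same' padding,
    so the output keeps the input's spatial size."""
    pad = k // 2
    padded = np.pad(x, pad, constant_values=-np.inf)
    h, w = x.shape
    out = np.empty_like(x)
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + k, j:j + k].max()
    return out

def sppelan_sketch(x, kernels=(5, 9, 13)):
    """Stack the original map with pooled copies at several scales --
    the 'camera that takes photos at different zoom levels' idea."""
    scales = [x] + [max_pool_same(x, k) for k in kernels]
    return np.stack(scales, axis=0)  # shape: (1 + len(kernels), H, W)
```

Each pooled copy summarizes a wider neighborhood, so a later layer sees fine detail and coarse context side by side.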

Power #2: The "Spotlight" (PSA Attention Mechanism)

  • The Analogy: Imagine you are at a noisy party. You want to hear one friend talking, but everyone else is shouting. A normal person tries to listen to everyone. This module is like a magical spotlight that instantly silences the background noise and shines a bright beam only on your friend.
  • How it works: Underwater, there is a lot of "visual noise" (sand, bubbles, dark rocks). This mechanism tells the robot's brain, "Ignore the boring background sand; focus only on the weird shapes that look like animals." It filters out the clutter so the robot can focus on what matters.
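A toy NumPy sketch of the "spotlight" idea: split the channels into groups, score each group, and turn the scores into softmax attention weights so informative groups are amplified and clutter is suppressed. This is a simplified stand-in for PSA (the real module uses multi-scale convolutions per group); the group count of 4 and the mean-response scoring are assumptions for illustration.

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def psa_sketch(feats):
    """feats: (C, H, W) feature map with C divisible by 4.
    Split channels into 4 groups, score each group by its global
    average response, and reweight groups with softmax attention --
    'louder' groups get the spotlight, quiet background is dimmed."""
    groups = np.split(feats, 4, axis=0)            # 4 channel groups
    scores = np.array([g.mean() for g in groups])  # one score per group
    weights = softmax(scores)                      # attention weights
    out = [g * w for g, w in zip(groups, weights)]
    return np.concatenate(out, axis=0), weights
```

The softmax makes the weights compete: boosting one group necessarily dims the others, which is exactly the "silence the party, hear one friend" behavior.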

Power #3: The "Long-Range Memory" (Mamba Module)

  • The Analogy: Imagine you are walking through a forest. If you only look at the tree right in front of your nose, you might trip. But if you have a memory of the whole path you've walked and can see the path stretching far ahead, you can predict where the trail goes.
  • How it works: Traditional AI often looks at an image in tiny, isolated chunks. The Mamba module is special because it can "look" at the whole image and understand how one part connects to another, even if they are far apart. It understands the context. For example, it knows that if it sees a sea urchin, there's a good chance a sea cucumber is nearby, even if the sea cucumber is blurry. It connects the dots across the entire scene.
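The "long-range memory" can be sketched as a tiny 1-D state space scan, the recurrence family Mamba belongs to. A running state h carries information from arbitrarily far back in the sequence, and the gates depend on the input itself (the "selective" part). This is a toy scalar version for intuition, not the paper's module; the decay constant and sigmoid gating are assumptions.

```python
import numpy as np

def selective_scan_sketch(x, a=0.9):
    """Toy selective state space scan over a 1-D signal x:
    h_t = a * (1 - g_t) * h_{t-1} + g_t * x_t,  y_t = h_t,
    where the gate g_t = sigmoid(x_t) is input-dependent. The state
    h never resets, so an early input can influence outputs far
    down the sequence -- the 'memory of the whole path' idea."""
    def sigmoid(v):
        return 1.0 / (1.0 + np.exp(-v))
    h = 0.0
    ys = []
    for xt in x:
        g = sigmoid(xt)                 # input-dependent gate
        h = a * (1 - g) * h + g * xt    # update long-range state
        ys.append(h)
    return np.array(ys)
```

Feed it a single spike followed by zeros and the output decays slowly instead of vanishing immediately: the scan "remembers" the spike across the whole sequence, which is how image patches far apart can still inform each other once the image is flattened into such a sequence.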

3. The Result: A Smarter Robot

When the researchers tested this new system (SPMamba-YOLO) on a dataset of underwater images (URPC2022), the results were impressive:

  • It found more things: its detection accuracy (mAP@0.5) was 4.9 percentage points higher than the YOLOv8n baseline.
  • It handled the hard stuff: It was much better at finding tiny, crowded, or blurry objects.
  • It wasn't too slow: Even with all these fancy new features, it still ran fast enough to be used in real-time by a robot.

The Bottom Line

Think of SPMamba-YOLO as giving an underwater robot a pair of glasses that can zoom in and out instantly, a spotlight to cut through the fog, and a super-memory to understand the whole scene.

Instead of just "seeing" pixels, the robot now "understands" the underwater world, making it much better at finding sea life for research, pipeline checks, or exploring the ocean floor.
