When Semantics Connect the Swarm: LLM-Driven Fuzzy Control for Cooperative Multi-Robot Underwater Coverage

This paper proposes a semantics-guided fuzzy control framework that leverages Large Language Models to compress multimodal observations into interpretable tokens for robust, GPS-denied underwater navigation and semantic communication-based coordination among multi-robot swarms.

Jingzehua Xu, Weihang Zhang, Yangyang Li, Hongmiaoyi Zhang, Guanwen Xie, Jiwei Tang, Shuai Zhang, Yi Li

Published Fri, 13 Ma

Imagine you are leading a team of blindfolded divers trying to map a mysterious, dark underwater coral reef. They can't see far, they can't talk clearly (water distorts sound), and they have no GPS to tell them where they are. If they try to swim in a straight line, they'll crash into rocks or get lost.

This paper proposes a brilliant new way to organize these divers using a mix of AI brains and simple, instinctive rules. Here is how it works, broken down into three simple steps:

1. The "Translator" Brain (The LLM)

Normally, robots see the world as a chaotic mess of raw data: "pixel at X,Y is blue," "sonar ping at Z is loud." This is like trying to read a book written in a language you don't speak.

In this new system, a Large Language Model (LLM) acts as a super-smart translator sitting in the robot's head. Instead of processing millions of confusing data points, the LLM looks at the chaos and says, "Okay, I see a big rock wall to the left, a shiny treasure chest ahead, and a dark, empty cave to the right."

It turns the messy data into simple, human-like words (semantic tokens). It's like the robot stops thinking in "math" and starts thinking in "stories."
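To make the idea concrete, here is a minimal sketch of that translation step. The prompt format, the `query_llm()` stub, and the token vocabulary are all invented for illustration; the paper's actual interface will differ, and a real system would call an actual LLM instead of the hand-written stand-in below.

```python
# Sketch of the "translator" step: compress raw sensor readings into a few
# semantic tokens. Everything here (prompt format, token names) is an
# illustrative assumption, not the paper's actual interface.

def summarize_observations(sonar_pings, camera_blobs):
    """Build a compact text prompt from raw sensor data."""
    lines = [f"sonar bearing {b} deg, range {r:.1f} m" for b, r in sonar_pings]
    lines += [f"camera blob color={c} at bearing {b} deg" for c, b in camera_blobs]
    return "Describe the scene as short tokens:\n" + "\n".join(lines)

def query_llm(prompt):
    """Stand-in for a real LLM call; returns hand-written tokens here."""
    tokens = []
    if "range 2." in prompt:          # a ping only ~2 m away
        tokens.append("obstacle:left:near")
    if "color=shiny" in prompt:       # a bright blob dead ahead
        tokens.append("target:ahead")
    return tokens

prompt = summarize_observations(
    sonar_pings=[(-45, 2.3), (0, 14.0)],
    camera_blobs=[("shiny", 5)],
)
print(query_llm(prompt))   # ['obstacle:left:near', 'target:ahead']
```

The point is the compression: thousands of pixels and pings go in, and two short, human-readable tokens come out for the controller to act on.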

2. The "Instinctive" Pilot (Fuzzy Control)

Once the robot has a simple story ("Rock on left, treasure ahead"), it needs to decide how to move. Usually, this requires complex math and a perfect map, neither of which the robot has.

Instead, this system uses Fuzzy Logic, which is like human intuition. Think of it as a set of simple rules a human diver would use:

  • "If the rock is very close, turn sharply."
  • "If the treasure is somewhat visible, swim slowly."

Because these rules are "fuzzy" (they handle uncertainty well), the robot can make smooth, safe turns without needing to know its exact GPS coordinates. It just reacts to the story the LLM told it.
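The two rules above can be sketched as a tiny fuzzy controller. The membership shapes and gains below are invented for illustration (the paper's actual rule base and defuzzification will be richer); the key property is that the outputs vary smoothly with the inputs instead of snapping between hard cases.

```python
# Minimal fuzzy-rule sketch matching the two rules above. Membership
# functions and gains are illustrative assumptions, not the paper's values.

def near(dist):
    """Degree (0..1) to which an obstacle at `dist` meters is 'very close'."""
    return max(0.0, min(1.0, (5.0 - dist) / 5.0))

def visible(conf):
    """Degree (0..1) to which the target is 'somewhat visible'."""
    return max(0.0, min(1.0, conf))

def fuzzy_pilot(obstacle_dist, target_conf):
    """Blend the rules: closer rock -> sharper turn; visible target -> careful speed."""
    turn_rate = near(obstacle_dist) * 1.0                      # rad/s, scaled by closeness
    speed = 0.2 + 0.8 * (1.0 - near(obstacle_dist)) * visible(target_conf)  # m/s
    return turn_rate, speed

# A rock 1 m to the side and a half-visible target: sharp turn, slow swim.
print(fuzzy_pilot(obstacle_dist=1.0, target_conf=0.5))
```

Because `near()` and `visible()` return graded values rather than yes/no answers, the commands blend continuously: a rock at 4 m produces a gentle nudge, the same rock at 1 m a hard turn, with no map or GPS fix needed.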

3. The "Whisper Network" (Semantic Communication)

Now, imagine you have a whole team of these divers. If they all swim toward the same treasure, they waste time. If they all swim away, the treasure is missed.

Usually, robots try to share complex maps or coordinates, which is hard underwater. This paper suggests they share intent instead.

  • Robot A doesn't send a map; it whispers, "I'm heading toward the big rock."
  • Robot B hears this and thinks, "Oh, he's taking the rocks. I'll go check the cave."

They coordinate by sharing linguistic ideas (like "I'm exploring the dark zone") rather than heavy data files. This keeps them from bumping into each other or checking the same spot twice, even if the connection is spotty.
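A toy version of that intent exchange might look like the following. The zone names, message format, and claiming rule are all hypothetical; the paper's semantic messages are LLM-generated, not a fixed vocabulary, but the division-of-labor effect is the same.

```python
# Sketch of intent-based coordination: robots broadcast short "I'm taking
# zone X" claims instead of maps. Zone names and the "id:zone" message
# format are invented for illustration.

ZONES = ["rock_wall", "cave", "open_reef"]

def pick_zone(my_id, heard_intents):
    """Claim the first zone no teammate has announced yet."""
    claimed = {msg.split(":")[1] for msg in heard_intents}
    for zone in ZONES:
        if zone not in claimed:
            return f"{my_id}:{zone}"       # e.g. "robotB:cave"
    return f"{my_id}:{ZONES[0]}"           # everything claimed: double up

intent_a = pick_zone("robotA", heard_intents=[])
intent_b = pick_zone("robotB", heard_intents=[intent_a])
print(intent_a, intent_b)   # robotA:rock_wall robotB:cave
```

Each message is a handful of bytes, so it survives a spotty acoustic link far better than a shared occupancy map would, and a robot that misses one message simply re-plans from whatever claims it did hear.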

The Big Picture

The researchers tested this in a computer simulation of a messy, unknown reef. The result? The team of robots worked together like a well-oiled machine. They found the "treasure" (Objects of Interest) faster and covered more ground without getting lost or crashing.

In short: They taught underwater robots to stop trying to be perfect calculators and start acting like intuitive, talking divers who use common sense and simple language to navigate the unknown.