Sim-to-reality adaptation for Deep Reinforcement Learning applied to an underwater docking application

This paper presents a systematic sim-to-reality adaptation framework that combines a high-fidelity Stonefish digital twin with PPO-based Deep Reinforcement Learning, achieving over 90% successful autonomous docking for the Girona AUV, validated by physical experiments that demonstrate emergent control behaviors.

Alaaeddine Chaarani, Narcis Palomeras, Pere Ridao

Published 2026-03-13
📖 4 min read · ☕ Coffee break read

Imagine you are teaching a toddler how to park a car in a very tight, underwater garage. But there's a catch: the ocean is dark, the water pushes the car around unpredictably, and if you bump the wall too hard, the car breaks.

Teaching a robot to do this using old-school programming is like giving the toddler a rigid set of instructions: "Turn left 30 degrees, then stop." If the water pushes the car even a tiny bit, the instructions fail, and the robot crashes.

This paper is about a smarter way to teach the robot: Deep Reinforcement Learning (DRL). Think of this as letting the robot learn by trial and error, just like a human learns to ride a bike.
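The trial-and-error loop at the heart of DRL can be sketched with a toy example. This is not the paper's method (the authors use PPO in a full 3-D simulator); it is a minimal tabular Q-learning "dock on a 1-D line" toy, written here purely to show how a policy emerges from rewards alone, with no hand-written instructions:

```python
import random

# Toy "learning to dock" on a 1-D line: the dock sits at position 0,
# the robot starts somewhere in 1..10 and picks actions -1 (toward the
# dock) or +1 (away). No one tells it which action is "correct" -- it
# only ever sees a reward signal, just like the AUV in the paper.
ACTIONS = [-1, +1]
Q = {}  # Q[(position, action)] -> estimated long-term score

def q(s, a):
    return Q.get((s, a), 0.0)

random.seed(0)
alpha, gamma, eps = 0.5, 0.9, 0.2  # learning rate, discount, exploration

for episode in range(500):
    pos = random.randint(1, 10)
    for _ in range(30):
        # Explore sometimes; otherwise pick the best-known action.
        a = random.choice(ACTIONS) if random.random() < eps \
            else max(ACTIONS, key=lambda x: q(pos, x))
        nxt = min(max(pos + a, 0), 10)
        r = 1.0 if nxt == 0 else -0.05  # docked: reward; else a small time cost
        best_next = max(q(nxt, x) for x in ACTIONS)
        Q[(pos, a)] = q(pos, a) + alpha * (r + gamma * best_next - q(pos, a))
        pos = nxt
        if pos == 0:
            break

# After enough trials, the greedy policy points toward the dock
# from every starting position -- learned, not programmed.
policy = {s: max(ACTIONS, key=lambda a: q(s, a)) for s in range(1, 11)}
print(policy)
```

The same idea scales up in the paper: replace the 1-D line with the Stonefish simulator and the lookup table with a PPO-trained neural network.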

Here is the story of how the researchers taught the Girona AUV (an underwater robot) to dock, broken down into simple concepts:

1. The "Video Game" Problem (Sim-to-Real)

You can't let a real robot crash 10,000 times in a real ocean to learn how to park; it would be too expensive and dangerous. So, the researchers built a super-realistic video game (a "digital twin") using a simulator called Stonefish.

  • The Analogy: Imagine a flight simulator for pilots. It looks and feels like flying a real plane, but if you crash, you just hit "reset" and try again.
  • The Challenge: Usually, what a robot learns in a video game doesn't work in the real world because the game physics are too perfect. The real ocean has messy currents and noisy sensors.
  • The Fix: The researchers made their "game" messy on purpose. They added fake sensor noise and random water currents so the robot learned to be tough, not just perfect in a sterile environment.
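The "messy on purpose" trick is known as domain randomization. A minimal sketch of the idea follows; the function names, noise levels, and current ranges are illustrative assumptions, not the paper's actual implementation:

```python
import random

# Domain randomization sketch: each training episode draws random
# physics (here, a constant water current), and every sensor reading
# the policy sees is corrupted with noise. The policy therefore cannot
# overfit to a perfectly clean, sterile simulator.

def randomize_episode(rng):
    """Draw per-episode physics: a constant current in m/s (illustrative ranges)."""
    return {
        "current_x": rng.uniform(-0.3, 0.3),
        "current_y": rng.uniform(-0.3, 0.3),
    }

def noisy_observation(true_state, rng, sigma=0.02):
    """Corrupt each ground-truth sensor channel with Gaussian noise."""
    return {k: v + rng.gauss(0.0, sigma) for k, v in true_state.items()}

rng = random.Random(42)
physics = randomize_episode(rng)  # the world is different every episode
obs = noisy_observation({"distance_to_dock": 5.0, "heading_error": 0.1}, rng)
print(physics, obs)
```

Because the current and the noise change every episode, the only strategies that survive training are the ones that work across all of them, which is exactly the robustness the real ocean demands.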

2. The Speed Hack (Multiprocessing)

Learning takes a long time. If the robot tried one move every second, it would take years to learn.

  • The Analogy: Imagine trying to learn a new language by reading one word a day. Now imagine you have 20 friends reading 20 words a day and teaching you simultaneously.
  • The Fix: They ran 20 copies of the simulation at the same time on their computer. In the same 3 hours of wall-clock time, the robot racked up the equivalent of 60 hours of practice. This is like fast-forwarding the robot's life to get it "mature" quickly.
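The "20 friends" trick is usually called a vectorized environment. Here is a toy sketch of the mechanism: a batch of independent simulator copies stepped together, so one tick yields 20 transitions. The class names and the toy dynamics are invented for illustration; in practice each copy runs in its own OS process (e.g. a subprocess-based vectorized environment in an RL library), which this single-process sketch does not show:

```python
import random

class ToyDockingEnv:
    """Stand-in for one simulator instance (illustrative, not Stonefish)."""
    def __init__(self, seed):
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        self.distance = self.rng.uniform(5.0, 10.0)
        return self.distance

    def step(self, action):
        # Move toward the dock, with a little random disturbance.
        self.distance = max(self.distance - action + self.rng.gauss(0, 0.05), 0.0)
        reward = -self.distance
        done = self.distance == 0.0
        return self.distance, reward, done

class VecEnv:
    """Step N env copies together; return batched observations and rewards."""
    def __init__(self, n):
        self.envs = [ToyDockingEnv(seed=i) for i in range(n)]

    def reset(self):
        return [e.reset() for e in self.envs]

    def step(self, actions):
        results = [e.step(a) for e, a in zip(self.envs, actions)]
        obs, rewards, dones = map(list, zip(*results))
        # Auto-reset finished copies so training never stalls.
        for i, d in enumerate(dones):
            if d:
                obs[i] = self.envs[i].reset()
        return obs, rewards, dones

vec = VecEnv(20)
obs = vec.reset()
obs, rewards, dones = vec.step([0.5] * 20)  # one tick = 20 transitions
print(len(obs), len(rewards))
```

The learner consumes these batched transitions, so experience accumulates 20 times faster than with a single simulator.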

3. The Teacher's Scorecard (Reward Function)

How does the robot know if it's doing a good job? It needs a scorecard. The researchers designed a complex scoring system:

  • Distance Points: You get points for getting closer to the dock.
  • Angle Points: You get points for lining up straight.
  • Smoothness Points: If you jerk the controls around wildly, you lose points. The robot learns to move gently, like a cat, rather than lurching like a drunk driver.
  • The "Ouch" Penalty: If the robot hits the dock too hard, it gets a big negative score. This teaches it to "brake" before impact.
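The scorecard above can be sketched as a single function. The weights, term shapes, and thresholds below are illustrative guesses to show how the four incentives combine, not the paper's exact reward function:

```python
import math

def docking_reward(distance, prev_distance, angle_error,
                   action, prev_action, collided, impact_speed):
    """Combine the four scorecard terms into one number per time step."""
    r = 0.0
    # Distance points: reward progress toward the dock.
    r += 2.0 * (prev_distance - distance)
    # Angle points: the straighter the line-up, the more points.
    r += 0.5 * (math.pi - abs(angle_error))
    # Smoothness points: penalize jerking the controls between steps.
    r -= 0.1 * sum((a - b) ** 2 for a, b in zip(action, prev_action))
    # The "ouch" penalty: a big negative score for a hard impact.
    if collided and impact_speed > 0.2:
        r -= 50.0
    return r

# Gentle, well-aligned progress scores far higher than a jerky hard hit.
good = docking_reward(4.0, 4.5, 0.05, [0.2, 0.0], [0.2, 0.0], False, 0.0)
bad = docking_reward(4.4, 4.5, 1.0, [1.0, -1.0], [-1.0, 1.0], True, 0.5)
print(good > bad)
```

The key design point is that all four terms are summed every step, so the robot is constantly trading them off: rushing in scores distance points but risks the collision penalty, which is how "braking before impact" becomes the highest-scoring strategy.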

4. The "Aha!" Moments (Emergent Behaviors)

The most exciting part is what the robot figured out on its own. The researchers didn't program the robot to do these specific tricks; the robot invented them to get a better score.

  • The "Pitch Brake": When approaching the dock, the robot learned to tilt its nose up (pitch) to use water resistance as a brake, slowing itself down smoothly.
  • The "Wiggle Dance": As it got very close, the robot started shaking its tail (yaw oscillation) slightly. This helped it slide perfectly into the narrow docking funnel, correcting tiny misalignments that a standard computer program would miss.

5. The Real-World Test

Finally, they took the robot out of the "video game" and into a real, 19-meter-long water tank.

  • The Result: The robot, which had never seen the real tank before, successfully docked 8 out of 10 times.
  • Why it matters: It proved that the "video game" training was so good that the robot didn't get confused when the real water pushed it or when the sensors were a bit fuzzy. It transferred its "muscle memory" from the computer to the real world.

The Big Picture

This paper shows that we don't need to write complex code to tell a robot exactly how to move. Instead, we can build a realistic, slightly chaotic "training camp" in a computer, let the robot play thousands of games to figure out the best moves, and then send it out to do the real job.

It's like training a dog not by commanding "Sit, Stay, Roll Over," but by playing fetch in a park until the dog figures out the best way to catch the ball, even if the wind blows it off course. The result? A robot that is adaptable, robust, and surprisingly clever.