DeReCo: Decoupling Representation and Coordination Learning for Object-Adaptive Decentralized Multi-Robot Cooperative Transport

This paper introduces DeReCo, a multi-agent reinforcement learning framework that decouples representation learning from coordination learning through a three-stage training strategy. By removing the bidirectional interference between the two, DeReCo enables sample-efficient, robust decentralized cooperative transport of objects with diverse shapes and physical properties.

Kazuki Shibata, Ryosuke Sota, Shandil Dhiresh Bosch, Yuki Kadokawa, Yoshihisa Tsurumine, Takamitsu Matsubara

Published Tue, 10 Ma

Imagine you have two robots, let's call them Robo-A and Robo-B. Their job is to work together to pick up a heavy object (like a box, a cylinder, or a weirdly shaped sculpture) and carry it to a specific spot.

Here's the catch: They can't talk to each other about what they see. They can only see what's right in front of their own "eyes" (their local sensors). They don't know the other robot's plan, and they don't have a manual telling them the object's weight or how slippery it is.

This is the problem the paper DeReCo tries to solve.

The Problem: The "Bad Date" of Learning

In the past, researchers tried to teach robots this task by dropping them into a simulation where they had to learn two things at the same time, from scratch:

  1. What is this object? (Is it heavy? Is it slippery? What shape is it?)
  2. How do we move together? (When do I push? When do you pull?)

The paper argues that trying to learn these two things simultaneously is like trying to learn to drive a car while also learning the rules of the road and how to fix the engine. It's too much at once!

  • If the robot guesses the object's weight wrong, it messes up the teamwork.
  • If the teamwork is shaky, the robot gets confused about the object's weight.
  • The result? The robots learn very slowly, or they never learn at all.

The Solution: DeReCo (The "Three-Act Play")

The authors propose a new method called DeReCo. Instead of juggling everything at once, they break the learning process into three distinct stages, like a play with three acts.

Act 1: The "Cheating" Practice (Centralized Training)

Imagine the robots are in a practice gym with a super-coach standing right next to them.

  • The coach whispers the secrets: "That box is 5kg," "That cylinder is slippery."
  • Because the robots have this "privileged information," they can focus 100% on learning how to coordinate perfectly. They learn exactly when to push and pull without worrying about guessing the object's properties.
  • Analogy: It's like learning to dance with a partner while the instructor holds your hand and tells you every step.
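In code, the idea of Act 1 can be sketched like this (a toy illustration with made-up names; the paper's actual policy is a trained neural network, not a concatenation helper). The key point is only the data flow: during centralized training, the simulator hands the policy the true object properties directly, so nothing has to be guessed.

```python
# Toy sketch of Stage 1 (hypothetical names): in simulation, each robot's
# policy input includes ground-truth "privileged" object properties.

def make_policy_input(local_obs, privileged):
    # The simulator supplies privileged object properties (e.g. mass,
    # friction) directly, so the policy can focus purely on coordination.
    return local_obs + privileged  # simple concatenation of feature lists

local_obs = [0.2, -0.1, 0.5]   # what this robot senses locally
privileged = [5.0, 0.3]        # object mass (kg), friction coefficient
policy_input = make_policy_input(local_obs, privileged)
```

Because the privileged part of the input is always correct in this stage, any reward the robots earn or lose reflects their coordination alone.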

Act 2: The "Sherlock Holmes" Phase (Representation Learning)

Now, the coach leaves the room. The robots are alone, but they still have the "cheat sheet" from Act 1 stored in their memory.

  • The robots are shown a pile of local data (what their cameras see, what their grippers feel).
  • They are asked: "Based only on what you see, can you guess the weight and shape of the object?"
  • They build a special "decoder" (an AI brain) that learns to translate "local sensory clues" into "object secrets."
  • Analogy: It's like a detective learning to identify a suspect just by their shoe prints and gait, without ever seeing their face. They practice this matching game until they are experts.
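Act 2 is ordinary supervised learning: local sensor data in, privileged properties out. As a toy stand-in for the paper's neural-network decoder, here is a one-dimensional linear "decoder" fit by gradient descent (all data and names are invented for illustration):

```python
# Toy sketch of Stage 2: fit a decoder that predicts a privileged property
# (object mass) from a local observation (a gripper-force reading).

def train_decoder(obs, targets, lr=0.01, steps=2000):
    w, b = 0.0, 0.0                    # decoder parameters, mass ≈ w*x + b
    n = len(obs)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in zip(obs, targets):
            err = (w * x + b) - y      # prediction error on one sample
            grad_w += 2 * err * x / n  # gradient of mean squared error
            grad_b += 2 * err / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# invented data: force reading x is proportional to object mass y
readings = [1.0, 2.0, 3.0, 4.0]
masses = [2.0, 4.0, 6.0, 8.0]          # here true mass = 2 * reading
w, b = train_decoder(readings, masses)
```

After training, the decoder has "practiced the matching game": given only a local reading, it recovers the hidden property the coach used to whisper.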

Act 3: The "Real World" Performance (Decentralized Execution)

Now, the robots go on stage for the real show.

  • The coach is gone. The cheat sheet is gone.
  • The robots use their "Sherlock Holmes" decoder (from Act 2) to guess the object's properties just by looking at it.
  • They use the coordination skills they learned in Act 1 to move together.
  • Analogy: The dancers are now performing a complex routine on a dark stage. They can't see the audience or the other dancer clearly, but they know the steps (Act 1) and they can guess the music's tempo by feeling the vibrations (Act 2).
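Putting Acts 1 and 2 together, execution is just a swap: the privileged input slot of the Stage-1 policy is filled by the Stage-2 decoder's estimate instead of the simulator's ground truth. A minimal sketch with dummy stand-ins for both networks (all functions here are invented placeholders, not the paper's models):

```python
# Toy sketch of Stage 3: fully decentralized execution. Each robot runs
# decoder + policy on its own sensor data; no communication, no coach.

def decoder(local_obs):
    # stand-in for the trained Stage-2 decoder: estimate object properties
    return [2.0 * local_obs[0]]        # e.g. mass inferred from force reading

def policy(local_obs, object_estimate):
    # stand-in for the Stage-1 coordination policy; here a dummy rule that
    # scales the commanded motion by the estimated mass
    return [o * object_estimate[0] for o in local_obs]

local_obs = [2.0, 1.0]                 # this robot's own sensor readings
action = policy(local_obs, decoder(local_obs))
```

The design point: the policy never needs retraining for execution, because its input format is unchanged; only the *source* of the object-property vector switches from ground truth to estimate.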

Why is this better?

By separating the learning into these three steps, the robots don't get confused.

  • They learn coordination when they have all the answers (so they get it right).
  • They learn observation separately (so they get better at guessing).
  • They don't interfere with each other.

The Results

The researchers tested this with two real robots (Toyota Human Support Robots, or HSRs) and many different objects:

  • In Simulation: They trained on three shapes but tested on nine shapes (including ones they had never seen before). DeReCo was much better at carrying the "unseen" objects than the old methods.
  • In Real Life: They put the robots in a real room with two new objects (a board and a frame). The old method failed to carry them to the goal, but DeReCo succeeded, carrying the objects right to the target spot.

The Big Takeaway

DeReCo is like a smart training program that teaches robots to first learn how to work together when they have all the answers, and then teaches them how to guess the answers on their own. This makes them much faster to train and much better at handling new, weird, or unknown objects in the real world.