DeReCo: Decoupling Representation and Coordination Learning for Object-Adaptive Decentralized Multi-Robot Cooperative Transport

This paper introduces DeReCo, a multi-agent reinforcement learning framework that decouples representation learning from coordination learning through a three-stage training strategy. By removing the bidirectional interference between the two, DeReCo enables sample-efficient, robust decentralized cooperative transport of objects with diverse shapes and physical properties.

Kazuki Shibata, Ryosuke Sota, Shandil Dhiresh Bosch, Yuki Kadokawa, Yoshihisa Tsurumine, Takamitsu Matsubara

Published Tue, 10 Ma

Imagine you have two robots, let's call them Robo-A and Robo-B. Their job is to work together to pick up a heavy object (like a box, a cylinder, or a weirdly shaped sculpture) and carry it to a specific spot.

Here's the catch: They can't talk to each other about what they see. They can only see what's right in front of their own "eyes" (their local sensors). They don't know the other robot's plan, and they don't have a manual telling them the object's weight or how slippery it is.

This is the problem the paper DeReCo tries to solve.

The Problem: The "Bad Date" of Learning

In the past, researchers tried to teach robots this task by dropping them into a simulation where they had to learn two things at the same time, from scratch:

  1. What is this object? (Is it heavy? Is it slippery? What shape is it?)
  2. How do we move together? (When do I push? When do you pull?)

The paper argues that trying to learn these two things simultaneously is like trying to learn to drive a car while also learning the rules of the road and how to fix the engine. It's too much at once!

  • If the robot guesses the object's weight wrong, it messes up the teamwork.
  • If the teamwork is shaky, the robot gets confused about the object's weight.
  • The result? The robots learn very slowly, or they never learn at all.

The Solution: DeReCo (The "Three-Act Play")

The authors propose a new method called DeReCo. Instead of juggling everything at once, they break the learning process into three distinct stages, like a play with three acts.

Act 1: The "Cheating" Practice (Centralized Training)

Imagine the robots are in a practice gym with a super-coach standing right next to them.

  • The coach whispers the secrets: "That box is 5kg," "That cylinder is slippery."
  • Because the robots have this "privileged information," they can focus 100% on learning how to coordinate perfectly. They learn exactly when to push and pull without worrying about guessing the object's properties.
  • Analogy: It's like learning to dance with a partner while the instructor holds your hand and tells you every step.
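In code, the idea of Act 1 can be sketched like this (a toy illustration with made-up names; the paper's actual policy is a trained neural network, not a concatenation helper). The key point is only the data flow: during centralized training, the simulator hands the policy the true object properties directly, so nothing has to be guessed.

```python
# Toy sketch of Stage 1 (hypothetical names): in simulation, each robot's
# policy input includes ground-truth "privileged" object properties.

def make_policy_input(local_obs, privileged):
    # The simulator supplies privileged object properties (e.g. mass,
    # friction) directly, so the policy can focus purely on coordination.
    return local_obs + privileged  # simple concatenation of feature lists

local_obs = [0.2, -0.1, 0.5]   # what this robot senses locally
privileged = [5.0, 0.3]        # object mass (kg), friction coefficient
policy_input = make_policy_input(local_obs, privileged)
```

Because the privileged part of the input is always correct in this stage, any reward the robots earn or lose reflects their coordination alone.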

Act 2: The "Sherlock Holmes" Phase (Representation Learning)

Now, the coach leaves the room. The robots are alone, but they still have the "cheat sheet" from Act 1 stored in their memory.

  • The robots are shown a pile of local data (what their cameras see, what their grippers feel).
  • They are asked: "Based only on what you see, can you guess the weight and shape of the object?"
  • They build a special "decoder" (an AI brain) that learns to translate "local sensory clues" into "object secrets."
  • Analogy: It's like a detective learning to identify a suspect just by their shoe prints and gait, without ever seeing their face. They practice this matching game until they are experts.
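Act 2 is ordinary supervised learning: local sensor data in, privileged properties out. As a toy stand-in for the paper's neural-network decoder, here is a one-dimensional linear "decoder" fit by gradient descent (all data and names are invented for illustration):

```python
# Toy sketch of Stage 2: fit a decoder that predicts a privileged property
# (object mass) from a local observation (a gripper-force reading).

def train_decoder(obs, targets, lr=0.01, steps=2000):
    w, b = 0.0, 0.0                    # decoder parameters, mass ≈ w*x + b
    n = len(obs)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in zip(obs, targets):
            err = (w * x + b) - y      # prediction error on one sample
            grad_w += 2 * err * x / n  # gradient of mean squared error
            grad_b += 2 * err / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# invented data: force reading x is proportional to object mass y
readings = [1.0, 2.0, 3.0, 4.0]
masses = [2.0, 4.0, 6.0, 8.0]          # here true mass = 2 * reading
w, b = train_decoder(readings, masses)
```

After training, the decoder has "practiced the matching game": given only a local reading, it recovers the hidden property the coach used to whisper.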

Act 3: The "Real World" Performance (Decentralized Execution)

Now, the robots go on stage for the real show.

  • The coach is gone. The cheat sheet is gone.
  • The robots use their "Sherlock Holmes" decoder (from Act 2) to guess the object's properties just by looking at it.
  • They use the coordination skills they learned in Act 1 to move together.
  • Analogy: The dancers are now performing a complex routine on a dark stage. They can't see the audience or the other dancer clearly, but they know the steps (Act 1) and they can guess the music's tempo by feeling the vibrations (Act 2).
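Putting Acts 1 and 2 together, execution is just a swap: the privileged input slot of the Stage-1 policy is filled by the Stage-2 decoder's estimate instead of the simulator's ground truth. A minimal sketch with dummy stand-ins for both networks (all functions here are invented placeholders, not the paper's models):

```python
# Toy sketch of Stage 3: fully decentralized execution. Each robot runs
# decoder + policy on its own sensor data; no communication, no coach.

def decoder(local_obs):
    # stand-in for the trained Stage-2 decoder: estimate object properties
    return [2.0 * local_obs[0]]        # e.g. mass inferred from force reading

def policy(local_obs, object_estimate):
    # stand-in for the Stage-1 coordination policy; here a dummy rule that
    # scales the commanded motion by the estimated mass
    return [o * object_estimate[0] for o in local_obs]

local_obs = [2.0, 1.0]                 # this robot's own sensor readings
action = policy(local_obs, decoder(local_obs))
```

The design point: the policy never needs retraining for execution, because its input format is unchanged; only the *source* of the object-property vector switches from ground truth to estimate.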

Why is this better?

By separating the learning into these three steps, the robots don't get confused.

  • They learn coordination when they have all the answers (so they get it right).
  • They learn observation separately (so they get better at guessing).
  • They don't interfere with each other.

The Results

The researchers tested this with two real robots (Toyota Human Support Robots, or HSRs) and many different objects:

  • In Simulation: They trained on three shapes but tested on nine shapes (including ones they had never seen before). DeReCo was much better at carrying the "unseen" objects than the old methods.
  • In Real Life: They put the robots in a real room with two new objects (a board and a frame). The old method failed to carry them to the goal, but DeReCo succeeded, carrying the objects right to the target spot.

The Big Takeaway

DeReCo is like a smart training program that teaches robots to first learn how to work together when they have all the answers, and then teaches them how to guess the answers on their own. This makes them much faster to train and much better at handling new, weird, or unknown objects in the real world.