SteadyTray: Learning Object Balancing Tasks in Humanoid Tray Transport via Residual Reinforcement Learning

This paper introduces ReST-RL, a hierarchical reinforcement learning framework that decouples humanoid locomotion from payload stabilization via a residual module, achieving robust, zero-shot sim-to-real object balancing on the Unitree G1 without compromising gait stability.

Anlun Huang, Zhenyu Wu, Soofiyan Atar, Yuheng Zhi, Michael Yip

Published Thu, 12 Ma

Imagine you are walking through a crowded, bumpy street while carrying a tray with a glass of wine and a plate of spaghetti. Your goal is to get to the other side without spilling a drop or dropping the plate.

Now, imagine that you are a robot. Not just any robot, but a two-legged humanoid robot that has to walk, turn, and balance all at once. Every time it takes a step, its body naturally jiggles and sways. If it tries to hold the tray perfectly still while walking, it might trip. If it focuses only on walking, the wine spills.

This is the problem the paper "SteadyTray" solves. Here is how they did it, explained simply:

1. The Problem: The "Jiggly" Walk

Humanoid robots are great at walking, but walking creates vibrations. Think of it like a car driving over a gravel road; the whole car shakes. If you put a cup of coffee on the dashboard of that car, it will spill.

Previous approaches tried to solve this by having one giant "brain" figure out how to walk and keep the tray steady at the same time. It was like asking a student to do advanced calculus while simultaneously juggling three balls. These single-brain controllers often failed, especially when someone bumped into the robot or the robot had to turn quickly.

2. The Solution: The "Coach and the Player" (ReST-RL)

The researchers came up with a clever two-step system called ReST-RL. Instead of one giant brain, they used a "Teacher" and a "Student" approach with a special twist.

  • The Base Policy (The Experienced Walker): First, they trained a robot to be a great walker. This robot knows how to walk, turn, and stay upright on two legs. It's like a professional dancer who knows the steps perfectly. This part is "frozen," meaning we don't change how it walks.
  • The Residual Module (The Stabilizing Coach): Then, they added a second, smaller "brain" on top. This brain doesn't tell the robot how to walk. Instead, it acts like a coach watching the dancer.
    • The coach sees the wine glass wobbling.
    • The coach whispers tiny, quick corrections to the dancer's arms: "Tilt left a little," "Move your hand forward," "Calm down."
    • The dancer (the base walker) keeps doing its dance steps, but the coach's tiny nudges cancel out the wobble.

This separation is key. The robot doesn't have to relearn how to walk; it just learns how to adjust its walk to keep the tray steady.
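The "coach and player" split above can be sketched in a few lines: a frozen base policy produces all joint targets, and a small residual network adds bounded corrections on top of a subset of them. This is a minimal illustration of the residual-action idea, not the paper's implementation; the function names, the joint indices, and the 0.1 residual scale are all assumptions made for the example.

```python
import numpy as np

NUM_JOINTS = 12
ARM_JOINTS = slice(8, 12)  # hypothetical indices of the arm joints

def base_policy(obs):
    # Frozen walking policy: maps observations to joint targets.
    # Stubbed here; in practice this is a trained network whose
    # weights are never updated during residual training.
    return np.tanh(obs[:NUM_JOINTS])

def residual_net(obs):
    # Small trainable head: outputs tiny corrections ("whispers")
    # for the arm joints only. The 0.1 bound is illustrative.
    return 0.1 * np.tanh(obs[:4])

def combined_action(obs):
    action = base_policy(obs)                 # the dance steps stay fixed...
    action[ARM_JOINTS] += residual_net(obs)   # ...only the arms get nudged
    return np.clip(action, -1.0, 1.0)         # keep within actuator limits

obs = np.random.default_rng(0).standard_normal(16)
action = combined_action(obs)
```

Because the residual is small and bounded, the stabilizer can never override the gait; it can only cancel out the wobble.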

3. The Secret Sauce: "Training with a Delay"

One of the smartest tricks in the paper is how they trained the robot. In the real world, cameras and sensors are slow. By the time the robot "sees" the wine glass tipping, a fraction of a second has already passed.

To prepare for this, the researchers intentionally slowed down the robot's vision during training. They made the robot practice while looking at "old" data.

  • The Analogy: Imagine learning to ride a bike while wearing glasses that show you where you were 0.5 seconds ago. It's hard at first, but once you get used to it, you become incredibly good at predicting where you will be.
  • The Result: When they took the glasses off (deployed the robot in the real world), the robot was so good at predicting the future that it could stabilize the tray even when the sensors were slow or when someone pushed it.
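The delay trick can be sketched as a simple FIFO buffer: during training, the policy is always handed the observation from a few control steps ago, so it learns to act on stale data. This is a generic sketch of observation-delay injection, assuming a fixed delay; the class name and the 3-step delay are illustrative, not values from the paper.

```python
from collections import deque

class DelayedObservations:
    """Hand the policy observations from `delay_steps` control steps ago,
    so it learns to compensate for slow sensors before deployment."""

    def __init__(self, delay_steps):
        # Buffer holds the current observation plus `delay_steps` old ones.
        self.buffer = deque(maxlen=delay_steps + 1)

    def step(self, fresh_obs):
        self.buffer.append(fresh_obs)
        # Until the buffer fills, return the oldest observation we have;
        # afterwards this is always the obs from `delay_steps` ago.
        return self.buffer[0]

delayed = DelayedObservations(delay_steps=3)
seen = [delayed.step(t) for t in range(6)]  # policy "sees" t-3 at steady state
```

At deployment the buffer is removed: real sensor latency plays the role the buffer played in simulation, so the policy is already used to "wearing the glasses."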

4. The Results: The "Unitree G1" Test

They tested this on a real robot called the Unitree G1.

  • The Test: They made the robot walk while carrying a tray with a wine glass full of liquid, a coffee cup, and even medical tools.
  • The Chaos: They kicked the robot, pushed the tray, and made it walk fast and slow.
  • The Outcome: The robot kept the tray level. The wine didn't spill. The tools didn't fall. It worked so well that it could handle these tasks without needing to be retrained for every new object.

Why This Matters

This isn't just about robots carrying drinks. It's about making robots useful in our messy, human world.

  • Future Jobs: Imagine a robot waiter in a busy restaurant that never spills a drink, even if a customer bumps into it.
  • Hospitals: Imagine a robot carrying sterile instruments through a crowded hallway without shaking them.
  • Elder Care: Imagine a robot bringing a tray of medicine to an elderly person, navigating around furniture and people without dropping anything.

In short: The paper gives the robot a "dancer" that already knows how to walk and a "coach" that keeps the tray perfectly still, all through a smart, layered learning system that prepares for the real world's delays and bumps.