RL-100: Performant Robotic Manipulation with Real-World Reinforcement Learning

RL-100 is a unified real-world reinforcement learning framework that combines diffusion visuomotor policies with a clipped PPO-style objective and consistency distillation. It achieves 100% success across 1,000 diverse robotic manipulation trials, matching or surpassing human experts, while demonstrating robust zero-shot generalization and continuous deployment in dynamic environments.

Kun Lei, Huanyu Li, Dongjie Yu, Zhenyu Wei, Lingxiao Guo, Zhennan Jiang, Ziyu Wang, Shiyu Liang, Huazhe Xu

Published Wed, 11 Ma

Imagine you are teaching a robot to do chores. Traditionally, you have two main ways to teach it:

  1. The "Copycat" Method (Imitation Learning): You hold the robot's hand and show it exactly how to fold a shirt or pour a glass of water. The robot tries to copy your every move.
    • The Problem: The robot can only be as good as you are. If you are a bit clumsy, the robot will be clumsy. If you are slow, the robot is slow. It can't figure out a better way to do things on its own.
  2. The "Trial and Error" Method (Reinforcement Learning): You let the robot try things on its own. It drops the cup, spills the water, and breaks things until it finally learns the right way.
    • The Problem: This is dangerous, expensive, and takes forever. You can't let a robot break a million cups just to learn how to pour one.

Enter RL-100: The "Smart Intern" Approach

The paper introduces RL-100, a new system that combines the best of both worlds. Think of it as hiring a brilliant but inexperienced intern.

Phase 1: The Internship (Imitation Learning)

First, you give the robot a batch of recorded demonstrations of a human expert performing the task. The robot studies these demonstrations and learns the basics.

  • The Analogy: This is like a medical student reading textbooks and watching surgeons. They know the theory and the "safe" way to do things, but they haven't actually performed the surgery yet. They are safe, but maybe a bit stiff and slow.
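At its core, this imitation phase is supervised learning: fit a policy to the expert's (observation, action) pairs. RL-100 actually uses a diffusion policy for this, but the idea can be sketched with plain least-squares regression on a made-up linear "expert" (all data and numbers below are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical expert demonstrations: each row pairs an observation
# with the action the expert took. Here the "expert" is secretly a
# linear rule, so a linear policy can clone it perfectly.
obs = rng.normal(size=(200, 3))
expert_actions = obs @ np.array([0.5, -1.0, 0.2])

# Behavioral cloning = supervised regression onto expert actions.
# (RL-100 fits a diffusion policy here; least squares is the toy stand-in.)
theta, *_ = np.linalg.lstsq(obs, expert_actions, rcond=None)
```

The cloned policy `theta` recovers the expert's rule exactly on this noiseless toy data; the "copycat" limitation from earlier is visible too, since the clone can never act better than the demonstrations it was fit to.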

Phase 2: The Practice Rounds (Offline RL)

Now, instead of letting the robot break real things, the system squeezes more out of the data it already has. Offline RL replays the recorded demonstrations and rollouts, scoring thousands of "what-if" variations without running a single new real-world trial. The robot tries to improve on the human's moves. It asks, "If I move my hand 2 millimeters faster, will I finish sooner?" or "If I tilt the cup differently, will I spill less?"

  • The Analogy: This is like the medical student practicing on a virtual reality simulator. They can make mistakes, try risky moves, and learn from them without hurting a single patient. They are refining the human's technique to be faster and more efficient.
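The key mechanic of offline improvement is reweighting: instead of averaging every demonstrated action equally (imitation), upweight the actions in the fixed dataset that earned higher reward. A minimal sketch of that idea, with an entirely made-up toy dataset (an advantage-weighted average, in the spirit of offline RL, not the paper's actual algorithm):

```python
import numpy as np

# Toy "offline" dataset: actions the demonstrator tried, and the
# reward each one earned. All numbers here are illustrative.
actions = np.array([0.9, 1.0, 1.1, 1.3])   # e.g. pour-tilt angles
rewards = np.array([0.2, 0.5, 0.9, 0.4])   # how well each worked

# Plain imitation: average every demonstrated action equally.
imitation = actions.mean()

# Offline-RL flavour: reweight the SAME fixed data toward the
# actions that earned higher reward (exponential advantage weights).
weights = np.exp(5.0 * (rewards - rewards.max()))
improved = (weights * actions).sum() / weights.sum()
```

The reweighted policy lands closer to the best-performing action (1.1) than the plain average does, even though no new data was collected: that is the "practice without patients" trick in miniature.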

Phase 3: The Final Exam (Online RL)

Once the robot is very good in the simulator, you let it try on the real robot for a short, supervised period. This is just to fix those tiny, rare mistakes that only happen in the real world (like a slippery table or a weirdly shaped object).

  • The Analogy: This is the student's first real surgery, but with a senior doctor standing right next to them, ready to step in if things go wrong. They only need a little bit of real-world practice to become perfect.
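The "senior doctor ready to step in" has a direct counterpart in the math: the abstract mentions a clipped PPO objective, which caps how far any single update can push the policy away from its current behavior. A minimal numpy sketch of that standard clipped surrogate loss (PPO itself, not RL-100's full objective):

```python
import numpy as np

def ppo_clipped_loss(ratio, advantage, eps=0.2):
    """Clipped surrogate objective from PPO.

    ratio     = pi_new(a|s) / pi_old(a|s), how much the policy changed
    advantage = how much better the action was than expected
    eps       = trust region width; updates beyond 1 +/- eps earn no credit
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # Take the pessimistic (min) surrogate, negated for minimization.
    return -np.minimum(unclipped, clipped).mean()
```

For example, with `ratio = 1.5` and `advantage = 1.0`, the clip caps the credited ratio at 1.2, so the policy gains nothing from drifting further in one step. That cap is what makes short, supervised real-world fine-tuning safe.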

The Secret Sauce: "Speeding Up the Brain"

One of the biggest hurdles in robotics is that complex AI models are slow. Imagine a robot that has to think for 10 seconds before it makes a single hand movement. That's too slow for real life.

RL-100 uses a clever trick called Distillation.

  • The Analogy: Imagine a master chef who takes 10 minutes to taste, adjust, and perfect a sauce before serving it. RL-100 teaches a "junior chef" (the Consistency Model) to taste the sauce and serve it perfectly in one second. The junior chef learned by watching the master chef work, but now they can do it instantly. This allows the robot to react in real-time, like a human.
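The master chef / junior chef analogy can be sketched concretely: a slow "teacher" that refines its answer over many small steps, and a one-step "student" trained to jump straight to the teacher's final answer. Everything below is a toy illustration (a linear teacher and least-squares student), not the paper's actual consistency distillation:

```python
import numpy as np

rng = np.random.default_rng(0)

def teacher_sample(x, steps=10):
    # Stand-in for a slow multi-step diffusion sampler: each step
    # nudges a noisy action a little closer to the (hidden) answer.
    target = np.array([0.5, -0.25])
    for _ in range(steps):
        x = x + 0.3 * (target - x)
    return x

# Collect (noise, teacher output) pairs, then fit a ONE-step student
# to reproduce the teacher's 10-step result directly.
X = rng.normal(size=(500, 2))
Y = np.array([teacher_sample(x) for x in X])

# Affine student y = [x, 1] @ coef, fit by least squares.
Xa = np.hstack([X, np.ones((500, 1))])
coef, *_ = np.linalg.lstsq(Xa, Y, rcond=None)

def student_sample(x):
    return np.hstack([x, 1.0]) @ coef   # one step instead of ten
```

On this toy problem the teacher happens to be an affine map, so the student matches it essentially exactly while running ten times fewer steps, which is the payoff distillation buys a real-time robot.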

What Did They Achieve?

The team tested this on 8 different difficult tasks, including:

  • Folding a towel (which is floppy and hard to predict).
  • Unscrewing a nut (which requires precise twisting).
  • Pouring water without spilling.
  • Juicing an orange in a busy shopping mall.

The Results:

  • 100% Success Rate: The robot succeeded in every single attempt (1,000 out of 1,000 trials).
  • Human-Level Speed: It was as fast as, or faster than, the human experts who taught it.
  • Real-World Durability: They put the orange juicing robot in a shopping mall. It served random customers for 7 hours straight without failing, even when the oranges were different shapes or the environment was chaotic.
  • Resilience: If a human pushed the robot's arm or changed the table surface, the robot didn't panic; it just adjusted and kept going.

Why This Matters

Before this, robots were either "safe but slow" (copying humans) or "fast but dangerous" (learning from scratch). RL-100 proves we can have robots that are safe, fast, and reliable enough to work in our homes and factories. It's the difference between a robot that needs a human to hold its hand, and a robot that can be trusted to do the job on its own.