APEX: Learning Adaptive High-Platform Traversal for Humanoid Robots

The paper presents APEX, a deep reinforcement learning framework that enables a 29-DoF Unitree G1 humanoid robot to autonomously traverse platforms up to 114% of its leg length by composing perceptive climbing, walking, and reconfiguration skills through a novel ratchet progress reward and robust sim-to-real perception strategies.

Yikai Wang, Tingxuan Leng, Changyi Lin, Shiqi Liu, Shir Simon, Bingqing Chen, Jonathan Francis, Ding Zhao

Published 2026-03-09

Imagine a humanoid robot as a clumsy toddler learning to walk. For a long time, these robots were great at walking on flat ground or stepping over small puddles. But if you put a high table in front of them (one taller than their legs), they would usually try to jump onto it.

The problem with jumping is that it's like a toddler trying to hop onto a kitchen counter: it requires a huge burst of energy, often results in a hard crash, and if the robot misses, it could break its joints or fall over. It's dangerous and inefficient.

APEX is a new system that teaches the robot to stop jumping and start climbing, just like a human would. Here is how it works, broken down into simple concepts:

1. The "Climbing" Mindset

Instead of treating the robot like a machine that only uses its feet, APEX teaches it to use its whole body.

  • The Analogy: Think of a rock climber scaling a wall. They don't just jump; they use their hands, knees, and torso to find holds, shift their weight, and pull themselves up.
  • The Robot's Skills: The robot learns six specific "moves":
    • Climb-up: Using hands and feet to pull itself onto a high platform.
    • Climb-down: Carefully lowering itself back down.
    • Stand-up & Lie-down: Changing its posture (from standing to lying on its stomach) to fit through tight spaces or reposition itself.
    • Walk & Crawl: Moving around once it's on the platform.
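
The six skills form a small repertoire the robot switches between based on what it sees ahead. As a toy illustration only (the real system *learns* this switching end-to-end from terrain observations; the function name, thresholds, and heuristic below are all invented for this sketch):

```python
from enum import Enum, auto

class Skill(Enum):
    CLIMB_UP = auto()
    CLIMB_DOWN = auto()
    STAND_UP = auto()
    LIE_DOWN = auto()
    WALK = auto()
    CRAWL = auto()

def select_skill(height_ahead_m: float, clearance_above_m: float,
                 leg_length_m: float = 0.7, is_lying: bool = False) -> Skill:
    """Toy hand-written selector; APEX learns when to switch instead."""
    if is_lying:
        # Prone with something low overhead: crawl; otherwise get back up.
        return Skill.CRAWL if clearance_above_m < 1.0 else Skill.STAND_UP
    if clearance_above_m < 1.2:
        return Skill.LIE_DOWN            # duck under a low overhang
    if height_ahead_m > 0.5 * leg_length_m:
        return Skill.CLIMB_UP            # ledge too high to just step onto
    if height_ahead_m < -0.5 * leg_length_m:
        return Skill.CLIMB_DOWN          # drop-off ahead
    return Skill.WALK                    # ordinary ground
```

For example, a 0.8 m ledge with plenty of headroom maps to `CLIMB_UP`, while flat ground maps to `WALK`.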

2. The "Ratchet" Reward (The Secret Sauce)

This is the most clever part of the paper. Usually, when you teach a robot with reinforcement learning, you give it a reward when it finishes the task. But for a long, multi-step task like climbing, waiting until the very end to say "Good job!" gives far too sparse a signal: the robot might get stuck halfway and never learn what to do next.

The authors invented a "Ratchet Progress Reward."

  • The Analogy: Imagine a ratchet wrench (the tool mechanics use). It only turns forward; it can't slip backward.
  • How it works: The robot keeps a mental note of its "best progress so far."
    • If the robot moves forward (even a tiny bit) or gets its hand closer to the edge, it gets a tiny reward.
    • If it moves backward or stays in the same spot, it gets a penalty.
    • Crucially, it doesn't care how fast the robot moves, only that it is making genuine progress.
  • Why this matters: This stops the robot from "cheating" by shaking back and forth or rushing into a dangerous jump. It forces the robot to be patient, find a stable handhold, and slowly pull itself up, just like a careful climber.
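
The ratchet idea can be sketched in a few lines. This is a minimal illustration of the concept, not the paper's actual reward code; the class name, scales, and exact penalty scheme are assumptions:

```python
class RatchetProgressReward:
    """High-water-mark progress reward: pays only for *new* progress,
    so oscillating back and forth earns nothing extra."""

    def __init__(self, progress_scale: float = 1.0,
                 stall_penalty: float = 0.05):
        self.best = 0.0                  # the ratchet: never decreases
        self.progress_scale = progress_scale
        self.stall_penalty = stall_penalty

    def __call__(self, progress: float) -> float:
        """`progress` is any task metric that should grow, e.g. the
        torso's advance toward (then onto) the platform edge."""
        gain = progress - self.best
        if gain > 0:
            self.best = progress         # the ratchet clicks forward
            return self.progress_scale * gain
        # Stalled or moved backward: small penalty, independent of speed.
        return -self.stall_penalty


r = RatchetProgressReward()
r(0.10)   # new best -> rewarded for the 0.10 gained
r(0.05)   # slipped back -> penalty; best stays at 0.10
r(0.10)   # merely matched the old best -> still penalized, no double pay
r(0.25)   # genuine new progress -> rewarded only for the 0.15 gained
```

Because only gains over the running best are paid out, shaking back and forth yields nothing, and speed never enters the reward at all.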

3. The "Teacher and Student" System

Learning all these complex moves at once is too hard for one brain. So, the researchers used a two-step process:

  • Step 1: The Teachers. They trained six separate "expert" robots (Teachers). One was an expert at climbing up, another at climbing down, another at standing up, etc. They learned these skills in a virtual world (simulation) where they could fail thousands of times without breaking anything.
  • Step 2: The Student. They created one "Student" robot and taught it to copy all six teachers.
    • The Analogy: Imagine a student taking notes from six different professors (one for math, one for history, one for art). The student learns to look at the situation (the terrain) and decide: "Oh, I'm at a high ledge? I'll use the Climbing Professor's notes. I'm on the ground? I'll use the Walking Professor's notes."
    • The Student learns to switch between these skills smoothly without falling over.
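
To make the idea concrete, here is a deliberately tiny behavior-cloning sketch: six frozen linear "teachers" distilled into one linear student that carries a separate weight block per skill. Everything here (shapes, learning rate, the linear policies themselves) is invented for illustration; the paper's policies are neural networks trained in simulation.

```python
import numpy as np

rng = np.random.default_rng(0)
obs_dim, act_dim, n_skills = 8, 4, 6

# Frozen "teachers": stand-ins for six trained expert policies.
teachers = [rng.normal(size=(act_dim, obs_dim)) for _ in range(n_skills)]

# Student: one weight block per skill, selected via a one-hot skill code.
W = np.zeros((act_dim, n_skills * obs_dim))

def student_action(obs, skill):
    onehot = np.zeros(n_skills)
    onehot[skill] = 1.0
    return W @ np.kron(onehot, obs)       # uses only that skill's block

def distill_step(obs, skill, lr=0.05):
    """One behavior-cloning step: nudge the student's action toward
    the active expert's action on the same observation."""
    global W
    target = teachers[skill] @ obs        # the expert supplies the label
    err = student_action(obs, skill) - target
    onehot = np.zeros(n_skills)
    onehot[skill] = 1.0
    W -= lr * np.outer(err, np.kron(onehot, obs))  # grad of 0.5*||err||^2
    return float(0.5 * err @ err)

# Cycle through skills on random observations; the loss shrinks to ~0.
for t in range(3000):
    distill_step(rng.normal(size=obs_dim), t % n_skills)
```

After training, the student reproduces each professor's "notes": its action matches whichever teacher the skill code points at.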

4. Seeing the World (The Eyes)

Robots often struggle to see the real world because their cameras get confused by shadows, dust, or their own limbs blocking the view.

  • The Analogy: Imagine trying to walk through a foggy room while wearing glasses that have smudges on them.
  • The Fix: The researchers taught the robot to expect "smudges" (noise and errors) while it was training. They also added a "clean-up" filter for the real world. This means when the robot sees a weird blob of data that looks like a wall but isn't, it knows to ignore it.
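
A minimal sketch of both halves of this trick, with invented noise parameters (the paper's actual noise model and filtering are more involved): corrupt depth images during training so the policy comes to expect smudges, then median-filter out-of-range speckles at deployment.

```python
import numpy as np

def corrupt_depth(depth, rng, noise_std=0.02, hole_p=0.05, blob_p=0.01):
    """Training-time augmentation: Gaussian sensor noise, dropout holes
    (reading = 0), and spurious near-range blobs (phantom obstacles)."""
    d = depth + rng.normal(0.0, noise_std, depth.shape)
    d[rng.random(depth.shape) < hole_p] = 0.0
    d[rng.random(depth.shape) < blob_p] = 0.1
    return d

def clean_depth(depth, d_min=0.2, d_max=3.0, k=3):
    """Deployment-time cleanup: mark out-of-range readings invalid,
    then take a NaN-ignoring k x k median to suppress lone speckles."""
    d = np.where((depth < d_min) | (depth > d_max), np.nan, depth)
    pad = k // 2
    dp = np.pad(d, pad, mode="edge")
    out = np.empty_like(d)
    for i in range(d.shape[0]):
        for j in range(d.shape[1]):
            out[i, j] = np.nanmedian(dp[i:i + k, j:j + k])
    return out

rng = np.random.default_rng(0)
true_depth = np.full((16, 16), 1.0)        # flat surface 1 m away
noisy = corrupt_depth(true_depth, rng)     # the "smudged glasses" view
recovered = clean_depth(noisy)             # close to the true 1 m again
```

The dropout holes (0 m) and phantom blobs (0.1 m) both fall outside the valid range, so the filter discards them and the median fills each gap from valid neighbors.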

The Result: A Real-World Breakthrough

The team tested this on a Unitree G1, a real humanoid robot with 29 moving joints.

  • The Challenge: They placed a platform 0.8 meters high (about 31 inches). For this robot, that is 114% of its leg length. It's like a human trying to climb onto a table that is taller than their own legs.
  • The Outcome: The robot didn't jump. It didn't fall. It walked up to the edge, used its hands to pull itself up, stood up on the platform, walked across, lay down, stood up again, and climbed back down.
  • Zero-Shot Transfer: The most impressive part? They trained the robot in a computer simulation, and when they put it in the real world, it worked immediately without any extra tuning. It was like the robot woke up in the real world and just knew how to do it.

Summary

APEX is like teaching a robot to be a careful, patient rock climber instead of a reckless jumper. By using a special "progress tracker" (the ratchet) and a "teacher-student" learning method, they created a robot that can safely navigate high, difficult terrain that was previously impossible for machines to handle. It's a giant leap (pun intended) toward robots that can actually help us in messy, real-world environments.