Preference-Conditioned Multi-Objective RL for Integrated Command Tracking and Force Compliance in Humanoid Locomotion

This paper proposes a preference-conditioned multi-objective reinforcement learning framework that lets a single humanoid locomotion policy dynamically balance accurate command tracking against compliant responses to external forces. The approach is validated through stable training and successful deployment in both simulation and real-world experiments.

Tingxuan Leng, Yushi Wang, Tinglong Zheng, Changsheng Luo, Mingguo Zhao

Published 2026-03-10

Imagine a humanoid robot as a very strong, very fast runner who has been trained to ignore everything around them. If you try to push them, they stiffen up like a brick wall to stay on their path. While this makes them great at running in a straight line, it makes them terrible at interacting with humans. If a person tries to gently guide them, the robot fights back, potentially knocking the person over or acting dangerously rigid.

This paper introduces a new way to train these robots so they can be both a determined runner and a gentle dance partner, depending on what you need at that moment.

Here is the breakdown of their solution, using simple analogies:

1. The Problem: The "Stubborn vs. Wobbly" Dilemma

Traditionally, robot trainers apply random pushes to the robot throughout training to make it "tough": the policy learns to treat every external force as a disturbance to reject.

  • The Result: The robot becomes a "stubborn mule." It follows its instructions perfectly and resists any push.
  • The Flaw: If a human tries to guide the robot (like holding its hand to walk around a corner), the robot fights back. It's too stiff to be safe or natural in a human environment.

2. The Solution: The "Volume Knob" for Behavior

The authors created a system where you don't need to train two different robots. Instead, you train one robot that has a "Volume Knob" (called a Preference Input).

  • Turn the knob to "Command": The robot acts like a race car. It ignores your gentle pushes and focuses entirely on following its GPS (velocity commands).
  • Turn the knob to "Compliance": The robot acts like a soft, yielding dance partner. If you push it, it goes with the flow. It lets you guide it easily.
  • Turn the knob to "Middle": The robot finds a balance. It tries to follow its path but will gently yield if you push hard enough.

You can slide this knob anywhere in between, and the robot instantly changes its personality without needing to be retrained.
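The knob idea can be sketched in a few lines. This is an illustrative simplification, not the paper's actual reward terms or network: the function names, the linear blend, and the observation layout are assumptions. The key points it shows are that (a) the two objectives are mixed by a scalar preference, and (b) that same scalar is fed to the policy as an extra observation, so one set of weights covers the whole trade-off curve.

```python
import numpy as np

def blended_reward(track_reward, comply_reward, preference):
    """Blend two objectives with a scalar preference in [0, 1].

    preference = 1.0 -> pure command tracking (the "race car")
    preference = 0.0 -> pure force compliance (the "dance partner")
    """
    return preference * track_reward + (1.0 - preference) * comply_reward

class PreferenceConditionedPolicy:
    """One policy for all preferences: the knob is part of the observation."""

    def observation(self, proprio, command, preference):
        # Concatenate the preference scalar onto the normal observation
        # vector, so the network can condition its behavior on it.
        return np.concatenate([proprio, command, [preference]])
```

Because the preference is just another input, sliding the knob at deployment time changes behavior instantly, with no retraining.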

3. How They Taught It: The "Translator" Trick

Here is the tricky part: Robots usually can't "feel" a human pushing them unless they have expensive, delicate sensors on their skin. But the robot needs to know it's being pushed to comply.

The researchers used a clever Teacher-Student trick:

  • The Teacher (in Simulation): The robot is trained in a video game world where the computer knows exactly how hard the human is pushing. The teacher learns the secret connection between "being pushed" and "how the robot's body moves."
  • The Student (on the Real Robot): The real robot only has cameras and internal sensors (like knowing its own joint angles). It doesn't know the push force directly.
  • The Magic: The system forces the "Student" to guess what the "Teacher" knows. By looking at how its body is moving, the robot learns to infer (guess) that it is being pushed, even without a force sensor. It's like learning to feel a breeze just by watching the leaves move, rather than feeling the wind on your face.
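The Teacher-Student trick above can be sketched as a regression problem. Everything below is a toy illustration with tiny linear "encoders" and hand-written gradient steps; the dimensions, the linear maps, and the training loop are assumptions for clarity (the paper uses neural networks). What it demonstrates is the core idea: the student, which never sees the push force, is trained to reproduce the teacher's privileged encoding from onboard observation history alone.

```python
import numpy as np

OBS_DIM, FORCE_DIM, HIST, LATENT = 6, 2, 3, 4
rng = np.random.default_rng(0)

# Teacher encoder: sees privileged input [observation, true push force],
# which only exists inside the simulator.
W_teacher = rng.normal(size=(LATENT, OBS_DIM + FORCE_DIM))

def teacher_latent(obs, force):
    return W_teacher @ np.concatenate([obs, force])

# Student encoder: sees only a short history of onboard observations
# (joint angles etc.), with no direct access to the force.
W_student = np.zeros((LATENT, OBS_DIM * HIST))

def student_latent(W, obs_history):
    return W @ obs_history

def distill_step(W, obs_history, target, lr=0.01):
    """One gradient step pulling the student latent toward the teacher's."""
    pred = student_latent(W, obs_history)
    grad = np.outer(pred - target, obs_history)  # d(MSE)/dW up to a constant
    return W - lr * grad

# Distill on one sample: the student learns to reproduce the teacher's
# privileged signal purely from body-motion observations.
obs = rng.normal(size=OBS_DIM)
push = np.array([5.0, 0.0])           # the "secret" force, teacher-only
obs_hist = rng.normal(size=OBS_DIM * HIST)
target = teacher_latent(obs, push)
W = W_student
for _ in range(500):
    W = distill_step(W, obs_hist, target)
```

After training, `student_latent(W, obs_hist)` matches the teacher's latent, which is the "learning to feel a breeze by watching the leaves" step: the force information has been squeezed into features the real robot can actually compute.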

4. The "Resistance" Analogy

To make the math work, the researchers treated a "pushing force" as equivalent to a change in the commanded "walking speed," so one piece of machinery could handle both.

  • Imagine walking through water. If you push against the water, you slow down.
  • They taught the robot: "If someone pushes you, it's like you are suddenly walking through thick mud. You should slow down or move in the direction of the push, just like you would if you were trying to walk through water."
  • This simple rule allowed the robot to understand that being pushed and being told to stop are mathematically similar problems.
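The force-to-speed equivalence can be written as a one-line admittance-style mapping. This is a minimal sketch of the idea, not the paper's actual formulation: the function name, the linear form, and the `gain` value are assumptions, and the real system works on estimated (not measured) forces.

```python
def compliant_velocity_command(base_command, estimated_force, preference,
                               gain=0.05):
    """Map an estimated push into a velocity offset (admittance-style sketch).

    base_command:    commanded velocity (m/s)
    estimated_force: inferred external force (N), positive along the
                     command axis
    preference:      0 = fully compliant, 1 = fully command-tracking
    gain:            assumed force-to-velocity scale (m/s per N)
    """
    # A fully compliant robot (preference 0) lets the push shift its
    # target velocity; a fully stubborn one (preference 1) ignores it.
    return base_command + (1.0 - preference) * gain * estimated_force
```

With this view, "someone is pushing you forward" and "your speed command just increased" become the same kind of signal, which is exactly why being pushed and being told to stop end up as mathematically similar problems.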

5. The Results: From Lab to Real Life

They tested this on a real, adult-sized robot named Booster T1.

  • The Test: Humans pulled the robot by the hand, shoulder, and neck.
  • The Old Robot: Fought back, required huge strength to move, and was jerky.
  • The New Robot: Moved smoothly with very little effort (about 10 Newtons of force, roughly the weight of a liter of water). It could walk across rough grass, soccer fields, and uneven ground while being gently guided by a human.
  • The Safety Check: They even hit the robot with a heavy ball (up to 5 kg). The robot didn't fall; it just took a step back and absorbed the blow, showing it was still tough enough to handle accidents, even while being "soft."

The Bottom Line

This paper solves the "Brick Wall vs. Jellyfish" problem. It gives us a single robot brain that can be tough as a rock when it needs to navigate a crowd, but soft as jelly when a human needs to guide it. It's a major step toward robots that can safely work and play alongside us in our homes and cities.