InterReal: A Unified Physics-Based Imitation Framework for Learning Human-Object Interaction Skills

InterReal is a unified physics-based imitation learning framework that enables humanoid robots to robustly learn and execute complex human-object interaction skills in real-world settings through a novel motion data augmentation scheme and an automatic reward learner.

Dayang Liang, Yuhang Lin, Xinzhe Liu, Jiyuan Shi, Yunlong Liu, Chenjia Bai

Published Tue, 10 Ma

Imagine teaching a robot to do chores, like picking up a heavy box or pushing a cart. In the past, robots were great at walking or dancing on their own, but the moment they had to touch something and move it, they would often drop it, knock it over, or get stuck. They lacked the "feel" for how objects behave in the real world.

This paper introduces InterReal, a new "training camp" for humanoid robots designed to master these tricky tasks. Think of it as a three-step recipe to turn a clumsy robot into a skilled mover.

1. The "What-If" Simulator (Motion Augmentation)

Imagine you are learning to catch a ball. If you only practice with the ball thrown from the exact same spot every time, you'll get good at that specific throw, but you'll fail if someone throws it slightly to the left.

The researchers realized that real-world robots fail because sensors get noisy, or the box isn't exactly where the robot expects it to be. To fix this, InterReal uses a technique called Motion Augmentation.

  • The Analogy: Think of a dance instructor teaching a student. Instead of just showing one perfect routine, the instructor says, "Okay, now imagine the music is slightly faster," or "Now, pretend the floor is slippery," or "Now, the partner is standing two inches to the left."
  • How it works: The system takes a perfect video of a human moving a box and mathematically "shuffles" the box's position slightly in thousands of different ways. It then forces the robot to figure out how to adjust its arm joints to still grab the box perfectly, even if the box moved. This teaches the robot to be flexible rather than rigid.
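The pose-shuffling idea can be sketched in a few lines. This is a toy illustration only, not the paper's implementation: the function and field names (`augment_object_pose`, `object_pos`) are hypothetical, and the real system would also re-solve the robot's arm joints for each perturbed box position, which is omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment_object_pose(ref_traj, num_variants=1000, pos_noise=0.05):
    """Create perturbed copies of one reference trajectory by jittering
    the object's position (illustrative names; retargeting step omitted)."""
    variants = []
    for _ in range(num_variants):
        # Same random offset (in meters, x/y/z) applied at every timestep.
        offset = rng.uniform(-pos_noise, pos_noise, size=3)
        traj = dict(ref_traj)
        traj["object_pos"] = ref_traj["object_pos"] + offset
        variants.append(traj)
    return variants

# Usage: one demonstration of a box trajectory over 100 timesteps
# becomes many slightly shifted "what-if" versions.
demo = {"object_pos": np.zeros((100, 3))}
augmented = augment_object_pose(demo, num_variants=5)
```

Training on thousands of these variants is what forces the policy to track where the box *actually* is instead of memorizing one fixed trajectory.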

2. The "Smart Coach" (Automatic Reward Learning)

In robot training, you have to give the robot points (rewards) for doing things right. Usually, humans have to manually decide: "Okay, if the robot holds the box, give 5 points. If it drops it, minus 10." This is like a teacher trying to grade a student by hand-cranking a calculator for every single test question. It's slow, and the teacher might get the balance wrong (e.g., caring too much about walking fast and not enough about holding the box).

InterReal introduces an Automatic Reward Learner, which acts like a Smart Coach.

  • The Analogy: Imagine a coach who watches the robot train and says, "Right now, the robot is wobbling, so let's focus the points on balance. But now that it's stable, let's focus the points on gripping the box tightly."
  • How it works: The system uses a "Meta-Policy" (a higher-level brain) that watches the robot's mistakes. If the robot is struggling to keep the box steady, the coach automatically shifts the reward system to prioritize stability. If the robot is stable but missing the box, the coach shifts the focus to accuracy. It constantly re-tunes the "grading rubric" in real-time to help the robot learn faster and better.
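A minimal sketch of the re-tuning idea, under loose assumptions: the paper uses a learned meta-policy, whereas this toy version just shifts weight toward whichever reward term currently has the largest error and renormalizes. All names here (`update_reward_weights`, the two-term reward) are illustrative.

```python
import numpy as np

def update_reward_weights(weights, errors, lr=0.1):
    """Shift reward weight toward the terms with the largest current
    tracking error, then renormalize so weights sum to 1.
    A hand-written stand-in for the paper's learned meta-policy."""
    weights = np.asarray(weights, dtype=float)
    errors = np.asarray(errors, dtype=float)
    # Terms the robot is failing at get a bigger share of the reward.
    weights = weights + lr * errors / (errors.sum() + 1e-8)
    return weights / weights.sum()

def total_reward(term_values, weights):
    """Weighted sum of individual reward terms."""
    return float(np.dot(term_values, weights))

# Usage: two reward terms, [stability, grip accuracy].
w = np.array([0.5, 0.5])
# The robot is wobbling: stability error dominates, so its weight grows.
w = update_reward_weights(w, errors=[0.9, 0.1])
```

After the update, the stability term outweighs the grip term, so the next round of training "grades" balance more heavily, which is exactly the coach-shifting-the-rubric behavior described above.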

3. The "Real-World Test" (Deployment)

Most robot training happens in a video game world (simulation) where physics are perfect. The problem is, real life is messy.

InterReal was tested on a real robot called the Unitree G1.

  • The Scenario: The robot had to pick up a heavy box and walk while carrying it, then push a box forward while bending over.
  • The Result: While other robots (the "baselines") would often drop the box or fall over when the box moved slightly, InterReal adjusted on the fly. It used a camera system to see where the box actually was and tweaked its movements instantly, just like a human would.

The Big Picture

Think of InterReal as the difference between a parrot and a human.

  • Old Robots (Parrots): They memorized a specific script. If you changed the script (moved the box), they couldn't adapt.
  • InterReal (Human): It learned the principles of physics and interaction. It understands that "if the box moves, I need to move my hand." It learned to adapt to chaos, making it ready for real-world jobs like warehouse work or helping people at home.

In short: InterReal teaches robots to be adaptable by practicing with "what-if" scenarios and using a smart coach that knows exactly what to praise at every moment, resulting in a robot that can actually handle real-world messiness.