Moving Through Clutter: Scaling Data Collection and Benchmarking for 3D Scene-Aware Humanoid Locomotion via Virtual Reality

This paper introduces Moving Through Clutter (MTC), an open-source Virtual Reality framework that addresses the shortage of data for scene-aware humanoid locomotion. MTC procedurally generates diverse cluttered 3D environments, captures whole-body human motion inside them, and provides a benchmarked dataset of 348 trajectories to advance robot navigation in complex, real-world settings.

Beichen Wang, Yuanjie Lu, Linji Wang, Liuchuan Yu, Xuesu Xiao

Published 2026-03-09

Imagine you are teaching a robot to walk. So far, scientists have been very good at teaching robots to walk on empty, flat dance floors. They can make them run, jump, and even do parkour, but only if the room is completely empty.

But real life isn't an empty dance floor. Real life is a messy living room filled with coffee tables, piles of laundry, low-hanging lamps, and narrow hallways. If you put a robot in there, it would likely trip, crash into the sofa, or get stuck because it doesn't know how to squeeze through tight spots or duck under obstacles.

This paper introduces a new project called MTC (Moving Through Clutter) to solve exactly that problem. Here is how they did it, explained simply:

1. The Problem: The "Empty Room" Trap

Most robot training data comes from humans walking in big, empty studios. It's like teaching a driver to drive only in a parking lot with no other cars. When you finally put that driver on a busy city street with narrow alleys and parked cars, they panic. Robots are the same; they don't know how to adapt their bodies to squeeze through a cluttered house.

2. The Solution: The "Virtual Reality Simulator"

Instead of building hundreds of messy rooms in the real world (which would be expensive and take forever), the team built a Virtual Reality (VR) video game.

  • The Game Master (Procedural Generation): They wrote a computer program that acts like a chaotic interior designer. It randomly builds rooms, filling them with furniture, debris, and obstacles. It can make a room slightly messy or extremely crowded, just like a real house.
  • The Player (The Human): A human puts on a VR headset and walks through these digital, messy rooms. They have to dodge chairs, duck under beams, and squeeze through narrow gaps, just like they would in real life.
  • The Magic Trick (Embodiment Scaling): This is the clever part. The robot they are training might be shorter or taller than the human. So, the VR system shrinks or stretches the virtual world to match the robot's size. If the robot is short, the human sees the room as "tall" and has to crouch more. This ensures the human's movements are perfectly sized for the robot.
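The embodiment-scaling idea above can be sketched as a pair of simple conversions. This is a minimal illustration assuming a uniform scale factor based on body height; the function names and numbers are illustrative, not taken from the MTC codebase.

```python
# Sketch of embodiment scaling: scale the VR world up for the human,
# then scale the recorded human motion back down for the robot.
# All heights in meters; a uniform height ratio is an assumption here.

def scale_world_for_human(obstacle_height_m, human_height_m, robot_height_m):
    """Render an obstacle so the human experiences the same relative
    clearance the robot would: a shorter robot makes the VR world
    look taller, forcing the human to crouch proportionally more."""
    return obstacle_height_m * (human_height_m / robot_height_m)

def scale_motion_for_robot(human_joint_height_m, human_height_m, robot_height_m):
    """Map a recorded human joint height back into the robot's scale."""
    return human_joint_height_m * (robot_height_m / human_height_m)

# A 1.0 m overhead beam in the robot's world, experienced by a
# 1.8 m human collecting data for a 1.2 m robot:
beam_in_vr = scale_world_for_human(1.0, 1.8, 1.2)   # rendered ~1.5 m tall
# The human's crouched hip height (0.6 m), mapped back for the robot:
robot_hip = scale_motion_for_robot(0.6, 1.8, 1.2)   # ~0.4 m for the robot
```

The key property is that the two conversions are inverses: whatever relative clearance the human experienced in VR is exactly the clearance the robot will have at its own size.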

3. The Result: A "Driving School" for Robots

The system records the human's movements in VR and translates them into data the robot can understand.

  • The Dataset: They created a library of 348 different "walks" through 145 different messy rooms. It's like a massive textbook of "how to walk through a mess."
  • The Test (The Benchmark): They didn't just collect data; they built a grading system. They check two things:
    1. Did you crash? (Collision Safety): Did the robot hit the furniture?
    2. Did you adapt? (Adaptation Score): Did the robot change its walk? (e.g., Did it lift its knees higher? Did it lean sideways? Did it crouch?)
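The two checks above can be sketched as simple per-trajectory metrics. The exact formulas and thresholds used in the paper's benchmark are not reproduced here; this is a hypothetical, simplified version of the idea.

```python
# Simplified sketch of the two benchmark checks: collision safety and
# gait adaptation. Metric definitions are illustrative assumptions.

def collision_rate(clearances_m, safety_margin_m=0.0):
    """Fraction of trajectory frames where the body touches an obstacle
    (i.e. clearance falls to or below the safety margin)."""
    hits = sum(1 for d in clearances_m if d <= safety_margin_m)
    return hits / len(clearances_m)

def adaptation_score(hip_heights_m, nominal_hip_height_m):
    """Mean deviation of the hip from its nominal upright-walking height.
    Crouching under a beam or sidestepping through a gap raises this."""
    deviations = [abs(h - nominal_hip_height_m) for h in hip_heights_m]
    return sum(deviations) / len(deviations)

# One toy trajectory: the walker crouches midway to pass under a beam.
clearances = [0.30, 0.12, 0.05, 0.12, 0.30]   # distance to nearest obstacle
hip_heights = [0.90, 0.80, 0.70, 0.80, 0.90]  # hip height per frame

crash = collision_rate(clearances)            # 0.0: never touched anything
adapt = adaptation_score(hip_heights, 0.90)   # ~0.08 m average crouch depth
```

A trajectory that stays collision-free while showing a high adaptation score is exactly the behavior the dataset is meant to teach: the walker changed its body, not just its path.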

4. Why This Matters

Think of this like giving a robot a Gym Membership.

  • Before: Robots trained in a gym with no equipment. They were strong but clumsy in the real world.
  • Now (With MTC): Robots are training in a gym filled with obstacles, ropes, and uneven floors. They are learning to twist, turn, and balance in tight spaces.

The paper shows that when robots learn from this new "messy room" data, they become much better at navigating real homes and offices without crashing. It moves robot walking from "perfect conditions" to "real-world chaos."

In a nutshell: The authors built a VR simulator where humans walk through messy digital rooms. They recorded how humans naturally adjust their bodies to avoid hitting things, scaled that data to fit a robot, and created a massive dataset to teach robots how to walk through clutter without falling over.