Omni-Manip: Beyond-FOV Large-Workspace Humanoid Manipulation with Omnidirectional 3D Perception

This paper presents Omni-Manip, an end-to-end LiDAR-driven visuomotor policy for humanoid robots. By using a Time-Aware Attention Pooling mechanism to process 360° panoramic point clouds, it enables robust dexterous manipulation in large, cluttered workspaces without frequent repositioning or reliance on narrow-field-of-view RGB-D cameras.

Pei Qu, Zheng Li, Yufei Jia, Ziyun Liu, Liang Zhu, Haoang Li, Jinni Zhou, Jun Ma

Published 2026-03-06

Imagine you are trying to clean up a messy room, but you are wearing a pair of goggles that only let you see a tiny circle directly in front of your nose. If a toy is on the floor to your left, you can't see it. If a chair is behind you, you don't know it's there. To get the toy, you have to stop, spin your whole body around, look, and then try again. This is exactly the problem current humanoid robots face when they try to do tasks in messy, real-world environments.

This paper introduces Omni-Manip, a new way to teach robots how to "see" and move that solves this problem. Here is the breakdown using simple analogies:

1. The Problem: The "Tunnel Vision" Robot

Most robots today use cameras (like RGB-D sensors) that act like a flashlight in a dark room. They only light up what is directly in front of them.

  • The Issue: If a robot needs to pick up a cup that is behind it, or if there is a chair to its side, the robot is "blind" to it.
  • The Consequence: The robot has to constantly stop, shuffle its feet, turn its head, and re-orient itself just to find the object. This is slow, clumsy, and risky because it might bump into things while turning around.

2. The Solution: The "360° Super-Vision"

The researchers gave the robot a new set of eyes: a LiDAR sensor (a laser scanner) mounted on its head.

  • The Analogy: Instead of a flashlight, imagine the robot is wearing night-vision goggles that see in a perfect 360-degree circle, like a security camera that sees everything around the building at once.
  • The Benefit: The robot can instantly "see" a bottle on a table behind it, a chair to its left, and a door to its right, all at the same time. It doesn't need to turn its body to know what's around it.

3. The Brain: "Time-Aware Attention"

LiDAR data is a bit tricky. It's like a swarm of millions of tiny, invisible dots floating in the air. Sometimes the dots flicker or shift slightly because of sensor noise or small movements of the robot's body.

  • The Innovation: The team created a special brain filter called Time-Aware Attention Pooling.
  • The Analogy: Imagine you are trying to hear a friend in a noisy crowd. If you just listen to one split-second of sound, you might hear a cough or a car horn instead of your friend. But if you listen to a few seconds of sound and focus on the voice that stays consistent, you understand them perfectly.
  • How it works: The robot looks at the "dots" (point clouds) from the last few moments, not just the current one. It smooths out the noise and focuses on the most important parts, creating a stable, clear picture of the world.
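The paper's exact formulation of Time-Aware Attention Pooling isn't reproduced here, but the idea described above can be sketched in plain Python: keep feature vectors from the last few LiDAR sweeps, score each frame for how consistent it is with the others (plus a small bonus for recency), and blend them with softmax attention weights. All names here (`time_aware_pool`, `recency_bias`) are illustrative, not from the paper.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def time_aware_pool(frame_features, recency_bias=0.5):
    """Fuse per-frame feature vectors from the last T LiDAR sweeps.

    frame_features: list of T equal-length feature vectors,
    ordered oldest -> newest. Each frame gets an attention score
    favoring frames that agree with the temporal average (noise
    rejection) and, slightly, more recent frames. The output is
    the attention-weighted average: one stable feature vector.
    """
    T = len(frame_features)
    D = len(frame_features[0])
    # Temporal mean feature, used as a "consistency" reference.
    mean = [sum(f[d] for f in frame_features) / T for d in range(D)]
    scores = []
    for t, f in enumerate(frame_features):
        # Consistency: dot product with the temporal mean.
        consistency = sum(f[d] * mean[d] for d in range(D))
        # Recency: newer frames (larger t) get a small bonus.
        scores.append(consistency + recency_bias * t)
    weights = softmax(scores)
    return [sum(weights[t] * frame_features[t][d] for t in range(T))
            for d in range(D)]

# A noisy frame (middle) is down-weighted relative to consistent ones.
pooled = time_aware_pool([[1.0, 0.0], [0.2, 0.9], [1.1, -0.1]])
```

Because the attention weights are positive and sum to one, the pooled vector always stays inside the range spanned by the input frames, which is exactly the "smoothing" behavior the analogy describes.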

4. The Teacher: "The Teleoperation Suit"

To teach the robot this new skill, they needed a lot of practice data. But you can't just tell a robot "move your arm here" easily.

  • The Setup: They built a system where a human operator wears a VR headset and holds controllers (like a Meta Quest 3).
  • The Analogy: Think of it like a digital puppet show. The human moves their arms and body in the real world, and the robot mirrors them in real time. Through the VR headset, the operator sees the same "360-degree view" that the robot's LiDAR provides. This lets the human teach the robot how to coordinate its whole body (legs, waist, and arms) to reach things without bumping into furniture.

5. The Results: The "Smooth Operator"

The researchers tested this robot in a messy room with obstacles.

  • Old Robots (The Flashlight): They kept missing objects behind them or crashing into chairs because they couldn't see them. They failed almost every time the task required looking "out of view."
  • Omni-Manip (The 360° Robot): It moved smoothly. It knew exactly where the chair was behind it, so it didn't bump into it. It reached for objects behind its back without ever turning its body. It was like a dancer who knows exactly where every person in the room is, even without turning their head.

Summary

Omni-Manip is like giving a robot a superpower: the ability to see everything around it at once, combined with a brain that remembers what it saw a second ago to stay steady. This allows humanoid robots to finally stop shuffling around clumsily and start working efficiently in the messy, unpredictable world we live in.