IMPACT: Intelligent Motion Planning with Acceptable Contact Trajectories via Vision-Language Models

The paper proposes IMPACT, a novel motion planning framework that leverages Vision-Language Models to infer environment semantics and generate anisotropic cost maps, enabling a contact-aware A* planner to safely navigate cluttered environments by distinguishing between acceptable and dangerous object contacts.

Yiyang Ling, Karan Owalekar, Oluwatobiloba Adesanya, Erdem Bıyık, Daniel Seita

Published Tue, 10 Ma

Imagine you are trying to grab a jar of spices from the very back of a messy kitchen cabinet. The shelf is packed tight with a heavy glass vase, a fragile wine glass, a stack of fragile bowls, and a soft, squishy teddy bear.

The Old Way (Traditional Robots):
A traditional robot is like a nervous, rule-abiding librarian. Its only rule is: "Do not touch anything."
To get the spice jar, the librarian-robot would try to find a path that weaves perfectly between the objects without brushing against a single one. If the shelf is too crowded, the robot gives up, saying, "Impossible! I can't get there without breaking something." It might try to lift its arm high over the clutter, but in a tight cabinet, there's no room to go up.

The New Way (IMPACT):
The paper introduces IMPACT, a planning framework that lets a robot act more like a clever, experienced human moving through a crowded room. It knows that sometimes, to get what you need, you have to nudge things out of the way. But it also knows the difference between a soft pillow and a crystal vase.

Here is how IMPACT works, broken down into simple steps:

1. The "Common Sense" Brain (Vision-Language Models)

First, IMPACT looks at the messy shelf and asks a super-smart AI (called a Vision-Language Model, or VLM) for advice. Think of this AI as a wise grandparent who has seen thousands of objects.

  • The AI looks at the wine glass and says, "That's fragile! Give it a high 'danger score'."
  • It looks at the teddy bear and says, "That's soft and squishy. Give it a low 'danger score'."
  • It looks at the spice jar (the goal) and says, "That's the target! Give it a negative score (a reward)."

2. The "Push Map" (Anisotropic Cost Map)

This is the clever part. Just knowing an object is "safe" isn't enough; you need to know how to push it.
Now suppose that, instead of a teddy bear, the object is a heavy box. Pushed from the side, it might slide neatly into a corner. Pushed from the front, it might tip and knock over the wine glass.
IMPACT creates a special 3D map that doesn't just say "Teddy Bear = Safe." It says:

  • "Pushing the bear from the left is safe."
  • "Pushing the bear from the right is risky."
  • "Pushing the vase in any direction is a disaster."

This map is called "anisotropic," which is a fancy way of saying the safety depends on the direction you are coming from.
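
One simple way to picture a direction-dependent cost is shown below. It is a minimal sketch under stated assumptions, not the formula from the paper: the function directional_cost, the idea of a single "risky" push direction per object, and the penalty value are all hypothetical choices made for this example.

```python
import math


def directional_cost(base_cost, push_dir, risky_dir, penalty=4.0):
    """Cost of pushing an object along push_dir (a nonzero 2D vector).

    The cost equals base_cost when the push is perpendicular or opposite
    to risky_dir, and grows as the push lines up with risky_dir.
    """
    def unit(v):
        norm = math.hypot(v[0], v[1])
        return (v[0] / norm, v[1] / norm)

    p, r = unit(push_dir), unit(risky_dir)
    alignment = max(0.0, p[0] * r[0] + p[1] * r[1])  # clamp negatives to 0
    return base_cost * (1.0 + penalty * alignment)


# Assumed setup: pushing the bear from the right (so it moves toward -x) is risky.
bear_risky_dir = (-1.0, 0.0)
from_left = directional_cost(1.0, push_dir=(1.0, 0.0), risky_dir=bear_risky_dir)
from_right = directional_cost(1.0, push_dir=(-1.0, 0.0), risky_dir=bear_risky_dir)
print(from_left, from_right)
```

With these made-up numbers the same bear is cheap to push from the left and several times more expensive to push from the right, which is exactly what "anisotropic" means here. An object like the vase would simply get a very large base cost in every direction.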

3. The "Smart Navigator" (Contact-Aware A*)

Now, the robot uses a GPS-like planner to find a route.

  • Traditional robots look for a path that never touches anything.
  • IMPACT draws a path that says: "I will gently nudge the teddy bear from the left (because the map says that's safe), slide past the wine glass without touching it, and grab the spice jar."

It calculates the "cost" of every move. Pushing the bear costs very little. Hitting the vase costs a million points. The robot finds the path with the lowest total cost, even if that path involves a little bit of contact.
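
The lowest-total-cost idea can be sketched as a small A* search on a grid. This is a simplified illustration, not the paper's planner: it works on a flat 2D grid, ignores push direction, and assumes every contact cost is zero or positive. The grid layout and the BEAR and VASE values are invented for the example.

```python
import heapq


def contact_aware_astar(grid, start, goal):
    """A* on a 4-connected grid.

    grid[r][c] is the extra contact cost (>= 0) of entering that cell;
    0.0 means free space. Every step also costs 1.0 for the move itself.
    Returns (path, total_cost), or (None, inf) if the goal is unreachable.
    """
    rows, cols = len(grid), len(grid[0])

    def heuristic(cell):
        # Manhattan distance never overestimates, since each step costs >= 1.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    frontier = [(heuristic(start), 0.0, start)]
    best = {start: 0.0}
    parent = {start: None}
    while frontier:
        _, cost, cell = heapq.heappop(frontier)
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1], cost
        if cost > best[cell]:
            continue  # stale queue entry; a cheaper route was already found
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nxt[0] < rows and 0 <= nxt[1] < cols:
                new_cost = cost + 1.0 + grid[nxt[0]][nxt[1]]
                if new_cost < best.get(nxt, float("inf")):
                    best[nxt] = new_cost
                    parent[nxt] = cell
                    heapq.heappush(frontier, (new_cost + heuristic(nxt), new_cost, nxt))
    return None, float("inf")


BEAR = 2.0          # cheap to nudge (illustrative value)
VASE = 1_000_000.0  # "costs a million points" (illustrative value)
grid = [
    [0.0, VASE, 0.0],
    [0.0, BEAR, 0.0],
    [0.0, VASE, 0.0],
]
path, total = contact_aware_astar(grid, start=(1, 0), goal=(1, 2))
print(path, total)
```

In this toy layout the middle column blocks every route, so a strict no-touch planner would have no path at all. The search above instead goes through the bear's cell and never enters a vase cell, because any route touching a vase costs at least a million.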

Why This Matters

In the real world, things are rarely perfectly organized.

  • Old robots get stuck in cluttered rooms because they are too afraid to touch anything.
  • IMPACT is like a person who can shuffle a pile of laundry to get to the shirt underneath, or slide a couch slightly to walk past it, without knocking over the lamp.

The Results

The researchers tested this in computer simulations and with a real robot arm in a lab.

  • Success Rate: IMPACT successfully grabbed the target objects much more often than the "no-touch" robots.
  • Human Preference: When humans watched videos of the robots, they preferred IMPACT. They felt the robot was being "smart" and "gentle" rather than clumsy or overly cautious.
  • Safety: It successfully avoided breaking fragile items (like the wine glass) while moving the soft ones (like the teddy bear).

In a Nutshell

IMPACT teaches robots to stop being afraid of touching things and start using common sense. It understands that not all collisions are bad; some are just a necessary part of getting the job done, as long as you know what to touch and how to push it. It turns a messy, impossible task into a manageable one by knowing the difference between a "soft nudge" and a "hard crash."