Push Anything: Single- and Multi-Object Pushing From First Sight with Contact-Implicit MPC

This paper introduces Consensus Complementarity Control Plus (C3+), an enhanced contact-implicit model predictive control algorithm that enables a complete robotic pipeline to robustly and efficiently push diverse single and multi-object configurations to target poses in real-time, achieving a 98% success rate on hardware.

Hien Bui, Yufeiyang Gao, Haoran Yang, Eric Cui, Siddhant Mody, Brian Acosta, Thomas Stephen Felix, Bibit Bianchini, Michael Posa

Published 2026-03-09

Imagine a robot arm that doesn't just pick things up and move them (like a human grabbing a cup), but instead pushes them around, as in a game of air hockey or pool. This is called "non-prehensile manipulation." It's incredibly hard for robots because pushing is messy: things slide, they get stuck, they bump into each other, and they might tip over.

This paper introduces a new system called "Push Anything" that teaches a robot how to push almost any object, even when there are many of them cluttered on a table, without needing to know exactly how heavy or slippery they are beforehand.

Here is a breakdown of how the authors did it, using some everyday analogies:

1. The Problem: The "Local Minima" Trap

Imagine you are trying to push a heavy box across a room to a specific spot. You are standing right next to it. If you just push it straight, it might get stuck against a wall.

  • Old Robots: They were like people who only looked at the immediate inch in front of their nose. They would push the box, hit a wall, get stuck, and give up. They couldn't see that if they walked around to the other side of the box and pushed it from there, the whole puzzle would solve itself.
  • The Challenge: In a room full of furniture (clutter), figuring out the right angle to push a specific item so it slides past three other items is a math nightmare. The number of possibilities explodes.

2. The Solution: The "Smart Scout" Strategy

The authors combined two ideas to solve this:

A. The Scout (Sampling)
Instead of just pushing from where the robot arm currently is, the system acts like a scout. It quickly imagines, "What if I walked over to this spot on the table and pushed from there? Or that spot?"

  • It picks a few random "good spots" to stand in.
  • It checks: "If I stand here, can I push the object to the goal?"
  • It picks the best spot, walks there (without touching anything), and then starts pushing. This helps the robot escape the "traps" where it would get stuck.
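The scout loop above can be sketched in a few lines. This is a toy illustration, not the authors' implementation: the function names, the circle-based sampling, the alignment cost, and the `switch_margin` threshold are all assumptions made for the example. In the real system the score for each spot would come from the controller itself rather than from a simple geometric heuristic.

```python
import math
import random

def sample_candidates(obj_xy, radius, n, rng):
    """Sample n hypothetical pusher positions on a circle around the object."""
    candidates = []
    for _ in range(n):
        theta = rng.uniform(0.0, 2.0 * math.pi)
        candidates.append((obj_xy[0] + radius * math.cos(theta),
                           obj_xy[1] + radius * math.sin(theta)))
    return candidates

def push_cost(ee_xy, obj_xy, goal_xy):
    """Toy stand-in cost (assumption): 0 when the pusher sits directly behind
    the object relative to the goal, 2 when it sits on the goal side."""
    push_dir = (obj_xy[0] - ee_xy[0], obj_xy[1] - ee_xy[1])
    goal_dir = (goal_xy[0] - obj_xy[0], goal_xy[1] - obj_xy[1])
    norm = math.hypot(*push_dir) * math.hypot(*goal_dir)
    if norm == 0.0:
        return 0.0
    cos_angle = (push_dir[0] * goal_dir[0] + push_dir[1] * goal_dir[1]) / norm
    return 1.0 - cos_angle

def choose_position(current_xy, obj_xy, goal_xy,
                    radius=0.1, n=8, seed=0, switch_margin=0.2):
    """Return (position, should_reposition).

    The pusher only relocates when the best sampled spot beats the current
    one by switch_margin (a hypothetical threshold to avoid needless moves).
    """
    rng = random.Random(seed)
    candidates = sample_candidates(obj_xy, radius, n, rng)
    best = min(candidates, key=lambda p: push_cost(p, obj_xy, goal_xy))
    current_cost = push_cost(current_xy, obj_xy, goal_xy)
    if push_cost(best, obj_xy, goal_xy) + switch_margin < current_cost:
        return best, True
    return current_xy, False

if __name__ == "__main__":
    # Pusher starts on the wrong (goal) side of the object.
    print(choose_position((0.1, 0.0), (0.0, 0.0), (1.0, 0.0)))
```

The margin captures the trade-off in the bullets above: walking to a new spot costs time, so it is only worth doing when the new spot is clearly better than pushing from where the arm already is.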

B. The Brain (C3+ Algorithm)
Once the robot is in the right spot, it needs to figure out exactly how to push. This is where the new algorithm, C3+, comes in.

  • The Old Way (C3): Imagine trying to solve a giant, tangled knot of string. The old method tried to untie the whole knot at once. It was slow and often got stuck.
  • The New Way (C3+): The new method is like having a pair of scissors. It cuts the knot into tiny, separate pieces. It solves each tiny piece instantly (using a simple math trick) and then stitches them back together.
  • The Result: This makes the robot's brain 10,000 times faster at thinking. It can now handle complex scenarios with 4 or more objects moving around each other in real-time, which was previously impossible.
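To make the "cut the knot into tiny pieces" idea concrete, here is one hedged reading of it. It assumes (this summary does not say so) that each contact is described by a pair of numbers: a contact force `lam` and a gap-like slack `eta`, which must both be non-negative and cannot both be positive at once. Under that assumption, each pair can be snapped to the nearest valid point with a one-line rule, independently of every other pair. The names `project_pair` and `project_all` are made up for this sketch, and this is an illustration of the general idea rather than the paper's code.

```python
def project_pair(lam, eta):
    """Closest point (in Euclidean distance) to (lam, eta) that satisfies
    lam >= 0, eta >= 0, and lam * eta == 0.

    The valid set is two rays meeting at the origin, so we keep the larger
    coordinate (clipped at zero) and zero out the other one.
    """
    if lam > eta:
        return max(lam, 0.0), 0.0
    return 0.0, max(eta, 0.0)

def project_all(lams, etas):
    """Apply the per-contact rule to every pair independently."""
    pairs = [project_pair(l, e) for l, e in zip(lams, etas)]
    return [p[0] for p in pairs], [p[1] for p in pairs]

if __name__ == "__main__":
    # Three contacts: one pushing, one separated, one invalid (both negative).
    print(project_all([2.0, 0.5, -1.0], [1.0, 3.0, -2.0]))
```

Because every pair is handled on its own with a constant amount of work, the cost grows only linearly with the number of contacts, which is the kind of property that lets a controller keep up as more objects are added.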

3. The "Eyes" (Perception)

Before pushing, the robot needs to know what it's looking at.

  • The Process: The robot takes a video of the objects. It uses AI to trace their outlines (like a digital artist tracing a photo) and builds a 3D model of them, even if they are weird shapes like a letter "R" or a bottle of lotion.
  • The Tracking: As the robot pushes, the objects move and might hide behind each other. The system is smart enough to keep track of them, like a referee in a game of tag who never loses sight of the players, even when they run behind a tree.

4. The Results: Real-World Success

The team tested this on a real robot arm (a Franka Panda) with 33 different objects, from 3D-printed letters to household items.

  • Success Rate: It worked 98% of the time.
  • Speed:
    • Moving 1 object: ~30 seconds.
    • Moving 2 objects: ~1.5 minutes.
    • Moving 3 objects: ~3 minutes.
    • Moving 4 objects: ~5 minutes.
  • The "Push Anything" feat: They successfully rearranged 4 different objects on a table into a neat line, something that would have confused previous robots.

The Big Picture

Think of this paper as teaching a robot the art of billiards.

  • Old robots could only hit the cue ball straight at the target. If the target was blocked, they failed.
  • This new robot looks at the whole table, calculates the angles, realizes it needs to hit the cue ball into the cushion first to bounce it around the obstacles, and then sink the target. It does this fast enough that it can play the game in real-time, even with a crowded table.

In short: They built a robot that can look at a messy table, figure out the best way to push things around to clean it up, and actually do it without dropping anything or getting stuck.