Mixed-Initiative Dialog for Human-Robot Collaborative Manipulation

This paper presents MICoBot, a mixed-initiative dialog system for human-robot collaborative manipulation that dynamically allocates task steps between agents based on natural language negotiation and capability assessments, demonstrating significantly improved task success and user experience in physical trials compared to baseline models.

Albert Yu, Chengshu Li, Luca Macesanu, Arnav Balaji, Ruchira Ray, Raymond Mooney, Roberto Martín-Martín

Published 2026-03-02

Imagine you are hosting a dinner party, and you've invited a robot friend to help you cook.

In the old days, robots were like obedient butler bots: you had to give them a strict list of instructions ("Chop the onions," "Turn on the stove"), and they would try to do exactly that. If you asked them to do something they couldn't do (like "cut the steak with a plastic spoon"), they would just keep trying until they broke something, or they would freeze. They never said, "Hey, I can't do that, but I can get you a knife."

MICoBot (Mixed-Initiative Collaborative Robot) is different. Think of MICoBot not as a butler, but as a smart kitchen partner.

The Core Idea: A Two-Way Conversation

The paper introduces a system where both you and the robot can take the lead in the conversation.

  • You can say, "Hey, can you grab the scissors?"
  • The Robot can say, "I can grab the scissors, but I can't cut the package. Can you do that part?"

This is called Mixed-Initiative. It means neither of you is stuck in a "boss vs. worker" role. You are a team negotiating who does what based on who is better at the job right now.

How MICoBot Thinks (The Three-Layer Brain)

To make this work, MICoBot uses a three-layer thinking process, like a manager, a strategist, and a worker all in one:

  1. The Manager (Meta-Planner): This part listens to your conversation. If you say, "I'm tired today," the Manager updates the plan. It writes a little piece of computer code that says, "Okay, since the human is tired, let's have the robot do more heavy lifting." It adapts the rules of the game in real time.
  2. The Strategist (Planner): This part looks at the to-do list. It asks two questions:
    • Can the robot do this? (It checks a "skill database" built from thousands of simulations).
    • Is the human willing to help? (It listens to your tone. If you sound grumpy or busy, it knows you might say "no," so it tries to do the task itself or finds a different way).
     Based on those two answers, it calculates the best split: "I'll do the heavy lifting, you do the delicate cutting."
  3. The Worker (Action Executor): This is the part that actually moves the robot's arms or speaks to you. If the plan says "Robot brings scissors," the Worker moves the robot. If the plan says "Robot asks for help," the Worker generates a polite sentence like, "Could you open this for me?"
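To make the division of labor concrete, here is a minimal Python sketch of how the three layers could fit together. It is an illustration under loudly stated assumptions, not the paper's implementation: `Step`, `allocate`, `execute`, the two strategy functions, the willingness value, and every success estimate are hypothetical. The Strategist gives each step to whichever agent has the higher expected success, discounting the human's chance by an estimated willingness to help, and the Manager's "little piece of code" is modeled as a swappable penalty function.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    """One task step. All success estimates here are hypothetical."""
    name: str
    robot_success: float  # assumed lookup from a robot skill database
    human_success: float  # assumed chance the human succeeds if they agree


# Hypothetical stand-in for the code the Manager (meta-planner) writes:
# it returns an extra cost for asking the human to take a step.
Strategy = Callable[[Step], float]


def default_strategy(step: Step) -> float:
    return 0.0  # no extra cost for asking the human


def tired_human_strategy(step: Step) -> float:
    return 0.3  # hypothetical penalty: the robot takes on more of the work


def allocate(steps: List[Step], willingness: float, strategy: Strategy) -> Dict[str, str]:
    """Strategist: assign each step to the agent with the higher score."""
    plan: Dict[str, str] = {}
    for step in steps:
        robot_score = step.robot_success
        # The human only helps if willing, so discount by willingness,
        # then subtract whatever penalty the current strategy imposes.
        human_score = willingness * step.human_success - strategy(step)
        plan[step.name] = "robot" if robot_score >= human_score else "human"
    return plan


def execute(plan: Dict[str, str]) -> List[str]:
    """Worker: turn each assignment into a robot action or a spoken request."""
    actions: List[str] = []
    for step_name, agent in plan.items():
        if agent == "robot":
            actions.append(f"ROBOT DOES: {step_name}")
        else:
            actions.append(f"ROBOT SAYS: Could you {step_name} for me?")
    return actions


# Hypothetical numbers for the package-pouring task described below.
package_task = [
    Step("bring the bowl", robot_success=0.9, human_success=0.95),
    Step("bring the package", robot_success=0.9, human_success=0.95),
    Step("cut the package open", robot_success=0.1, human_success=0.95),
    Step("pour the package", robot_success=0.7, human_success=0.95),
]

for line in execute(allocate(package_task, willingness=0.8, strategy=default_strategy)):
    print(line)
```

With the default strategy, the robot fetches both items and asks for help cutting and pouring. Swapping in `tired_human_strategy` moves "pour the package" to the robot while "cut the package open" stays with the human, mirroring how the Manager can shift work toward the robot without touching the Strategist's logic.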

The Real-World Test: The "Party Prep" Challenge

The researchers tested MICoBot with 18 real participants and a robot arm in a mock apartment, giving them three messy, multi-step tasks:

  1. Pouring a package: Bringing a bowl and a package, cutting the package open, and pouring its contents into the bowl. (Robots are bad at cutting; humans are good).
  2. Assembling a toy car: Bringing parts, drilling wheels, and screwing things together. (Robots are bad at fine motor skills like drilling; humans are good).
  3. Packing a gift box: Folding boxes, wrapping ribbons, and taping bows. (Robots are bad at delicate ribbon work).

The Results:

  • The Old Way (LLM Baseline): A planner built on a standard large language model (LLM) tried to be the boss. It often tried to do things it couldn't do (like cutting the package), failed, and the whole task fell apart. Success rate: 28%.
  • The MICoBot Way: MICoBot realized, "I can't cut this. I'll ask my human partner." It negotiated, adapted when the human was busy, and took over when the human was tired. Success rate: 78%.
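To put the gap in perspective, the two reported success rates can be compared directly. This short Python snippet only restates the article's numbers (28% and 78%) and assumes both rates were measured over comparable sets of trials.

```python
baseline_success = 28  # LLM baseline success rate, in percent
micobot_success = 78   # MICoBot success rate, in percent

absolute_gain = micobot_success - baseline_success  # gap in percentage points
relative_gain = micobot_success / baseline_success  # ratio of success rates

print(f"absolute gain: {absolute_gain} percentage points")  # → absolute gain: 50 percentage points
print(f"relative gain: {relative_gain:.2f}x")               # → relative gain: 2.79x
```

In other words, MICoBot completed tasks roughly 2.8 times as often as the baseline.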

Why It Matters

Think of MICoBot as the difference between a scripted video game character and a real-life teammate.

  • Scripted Character: "I will follow your orders until I crash."
  • Real Teammate: "I see you're struggling with that box. Let me hold it while you tape it. Or, if you're busy, I'll try to do it myself, but I might need a hand."

The paper argues that for robots to be truly helpful in our homes, they need to stop just "listening" and start talking back, negotiating, and accepting that humans are unpredictable: sometimes tired, and sometimes very willing to help. MICoBot shows how a robot can handle this dance of "who does what" through natural conversation.
