REI-Bench: Can Embodied Agents Understand Vague Human Instructions in Task Planning?

Here is an explanation of the paper REI-Bench, translated into simple, everyday language with some creative analogies.

The Big Problem: The Robot's "Mind Reading" Struggle

Imagine you have a super-smart robot butler. You tell it, "Please move the heavy stuff outside."

If you are standing in a kitchen where the only heavy thing is a giant pot, a human understands instantly: "Oh, they mean the pot." But for a robot, "heavy stuff" is a nightmare. Is it the pot? The bag of flour? The cast-iron pan?

The paper argues that while robots are getting great at following clear instructions (like "Move the pot"), they are terrible at following vague instructions (like "Move it" or "Move the heavy stuff"). This is a huge problem because real humans—especially the elderly, children, or people in a hurry—don't speak like robots. They use shortcuts, pronouns, and descriptions that rely on context.

The Solution: A New "Gym" for Robots (REI-Bench)

To fix this, the researchers built a new training ground called REI-Bench. Think of this as a "gym" for robot brains, but instead of lifting weights, the robots have to solve puzzles involving vague language.

They created a dataset of 2,700 scenarios based on real-life conversations. They tested the robots in three different "difficulty modes":

The "Clear" Mode: The human says, "Move the pot." (Easy peasy).
The "Mixed" Mode: The human says, "Move the pot," but then later says, "Now move it." (The robot has to remember what "it" refers to).
The "Vague & Distracting" Mode: The human says, "Move the heavy thing," while the conversation is full of noise, like mentioning a person named "Apple" (who isn't a fruit) or talking about a "heavy" book that isn't the target.
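To make the three modes concrete, here is a minimal sketch of how one scenario from each mode might be stored. The `Scenario` class, its field names, and the sample dialogue are hypothetical illustrations, not the benchmark's actual data format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Scenario:
    """Hypothetical record for one test case (not the real REI-Bench schema)."""
    mode: str             # "clear", "mixed", or "vague"
    dialogue: List[str]   # earlier turns of the conversation
    instruction: str      # what the human finally asks for
    target: str           # the object the human actually means


scenarios = [
    Scenario("clear", [], "Move the pot.", "pot"),
    Scenario("mixed", ["Human: Move the pot."], "Now move it.", "pot"),
    Scenario(
        "vague",
        ["Human: Apple left a heavy book on the table."],  # distracting context
        "Move the heavy thing.",
        "pot",
    ),
]


def names_target(s: Scenario) -> bool:
    """True when the instruction itself spells out the target object."""
    return s.target in s.instruction.lower()


for s in scenarios:
    print(s.mode, names_target(s))
```

Only the "clear" scenario names its target outright. In the other two, the robot has to recover "pot" from context, which is exactly the skill being tested.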

The Result: When the instructions got vague, the robots' success rate crashed. Some failed 37% more often than when the instructions were clear. They started grabbing the wrong items, like picking up a plate instead of the pot because they couldn't figure out what "the heated one" meant.

Why Do Robots Fail? (The "Distraction" Analogy)

The researchers discovered that the robots aren't "dumb"; they just get distracted.

Imagine a student taking a math test.

Clear Instruction: "Solve for X." The student focuses on the math.
Vague Instruction: "Solve for the thing that makes the answer happy." The student burns the whole test decoding the question and never gets to the math.

The robot's brain (the Large Language Model) tries to do two things at once:

Understand the language (Figure out what "it" means).
Plan the actions (Pick up, move, put down).

When the language is vague, the robot gets so stuck trying to figure out the meaning that it forgets how to plan the actions. It's like a driver trying to read a map while driving; they get confused and crash. The robot spends all its "brain power" guessing the word and runs out of power to actually move the object.

The Fix: "The Translator" (TOCC)

The paper proposes a clever, simple fix called TOCC (Task-Oriented Context Cognition).

Instead of asking the robot to "Guess the meaning AND plan the move" at the same time, TOCC splits the job into two steps, like a Translator and a Manager.

Step 1: The Translator (Cognition): The robot first acts as a translator. It looks at the vague instruction ("Move the heavy stuff") and the conversation history, then rewrites it into a crystal-clear command: "Move the pot."
Step 2: The Manager (Planning): Now, the robot takes this clear command and simply plans the moves. No guessing, no confusion.
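The two steps above can be sketched as two separate model calls. This is a minimal illustration under stated assumptions, not the authors' implementation: the `llm` argument is any hypothetical function that maps a prompt to a text reply, the prompt wording is invented, and `toy_llm` is a canned stand-in so the example runs without a real model.

```python
from typing import Callable, List, Tuple

LLM = Callable[[str], str]  # hypothetical: any prompt-in, text-out function


def resolve_instruction(llm: LLM, dialogue: List[str], instruction: str) -> str:
    """Step 1 (Translator): rewrite a vague instruction as an explicit one."""
    prompt = (
        "Conversation so far:\n"
        + "\n".join(dialogue)
        + f"\nInstruction: {instruction}\n"
        + "Rewrite the instruction so it names the exact object."
    )
    return llm(prompt).strip()


def plan_actions(llm: LLM, clear_instruction: str) -> List[str]:
    """Step 2 (Manager): turn the explicit instruction into a list of steps."""
    prompt = f"Task: {clear_instruction}\nList one action per line."
    return [line.strip() for line in llm(prompt).splitlines() if line.strip()]


def two_step_plan(
    llm: LLM, dialogue: List[str], instruction: str
) -> Tuple[str, List[str]]:
    """Run the translator first, then plan only from its clear output."""
    clear = resolve_instruction(llm, dialogue, instruction)
    return clear, plan_actions(llm, clear)


def toy_llm(prompt: str) -> str:
    """Canned replies standing in for a real model (assumption for the demo)."""
    if prompt.startswith("Task:"):
        return "find pot\npick up pot\ngo outside\nput down pot"
    return "Move the pot outside."


clear, steps = two_step_plan(
    toy_llm,
    ["Human: That giant pot is the only heavy thing in this kitchen."],
    "Please move the heavy stuff outside.",
)
print(clear)
print(steps)
```

The point of the design is that the planning prompt never sees the vague wording or the conversation history. It receives only the rewritten command, so it has nothing left to guess.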

The Analogy:
Think of it like a chef and a sous-chef.

Without TOCC: The chef tries to read a scribbled note from a customer ("Make the spicy red thing") while simultaneously chopping vegetables. They chop the wrong thing.
With TOCC: The sous-chef (Translator) reads the note, checks it against what the customer said earlier, and writes a clear ticket: "Make the Spicy Red Chili." The chef (Planner) then just follows the clear ticket perfectly.

The Takeaway

This paper teaches us that to make robots useful for real people (like grandma or a toddler), we can't just give them smarter brains. We have to teach them to translate human vagueness into clear instructions before they try to act.

By adding this "Translator" step, the researchers made the robots significantly better at handling vague instructions, which suggests that sometimes the best way to help a robot is to help it work out what we really mean before it acts.
