Imagine you are building a model airplane with a robot partner. You ask the robot, "Can you hand me that red screwdriver?"
Here is the problem: you assume the robot sees the world exactly like you do. Your eyes give you a wide field of view, covering almost 180 degrees of the scene around you. But the robot? Its camera sees only a narrow tunnel in front of it, maybe 54 degrees wide.
Because you don't realize this "tunnel vision," you might ask the robot to grab something that is actually sitting right behind its shoulder, completely invisible to it. The robot will be confused, or worse, it will try to grab it and fail, wasting time and making you both frustrated.
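To make the mismatch concrete, here is a minimal sketch of the geometry, my own illustration rather than code from the paper: given where the camera sits, which way it faces, and where an object is, a simple angle check decides whether the object falls inside a 54-degree field of view. All positions and headings are made-up example values.

```python
import math

# A minimal sketch of the "tunnel vision" problem, not code from the paper.
# Given the camera's position, the heading it faces, and an object's
# position (all made-up example values), check whether the object falls
# inside the camera's horizontal field of view.

CAMERA_FOV_DEG = 54.0  # the robot camera's horizontal FOV, per the article


def is_visible(camera_pos, camera_heading_deg, object_pos, fov_deg=CAMERA_FOV_DEG):
    """Return True if object_pos lies inside the camera's horizontal FOV."""
    dx = object_pos[0] - camera_pos[0]
    dy = object_pos[1] - camera_pos[1]
    angle_to_object = math.degrees(math.atan2(dy, dx))
    # Signed angular offset between the camera heading and the object,
    # wrapped into [-180, 180] degrees.
    offset = (angle_to_object - camera_heading_deg + 180.0) % 360.0 - 180.0
    return abs(offset) <= fov_deg / 2.0


# A screwdriver about 40 degrees off-axis: easy for a human's ~180-degree
# view, invisible to a 54-degree camera (half-angle: 27 degrees).
print(is_visible((0, 0), 0.0, (1.0, 0.84)))  # ~40 deg off-axis -> False
print(is_visible((0, 0), 0.0, (1.0, 0.30)))  # ~17 deg off-axis -> True
```

An object 40 degrees off to the side is comfortably inside your view but completely outside the robot's, which is exactly the misunderstanding the paper sets out to fix.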
This paper is about teaching humans to see the world through the robot's "eyes" so we don't ask it to do the impossible.
The Big Idea: "Painting" the Robot's Vision
The researchers wanted to show humans exactly what the robot can and cannot see. Since changing the robot's physical hardware is hard (you can't simply swap in a wider camera lens), they used Augmented Reality (AR).
Think of AR like a pair of smart glasses (like the HoloLens they used). When you look at the robot through these glasses, you see digital "paintings" or overlays that aren't physically there but look like they are.
They tested four different ways to draw this invisible "vision tunnel" for the human to see:
The "Deep Eye Socket" (Egocentric):
- The Metaphor: Imagine the robot's eyes are like a cave. The researchers used AR to make the robot's eye sockets look incredibly deep, like a dark tunnel.
- The Logic: Just like you can't see behind your own head because your skull blocks it, the deep cave shows the human, "Hey, the robot can't see past this deep hole." It mimics the robot's physical limitation.
The "Side Blocks" (Egocentric):
- The Metaphor: Imagine putting two giant cardboard boxes right next to the robot's eyes, blocking the sides.
- The Logic: These virtual blocks visibly cut off the view, showing the human, "The robot can't see past these boxes."
The "Long Arms" (Transition Space):
- The Metaphor: Imagine the robot has two long, invisible arms stretching from its eyes all the way to the table where the tools are.
- The Logic: This connects the robot's head directly to the work area, showing a clear "cone" of vision reaching out to the objects.
The "Table Walls" (Allocentric):
- The Metaphor: Instead of drawing on the robot, they drew two virtual walls directly on the table where the tools are sitting.
- The Logic: This shows the human, "If the tool is inside these walls, the robot sees it. If it's outside, the robot is blind to it."
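How would you know where to draw those walls? The camera's viewing frustum intersects the table and traces out the patch of surface the robot can actually see. Below is a minimal geometry sketch, my own illustration rather than the paper's code: it casts the frustum's four corner rays onto the table plane. The camera height, downward tilt, and vertical FOV are assumed values the article does not give.

```python
import numpy as np

# A minimal sketch of where the "Table Walls" would be drawn, my own
# illustration rather than the paper's code. It casts the camera frustum's
# four corner rays onto the table plane; the camera height, downward tilt,
# and vertical FOV below are assumed values the article does not give.

H_FOV = np.radians(54.0)             # horizontal FOV, per the article
V_FOV = np.radians(44.0)             # vertical FOV: an assumed value
TILT = np.radians(-30.0)             # camera pitched 30 degrees down at the table
CAM_POS = np.array([0.0, 0.0, 0.5])  # camera 0.5 m above the table plane z=0


def frustum_corner_dirs(h_fov, v_fov, tilt):
    """Unit vectors along the four frustum corner rays, in a frame where
    x = right, y = forward, z = up, pitched down by `tilt` (radians)."""
    c, s = np.cos(tilt), np.sin(tilt)
    dirs = []
    for sx in (-1, 1):       # left / right frustum edge
        for sz in (-1, 1):   # bottom / top frustum edge
            d = np.array([sx * np.tan(h_fov / 2), 1.0, sz * np.tan(v_fov / 2)])
            # Rotate about the x-axis to apply the downward tilt.
            d = np.array([d[0], c * d[1] - s * d[2], s * d[1] + c * d[2]])
            dirs.append(d / np.linalg.norm(d))
    return dirs


def hit_table(origin, direction, table_z=0.0):
    """Intersect a ray with the horizontal plane z = table_z; None if the
    ray is parallel to the table or points away from it."""
    if abs(direction[2]) < 1e-9:
        return None
    t = (table_z - origin[2]) / direction[2]
    return origin + t * direction if t > 0 else None


# The four hit points outline the patch of table the robot can see; the
# left and right edges of that patch are where the virtual walls go.
for d in frustum_corner_dirs(H_FOV, V_FOV, TILT):
    print(hit_table(CAM_POS, d))
```

Connecting those wall lines back to the camera that produced them turns out to matter; it is exactly the "connect the dots" issue raised in the results below.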
What Did They Find? (The Results)
They had 41 people play a game assembling an airplane with a robot using these different "paintings." Here is what happened:
The "Table Walls" (Allocentric) were the most accurate.
- Analogy: It's like putting a "Do Not Enter" sign directly on the door you are trying to open, rather than pointing at the door from across the room. When the indicator was right on the table, people guessed correctly 95% of the time. They knew exactly what the robot could see.
- The Catch: It took people a tiny bit longer to figure out how the walls connected to the robot's eyes. It was like solving a small puzzle before acting.
The "Deep Eye Socket" was a close second.
- Analogy: This was surprisingly effective (85% accuracy). It felt natural, like looking into a deep well. It worked so well that the researchers suggest: If you can't use AR, just build robots with deeper eye sockets!
The "Long Arms" (Extended Blocks) were tricky.
- Analogy: People saw the "cone" shape and thought, "Oh, the robot can see everything inside this cone." But because the AR glasses render holograms as see-through, people could still see the tools through the virtual cone. They got confused and assumed the robot could see things it actually couldn't. Worse, people who guessed wrong with this design were highly confident in their wrong answers.
The "Side Blocks" (Near-Eye) didn't help much.
- Analogy: Just putting blocks next to the eyes didn't help people judge the distance to the table. They still couldn't tell whether a tool on the table was inside or outside the robot's field of view.
The Takeaway for Robot Designers
The researchers came up with six simple rules (guidelines) for anyone building robots that work with humans:
- Deepen the eyes: If you can't use AR, make the robot's eyes look like deep sockets. It naturally tells humans, "I can't see sideways."
- Paint the table: If you can use AR, draw the vision limits directly on the work surface (the table). It's the most accurate way to communicate.
- Connect the dots: If you draw lines on the table, make sure they clearly connect back to the robot's eyes so people don't get confused about where the vision starts.
- Watch out for overconfidence: If you use the "cone" shape, be careful. People might think they know exactly what the robot sees, even when they are wrong.
- Don't worry about stress: Even though the "Table Walls" took a tiny bit longer to understand, it didn't make people feel stressed or tired. It was easy to use.
- Safety first: For critical jobs (like surgery or heavy lifting), always use the "Table Walls" method. Accuracy is more important than speed.
In Summary
Humans are bad at guessing what robots can see because we assume robots see like us. This paper shows that simple visual tricks, like drawing virtual walls on the table or deepening the robot's eye sockets, can fix this misunderstanding. That makes teamwork smoother, faster, and much less frustrating for both the human and the robot.