Influence-Based Reward Modulation for Implicit Communication in Human-Robot Interaction

This paper proposes a method to foster implicit communication in human-robot interaction by modulating inter-agent influence through Transfer Entropy within a reward framework, demonstrating that enhancing influence improves collaboration while resisting it promotes independence, as validated through simulations and real-world experiments.

Haoyang Jiang, Elizabeth A. Croft, Michael G. Burke

Published Tue, 10 Ma

Here is an explanation of the paper, translated into simple language with some creative analogies.

The Big Idea: The "Silent Dance" of Robots and Humans

Imagine you are walking down a crowded hallway. You see someone coming toward you. Without saying a word, you both subtly shift your hips or tilt your head, and suddenly you both know exactly who is going to step left and who is going to step right. You pass each other smoothly. No shouting, no hand signals, just a silent, intuitive understanding.

This is implicit communication. It's the "vibe" or the "flow" between people.

This paper asks a big question: Can we teach robots to do this? Not by programming them with complex rules about human psychology, but by teaching them to "feel" the flow of information between them and a human.

The Problem: Robots Are Too Clueless

Currently, for a robot to understand a human, it usually needs a "cheat sheet." It needs to know: "If the human looks left, they probably want to go left." Or it needs to build a complex mathematical model of the human's brain.

The problem is, humans are messy. We don't always follow rules, and nobody can write a "cheat sheet" that covers every situation. The authors wanted a way for robots to learn to communicate without needing to know exactly what the human is thinking or having a pre-written rulebook.

The Solution: The "Influence Score" (Transfer Entropy)

The authors came up with a clever trick using a math concept called Transfer Entropy.

Think of Transfer Entropy as a "Silent Influence Score."

  • High Score: It means "My actions are strongly affecting what you do next." (e.g., I step left, and you immediately step right because of me).
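  • Low Score: It means "My actions make no difference to what you do next." (e.g., I step left, and you keep walking straight because you didn't notice me). The score is one-directional: my influence on you can be low even while your influence on me is high.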

The paper proposes adding this "Score" to the robot's reward system. Instead of just getting points for "reaching the goal," the robot gets extra points for making its actions influence the human (or, depending on the situation, for resisting the human's influence on its own actions).
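For readers who want to see what an "Influence Score" might look like in practice, here is a minimal sketch of Transfer Entropy for two sequences of discrete moves. It compares how well you can predict the target's next move from its own last move alone versus from its own last move plus the source's last move. This is a simple counting ("plug-in") estimator with one step of history, chosen for illustration; the summary above does not say which estimator the paper uses, and the function name `transfer_entropy` and the toy "L"/"R" data are assumptions made up for this example.

```python
from collections import Counter
from math import log2
import random


def transfer_entropy(source, target):
    """Plug-in estimate of TE(source -> target) in bits, one step of history.

    TE = sum over (y_next, y_now, x_now) of
         p(y_next, y_now, x_now)
         * log2( p(y_next | y_now, x_now) / p(y_next | y_now) )
    where x is the source sequence and y is the target sequence.
    """
    if len(source) != len(target):
        raise ValueError("source and target must have the same length")
    n = len(target) - 1  # number of (now, next) transitions
    if n < 1:
        raise ValueError("need at least two time steps")

    triples = Counter()     # counts of (y_next, y_now, x_now)
    now_pairs = Counter()   # counts of (y_now, x_now)
    self_pairs = Counter()  # counts of (y_next, y_now)
    now_only = Counter()    # counts of y_now
    for t in range(n):
        y_next, y_now, x_now = target[t + 1], target[t], source[t]
        triples[(y_next, y_now, x_now)] += 1
        now_pairs[(y_now, x_now)] += 1
        self_pairs[(y_next, y_now)] += 1
        now_only[y_now] += 1

    te = 0.0
    for (y_next, y_now, x_now), count in triples.items():
        p_joint = count / n
        p_with_source = count / now_pairs[(y_now, x_now)]
        p_without_source = self_pairs[(y_next, y_now)] / now_only[y_now]
        te += p_joint * log2(p_with_source / p_without_source)
    return te


# Toy data (invented for illustration): the "human" copies the "robot"
# one step later, so the robot's moves strongly shape the human's next move.
rng = random.Random(0)
robot = [rng.choice("LR") for _ in range(2000)]
human = ["L"] + robot[:-1]

print("robot -> human:", round(transfer_entropy(robot, human), 3))
print("human -> robot:", round(transfer_entropy(human, robot), 3))
```

In this toy setup the robot-to-human score comes out high and the human-to-robot score comes out near zero, which is the one-directional "my move changes your next move" idea described above. Real interaction data is continuous and noisy, so a counting estimator like this one would likely need binning or a different estimator altogether.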

The Two Modes: The "Good Cop" and the "Bad Cop"

The researchers tested two different ways to use this score:

1. The "Good Cop" (Boosting Influence)

Goal: Collaboration and Transparency.
How it works: The robot is rewarded for making its moves so clear that the human can't help but react to them.

  • The Analogy: Imagine a dance partner who moves so clearly that you instinctively know where to step. They aren't forcing you; they are just so legible that you fall into sync.
  • The Result: In experiments (like a virtual hallway game), when the robot tried to boost this influence, humans became better at collaborating. They figured out the robot's intentions faster and worked together more smoothly. The robot became "legible."

2. The "Bad Cop" (Resisting Influence)

Goal: Independence or Competition.
How it works: The robot is rewarded for ignoring the human's influence. It tries to be a "wall" that the human cannot push.

  • The Analogy: Imagine a stubborn mule that refuses to move even if you pull the rope. It resists your influence.
  • The Result: When the robot did this, humans found it harder to predict what the robot would do. Collaboration dropped. However, in a competitive scenario (where the robot and human were racing against each other), the robot became more "selfish" and independent, which sometimes helped it win, but made the interaction feel less cooperative.
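The two modes can be pictured as one reward function with a switch. The sketch below is only an illustration of that idea: the names `InfluenceConfig` and `modulated_reward`, the `weight` value, and the choice to penalize human-to-robot influence in the "resist" mode are assumptions for this example, not the paper's confirmed formulation. The two influence scores are passed in as plain numbers, however they were estimated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InfluenceConfig:
    mode: str            # "boost" (Good Cop) or "resist" (Bad Cop)
    weight: float = 0.5  # hypothetical trade-off between task and influence


def modulated_reward(task_reward, te_robot_to_human, te_human_to_robot, cfg):
    """Combine the task reward with an influence term (illustrative only)."""
    if cfg.mode == "boost":
        # Good Cop: extra points when the robot's moves shape the human's.
        return task_reward + cfg.weight * te_robot_to_human
    if cfg.mode == "resist":
        # Bad Cop: lose points when the human's moves shape the robot's.
        return task_reward - cfg.weight * te_human_to_robot
    raise ValueError(f"unknown mode: {cfg.mode!r}")


# Same situation scored under both modes (numbers invented for illustration).
good_cop = modulated_reward(1.0, 1.0, 0.5, InfluenceConfig("boost"))
bad_cop = modulated_reward(1.0, 1.0, 0.5, InfluenceConfig("resist"))
print(good_cop, bad_cop)
```

The design point is that the task reward stays the same in both modes; only the sign and direction of the influence term change, which is what lets the same robot lean toward collaboration or toward independence.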

The Experiments: From Video Games to Real Robots

The team didn't just talk about this; they tested it in three ways:

  1. The Grid World (Video Game): Two dots on a screen had to decide whether to meet or pass each other in a narrow corridor.
    • Finding: The "Good Cop" robots helped humans win more often in team games. The "Bad Cop" robots made humans struggle to coordinate.
  2. The Virtual Human: Real people played the game against a computer robot.
    • Finding: People felt the "Good Cop" robot was more human-like and easier to understand, even though they couldn't consciously explain why. They just felt the flow was better.
  3. The Real Robot: They put a physical robot (a Fetch robot) in a real hallway with real humans.
    • Finding: The results held up! When the robot tried to be "influential," humans walked more cooperatively. When it tried to be "independent," humans had a harder time coordinating.

The Wild Card: The Highway Test

They also tested this on a self-driving car simulation (the "Highway" environment).

  • The Twist: Here, "boosting influence" made the car too aggressive. It started driving faster and cutting closer to other cars to "influence" them to move.
  • The Lesson: Sometimes, you don't want to be too influential. On a highway, you want to be predictable and safe, not a social butterfly trying to force a dance. This shows that the robot needs to know when to be a "Good Cop" and when to be a "Bad Cop."

The Takeaway

This paper is like teaching a robot to listen to the room rather than just following a script.

  • Without this method: A robot is like a person shouting instructions in a foreign language. It's loud, confusing, and annoying.
  • With this method: The robot is like a skilled dancer. It adjusts its moves based on the flow of the room. It doesn't need to know your name or your history; it just needs to feel how its moves change your next step.

By simply tweaking the robot's "reward" to care about how much it influences you, the robot learns to communicate without saying a word. It's a step toward robots that feel less like machines and more like natural partners in our daily lives.