Imagine you are walking down a busy street. You see a robot coming toward you. You both need to get past each other without bumping into one another.
In the past, scientists evaluating robots mostly asked: "Did they crash?" or "How close did they get?"
But this paper argues that's like judging a dance only by whether the dancers tripped. It misses the art of the dance. Did the robot gracefully step aside? Did it awkwardly shove you? Or did it just stand there hoping you'd move?
This paper introduces two new "scorecards" to judge how robots and humans interact: Responsibility and Engagement.
Here is the breakdown in simple terms:
1. The Problem: The "Blame Game"
Imagine two people walking toward each other on a narrow path.
- Scenario A: You both stop, look at each other, and politely step aside.
- Scenario B: You both keep walking until the last second, then panic and jump out of the way.
- Scenario C: You keep walking, and the other person has to jump out of the way to save themselves.
Old metrics might say, "Great! No crash happened!" in all three cases. But clearly, Scenario C is rude, and Scenario A is polite. We need a way to measure who did the work to avoid the crash.
2. The New Scorecards
🏆 Responsibility (The "Good Samaritan" Score)
This metric asks: "Who actually solved the problem?"
Think of a conflict like a rising tide of water that threatens to flood a house.
- If Agent A (the robot) sees the water rising and builds a wall to stop it, Agent A gets 100% of the "Responsibility" points.
- If Agent B (the human) builds the wall, Agent B gets the points.
- If both chip in, they split the points.
- If neither does anything and the water just recedes on its own (or they crash), then "Time" gets the points, meaning no one was helpful.
The Twist: The paper adds a new layer. It doesn't just look at the moment of the crash; it looks at the buildup. If the robot was walking straight toward you for a long time and only moved at the very last second, it gets less "Responsibility" than if it moved early and smoothly.
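To make this concrete, here is a minimal Python sketch of how a score like this could be computed. Everything in it is our illustration, not the paper's actual math: the conflict measure (how far inside a safety distance the agents would pass if they kept their current velocities), the "do-nothing" nominal paths, and the rule for splitting credit are all assumptions chosen just to make the idea runnable.

```python
import numpy as np

def conflict(p_a, v_a, p_b, v_b, d_safe=1.0):
    """Toy conflict level: how far below the safety distance d_safe the
    agents would pass if both kept their current velocities. A head-on
    approach registers as conflict long before impact, which captures
    the "buildup" idea."""
    rel_p, rel_v = p_b - p_a, v_b - v_a
    speed_sq = float(np.dot(rel_v, rel_v))
    # Time of projected closest approach, clamped so the past doesn't count.
    t_cpa = 0.0 if speed_sq < 1e-9 else max(0.0, float(-np.dot(rel_p, rel_v)) / speed_sq)
    d_cpa = float(np.linalg.norm(rel_p + rel_v * t_cpa))
    return max(0.0, d_safe - d_cpa)

def state(traj, t):
    """Position and finite-difference velocity at step t >= 1 of a (T, 2) path."""
    return traj[t], traj[t] - traj[t - 1]

def responsibility_shares(traj_a, traj_b, nominal_a, nominal_b, d_safe=1.0):
    """Split credit for every drop in conflict between agent A, agent B,
    and "time". traj_* are the executed paths; nominal_* are the
    do-nothing paths (e.g., just walking straight)."""
    credit = {"A": 0.0, "B": 0.0, "time": 0.0}
    for t in range(2, len(traj_a)):
        c_prev = conflict(*state(traj_a, t - 1), *state(traj_b, t - 1), d_safe)
        c_now = conflict(*state(traj_a, t), *state(traj_b, t), d_safe)
        drop = c_prev - c_now
        if drop <= 0:
            continue  # conflict grew or held steady: no credit this step
        # What the conflict would be right now if neither agent had deviated.
        c_null = conflict(*state(nominal_a, t), *state(nominal_b, t), d_safe)
        passive = max(0.0, c_prev - c_null)  # would have resolved on its own
        # Each agent's effort: the reduction its deviation buys while the
        # other agent is held at its do-nothing path.
        a_effort = max(0.0, c_null - conflict(*state(traj_a, t), *state(nominal_b, t), d_safe))
        b_effort = max(0.0, c_null - conflict(*state(nominal_a, t), *state(traj_b, t), d_safe))
        total = passive + a_effort + b_effort
        if total > 0:  # hand out this step's drop in proportion to effort
            credit["time"] += drop * passive / total
            credit["A"] += drop * a_effort / total
            credit["B"] += drop * b_effort / total
    norm = sum(credit.values()) or 1.0
    return {k: round(v / norm, 3) for k, v in credit.items()}
```

The key design choice: each agent is judged against a counterfactual where it did nothing, so credit only flows to whoever actually deviated to defuse the situation; anything that would have resolved on its own goes to "time".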
🔥 Engagement (The "Drama" Score)
This metric asks: "Who made things worse?"
Think of a conflict like a campfire.
- Responsibility is who put water on the fire to put it out.
- Engagement is who threw a log on the fire to make it bigger.
If a robot sees a human and decides to speed up or turn sharply toward them (maybe to start a conversation, or just because it's confused), it is "Engaging" the conflict. It is adding fuel to the fire. Even if the robot eventually stops, this metric captures that moment of "oops, I made it scarier."
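The same toy machinery gives a matching Engagement score. Instead of splitting credit for drops in conflict, we split blame for rises; this reuses the conflict and state helpers from the Responsibility sketch above, and the attribution rule is again our own illustrative assumption.

```python
def engagement_shares(traj_a, traj_b, nominal_a, nominal_b, d_safe=1.0):
    """Mirror image of the Responsibility split: attribute every *rise*
    in conflict. An agent that speeds up or turns toward the other,
    beyond what its do-nothing path would have caused, collects the
    blame for that step."""
    blame = {"A": 0.0, "B": 0.0}
    for t in range(2, len(traj_a)):
        c_prev = conflict(*state(traj_a, t - 1), *state(traj_b, t - 1), d_safe)
        c_now = conflict(*state(traj_a, t), *state(traj_b, t), d_safe)
        rise = c_now - c_prev
        if rise <= 0:
            continue  # conflict shrank or held steady: nothing to blame
        # Conflict right now if neither agent had deviated.
        c_null = conflict(*state(nominal_a, t), *state(nominal_b, t), d_safe)
        # Extra conflict each deviation adds, with the other agent held nominal.
        a_fuel = max(0.0, conflict(*state(traj_a, t), *state(nominal_b, t), d_safe) - c_null)
        b_fuel = max(0.0, conflict(*state(nominal_a, t), *state(traj_b, t), d_safe) - c_null)
        total = a_fuel + b_fuel
        if total > 0:
            blame["A"] += rise * a_fuel / total
            blame["B"] += rise * b_fuel / total
    return {k: round(v, 3) for k, v in blame.items()}
```

Note that the blame totals are left unnormalized: a robot that never makes things worse simply scores zero, which is exactly the behavior you want to see.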
3. How They Tested It (The Simulations)
The authors ran computer simulations to see if their new scorecards made sense.
The "Head-On" Test: Two agents walk straight at each other.
- If one jumps aside, that one gets 100% Responsibility.
- If both jump aside, they split the points (50/50).
- If they crash, "Time" gets the points (0% for both).
- Result: The attribution matched intuition exactly: the metric knew who moved and credited them accordingly. (A toy run of this test appears just below.)
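Here is that head-on test pushed through the toy scorer from above. The trajectories are hypothetical: both agents' do-nothing paths collide dead-on, B never deviates, and A starts sidestepping at step 20, so essentially all of the credit should land on A.

```python
import numpy as np  # reuses responsibility_shares from the sketch above

T = 60
steps = np.arange(T, dtype=float)
# Do-nothing paths: A walks right, B walks left, on a dead-on collision course.
nominal_a = np.stack([0.1 * steps, np.zeros(T)], axis=1)
nominal_b = np.stack([6.0 - 0.1 * steps, np.zeros(T)], axis=1)

# Executed paths: B never deviates; A starts sidestepping at step 20.
traj_b = nominal_b.copy()
traj_a = nominal_a.copy()
traj_a[20:, 1] = np.minimum(0.1 * (steps[20:] - 19), 1.5)

print(responsibility_shares(traj_a, traj_b, nominal_a, nominal_b))
# Expected: {'A': ~1.0, 'B': 0.0, 'time': 0.0} -- A did all the work.
```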
The "Group Split" Test: A robot walks toward two friends (Alice and Bob) who are walking side-by-side.
- If the robot squeezes between them, it gets a high "Engagement" score because it forced them apart (making the situation tense).
- If the robot walks around the outside of the group, it gets a high "Responsibility" score because it solved the problem without disturbing the friends.
- Result: The metrics correctly identified that walking around the group was the "nicer" move.
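One honest caveat before a demo: our toy pairwise conflict measure can reproduce the Responsibility half of this test, but not the group-splitting Engagement penalty, which needs a conflict measure that knows about group cohesion (something the paper's setup handles and our sketch does not). With that caveat, here is the "walk around the outside" case, scored against each friend separately:

```python
# Group-split test, scored pairwise (hypothetical paths).
# Alice and Bob walk side by side toward the robot; the robot's
# do-nothing path would cut straight through the gap between them.
T = 60
steps = np.arange(T, dtype=float)
alice = np.stack([6.0 - 0.1 * steps, np.full(T, 0.6)], axis=1)
bob = np.stack([6.0 - 0.1 * steps, np.full(T, -0.6)], axis=1)
nominal_robot = np.stack([0.1 * steps, np.zeros(T)], axis=1)

# Executed path: the robot swings around the outside of the pair instead.
robot = nominal_robot.copy()
robot[10:, 1] = np.minimum(0.12 * (steps[10:] - 9), 1.8)

for name, friend in [("Alice", alice), ("Bob", bob)]:
    print(name, responsibility_shares(robot, friend, nominal_robot, friend))
```

In each printed dict, "A" is the robot's share; because neither friend ever deviates, the robot collects essentially all of the credit in both pairings.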
The "Personal Space" Test: What if the robot doesn't hit you, but walks too close?
- The authors tweaked the rules to say, "Hey, 1 meter is too close!"
- Suddenly, a robot that barely missed a collision but walked right through your personal bubble got a lower "Responsibility" score. The metric captures that avoiding a crash isn't enough; the robot also needs to be polite.
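In the toy code above, that rule change is literally one knob: the d_safe parameter of the conflict function (our naming, not the paper's). A pass that comfortably clears a collision radius can still sit inside a one-meter bubble:

```python
# The comfort distance is the d_safe knob of the toy conflict function.
# A skimming pass that clears a collision radius (0.4 m here) can still
# sit well inside a 1 m personal bubble.
p_robot, v_robot = np.array([0.0, 0.6]), np.array([0.1, 0.0])
p_human, v_human = np.array([3.0, 0.0]), np.array([-0.1, 0.0])

print(conflict(p_robot, v_robot, p_human, v_human, d_safe=0.4))  # 0.0: no crash
print(conflict(p_robot, v_robot, p_human, v_human, d_safe=1.0))  # 0.4: too close
```

Once skimming counts as unresolved conflict, a robot that merely clears the crash radius stops earning full Responsibility credit.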
4. Why This Matters
Imagine you are designing a robot for a hospital or a shopping mall. You want it to be helpful, not annoying.
- Old Way: You check if the robot crashes. If it doesn't, you say, "Good job!"
- New Way: You check the Responsibility and Engagement scores.
- If the robot has low Responsibility, it means humans are constantly having to dodge it. That's a bad robot.
- If the robot has high Engagement, it means it's being aggressive or confusing. That's a bad robot.
- If the robot has high Responsibility and low Engagement, it means it's a polite, foresighted partner that helps keep the peace.
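If you wanted to wire these two scores into an automated evaluation, the decision logic might look like the sketch below. The threshold values are invented purely for illustration; the paper does not prescribe pass/fail cutoffs.

```python
def grade_robot(responsibility: float, engagement: float,
                resp_floor: float = 0.5, eng_ceiling: float = 0.2) -> str:
    """Turn the two scores into a verdict. Thresholds are invented."""
    if engagement > eng_ceiling:
        return "bad robot: aggressive or confusing, it adds drama"
    if responsibility < resp_floor:
        return "bad robot: humans do all the dodging"
    return "good robot: a polite, foresighted partner"

print(grade_robot(responsibility=0.9, engagement=0.05))
```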
The Bottom Line
This paper gives us a new way to grade social robots. It moves beyond "Did they crash?" to "Who was the polite one, and who made the drama?"
By using these metrics, engineers can teach robots to be not just safe, but also socially graceful, ensuring that when a robot walks down the street, it doesn't just avoid hitting you—it respects your space and helps you feel safe.