PeRoI: A Pedestrian-Robot Interaction Dataset for Learning Avoidance, Neutrality, and Attraction Behaviors in Social Navigation

This paper introduces the PeRoI dataset, which captures diverse pedestrian reactions to robots in various contexts, and proposes the NeuRoSFM model to leverage this data for improved prediction of pedestrian-robot interactions in socially aware navigation.

Subham Agrawal, Nico Ostermann-Myrau, Nils Dengler, Maren Bennewitz

Published 2026-03-06

Imagine you are walking down a busy sidewalk. Suddenly, a robot rolls into your path. What do you do?

Most people assume everyone would just step aside to avoid a collision. But in reality, human reactions are much more colorful. Some people might dodge the robot like it's a hot potato (Avoidance). Others might walk right past it without even glancing, as if it's invisible (Neutrality). And then, there are the curious ones who might actually slow down or turn toward the robot to get a better look (Attraction).

For a long time, robots trying to navigate our world have been taught to expect only the first reaction: "Everyone will run away." This paper, titled PeRoI, argues that this assumption is wrong and limits how well robots can interact with us.

Here is the breakdown of their solution, explained simply:

1. The Problem: Robots are "Blind" to Human Nuance

Think of current robot navigation systems like a driver who only knows how to drive in a straight line. They have maps of where people usually walk, but they don't understand why people move the way they do when a robot is around.

Existing datasets are like old photo albums: they show people walking, but they rarely show what happens when a robot enters the room. If a robot is trained only on data where people always avoid it, it will be confused and clumsy when someone actually stops to say "Hello" or just walks past it casually.

2. The Solution: The "PeRoI" Dataset (The Robot's Diary)

The researchers created a new database called PeRoI (Pedestrian-Robot Interaction). Imagine they set up cameras in two busy outdoor spots (such as a university campus) and watched thousands of people walk by.

They introduced three different scenarios:

  • The Ghost: No robot is there (just people walking).
  • The Statue: A robot stands still in the middle of the path.
  • The Walker: A robot moves along a specific path.

They used three different types of robots to see if the look of the robot mattered:

  • A wheeled robot that looks like a friendly office assistant.
  • A four-legged robot that looks like a dog.
  • A boxy, industrial robot that looks like a delivery truck.

The Big Discovery: They found that people react differently based on the robot's shape and movement.

  • The "dog" robot (Unitree Go1) made people curious (Attraction).
  • The "truck" robot (MPO700) made people nervous and keep their distance (Avoidance).
  • Many people just walked by without caring (Neutrality).

They labeled every single person's reaction in the database, creating a "dictionary" of how humans actually behave, not just how we think they behave.
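To make that "dictionary" concrete, here is a minimal sketch of what one annotated record might look like. The field names, enum values, and schema are illustrative assumptions, not the dataset's actual format:

```python
from dataclasses import dataclass
from enum import Enum

class Scenario(Enum):
    NO_ROBOT = "ghost"        # baseline: people walking, no robot present
    STATIC_ROBOT = "statue"   # robot stands still in the path
    MOVING_ROBOT = "walker"   # robot moves along a specific path

class Reaction(Enum):
    AVOIDANCE = "avoidance"     # pedestrian steers away from the robot
    NEUTRALITY = "neutrality"   # pedestrian passes by without reacting
    ATTRACTION = "attraction"   # pedestrian slows down or turns toward it

@dataclass
class InteractionRecord:
    pedestrian_id: int
    scenario: Scenario
    robot_model: str    # e.g. "Unitree Go1" or "MPO700"
    trajectory: list    # (x, y, t) samples of the pedestrian's path
    reaction: Reaction  # the hand-annotated behavior label

# Example: a curious pedestrian approaching the quadruped robot.
record = InteractionRecord(
    pedestrian_id=42,
    scenario=Scenario.STATIC_ROBOT,
    robot_model="Unitree Go1",
    trajectory=[(0.0, 0.0, 0.0), (1.2, 0.1, 1.0), (2.0, 0.5, 2.0)],
    reaction=Reaction.ATTRACTION,
)
```

The key point is the three-way `Reaction` label: unlike older datasets, every trajectory is tagged with which of the three behaviors it shows.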

3. The New Brain: NeuRoSFM (The Robot's New Instincts)

Having the data is great, but the robot needs a way to use it. The authors built a new model called NeuRoSFM.

Think of the old way of programming robots as a recipe book. The recipe says: "If a person is 2 meters away, push them away with Force X." It's rigid and requires a human expert to tweak the numbers constantly.
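That "recipe book" is essentially the classic Social Force Model, where every pedestrian feels a hand-tuned repulsive push that decays with distance. A minimal sketch of the idea (the constants `A` and `B` are made-up illustrative values, not the paper's):

```python
import math

def repulsive_force(ped_pos, robot_pos, A=2.0, B=0.5):
    """Hand-tuned social force: magnitude A * exp(-d / B),
    pointing from the robot toward the pedestrian.

    A (strength) and B (range, in meters) must be tweaked by a
    human expert, and the same constants apply to every person
    and every robot -- the force is always repulsive.
    """
    dx = ped_pos[0] - robot_pos[0]
    dy = ped_pos[1] - robot_pos[1]
    d = math.hypot(dx, dy)
    magnitude = A * math.exp(-d / B)
    return (magnitude * dx / d, magnitude * dy / d)

# A pedestrian 1 m away is pushed harder than one 3 m away,
# no matter who they are or what the robot looks like.
near = repulsive_force((1.0, 0.0), (0.0, 0.0))
far = repulsive_force((3.0, 0.0), (0.0, 0.0))
```

Because the push can never become a pull, this formulation is structurally incapable of expressing Attraction or Neutrality.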

The new NeuRoSFM is more like muscle memory learned from experience.

  • Instead of hard-coded rules, the robot uses a "neural network" (a type of AI brain) to learn the forces.
  • It learns that a "dog robot" might pull people in, while a "truck robot" pushes them away.
  • It also learns about groups. If you are walking with friends, you might not move away from the robot even if you are scared, because you are sticking with your group. The old models ignored this; the new one accounts for it.
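The shift described above can be sketched as replacing the fixed formula with a learned function of richer features. This toy single-layer stand-in uses hand-picked weights purely for illustration; the actual NeuRoSFM architecture and its trained parameters are not reproduced here:

```python
import math

ROBOT_TYPES = ["wheeled", "quadruped", "industrial"]

def learned_force_magnitude(distance, robot_type, in_group, weights):
    """Toy stand-in for a neural social force.

    Positive output = repulsion, negative = attraction. The real model
    is a trained neural network; these weights are hand-picked to mimic
    the behaviors reported in the paper, not learned from data.
    """
    x = [math.exp(-distance)]                           # proximity feature
    x += [1.0 if robot_type == t else 0.0 for t in ROBOT_TYPES]
    x.append(1.0 if in_group else 0.0)                  # group membership
    return sum(w * xi for w, xi in zip(weights, x))

# Hand-picked weights: the quadruped attracts, the industrial robot
# repels, and walking in a group damps the reaction.
W = [1.0, 0.2, -0.8, 1.5, -0.3]

dog = learned_force_magnitude(2.0, "quadruped", False, W)     # negative: pull
truck = learned_force_magnitude(2.0, "industrial", False, W)  # positive: push
truck_in_group = learned_force_magnitude(2.0, "industrial", True, W)
```

Unlike the hand-tuned recipe, the same function can output a pull toward a "dog" robot, a push away from a "truck" robot, and a weaker push when the pedestrian sticks with their group.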

4. The Result: Smoother Dancing

The researchers tested this new "brain" on real-world data.

  • Old Model: Predicted that everyone would run away from the robot. It was wrong about 30% of the time.
  • New Model (NeuRoSFM): Predicted that some would run, some would ignore, and some would get curious. It was much more accurate.

The Takeaway

This paper is like teaching a robot to understand social etiquette.

Before, robots were like toddlers in a crowd: they knew they had to move, but they didn't understand that sometimes people want to say hi, sometimes they want to ignore you, and sometimes they just want to keep walking.

By collecting the PeRoI dataset and building the NeuRoSFM model, the authors have given robots the ability to "read the room." This means future robots won't just be safe; they will be polite, predictable, and comfortable to be around in our shopping malls, hospitals, and sidewalks.