Dance2Hesitate: A Multi-Modal Dataset of Dancer-Taught Hesitancy for Understandable Robot Motion

This paper introduces "Dance2Hesitate," an open-source multi-modal dataset comprising synchronized kinesthetic robot teaching and dancer motion capture data across three hesitancy levels, designed to facilitate the study and benchmarking of understandable, context-aware hesitancy in human-robot collaboration.

Srikrishna Bangalore Raghu, Anna Soukhovei, Divya Sai Sindhuja Vankineni, Alexandra Bacula, Alessandro Roncone

Published Thu, 12 Ma

Imagine you are playing a game of Jenga with a robot. You reach out to pull a block, and the robot reaches out too. If the robot moves smoothly and confidently, you might feel safe. But what if the robot stops, wobbles, or moves very slowly? You might think, "Wait, is it unsure? Is it about to drop the block? Should I grab it before it falls?"

That "wobble" or "pause" is called hesitancy. In the world of human-robot teamwork, this hesitation is actually a superpower. It's a non-verbal way for the robot to say, "I'm not 100% sure I can do this safely," which helps humans react quickly to stay safe.

However, teaching a robot to hesitate just right is incredibly hard. It's like teaching a fish to dance: the fish lacks legs, and a robot arm lacks the full body humans use to express feeling. When a human hesitates, it reads as caution; when a robot arm hesitates, it can read as a glitch.

The Big Idea: "Dance2Hesitate"

To solve this, the researchers created a new dataset called Dance2Hesitate. Think of this dataset as a giant library of "hesitation recipes."

Instead of trying to program hesitation from math equations alone (which often looks robotic and unnatural), the researchers asked professional dancers to teach the robot how to do it.

Here is how they did it, using some simple analogies:

1. The "Human Translator" (The Dancers)

Dancers are experts at using their bodies to tell stories without saying a word. They know exactly how to make a movement look "uncertain" or "cautious" without actually stopping.

  • The Experiment: The researchers set up a specific scene: A robot arm (or a human arm) needs to reach for a Jenga tower.
  • The Levels: They asked the dancers to perform this reach in three different "flavors" of hesitation:
    • Slight: Like a gentle "hmm, let me think."
    • Significant: Like a clear "whoa, I'm not sure about this."
    • Extreme: Like a dramatic "STOP! This is dangerous!"
  • The Magic: Because the dancers are so skilled, they could repeat these movements consistently from trial to trial, giving the researchers clean, high-quality data to study.
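The three "flavors" above form a simple ordinal scale. As a minimal sketch (the level names come from the paper, but this enum is illustrative, not part of the released dataset), they could be encoded like this:

```python
from enum import IntEnum

class HesitancyLevel(IntEnum):
    """Three hesitancy intensities from the study, ordered by strength."""
    SLIGHT = 1       # "hmm, let me think"
    SIGNIFICANT = 2  # "whoa, I'm not sure about this"
    EXTREME = 3      # "STOP! This is dangerous!"

# Ordinal comparisons come for free with IntEnum:
assert HesitancyLevel.SLIGHT < HesitancyLevel.EXTREME
```

Using `IntEnum` rather than a plain `Enum` keeps the ordering explicit, which matters if you later want to train a model that treats hesitancy as a graded signal rather than three unrelated classes.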

2. The "Two-Way Mirror" (The Data)

The researchers recorded this in two ways to create a "Rosetta Stone" for robots:

  • The Robot Side: The dancers physically guided the robot arm to reach for the Jenga tower — a technique called kinesthetic teaching, like holding a friend's hand to show them a movement. This taught the robot exactly how a human wants it to move.
  • The Human Side: They recorded the dancers with motion-capture cameras as they performed the same reaches with their own arms and bodies.

This creates a bridge. Now, if a robot wants to hesitate, it can look at the dancer's movement, copy the "vibe," and translate it into its own mechanical language.
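The two-sided recording described above pairs each demonstration in two modalities. A sketch of how such paired data could be organized is below; the field names and structure are assumptions for illustration, not the dataset's actual schema:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class HesitancyDemo:
    """One synchronized demonstration pair (hypothetical schema).

    robot_joint_traj: kinesthetic-teaching recording, one list of joint
        angles per timestep.
    human_mocap_traj: motion-capture recording of the dancer performing
        the same reach, one list of marker coordinates per timestep.
    """
    level: str                           # "slight" | "significant" | "extreme"
    robot_joint_traj: List[List[float]]
    human_mocap_traj: List[List[float]]

def demos_by_level(demos: List[HesitancyDemo], level: str) -> List[HesitancyDemo]:
    """Collect all demonstration pairs recorded at a given hesitancy level."""
    return [d for d in demos if d.level == level]
```

Keeping the robot-side and human-side trajectories inside one record is what makes the "bridge" usable: a learning method can consume aligned (human motion, robot motion) pairs per hesitancy level instead of matching them up after the fact.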

3. Why This Matters

Imagine a self-driving car approaching a pedestrian.

  • Without Hesitancy: The car stops abruptly. The pedestrian thinks, "Is the car broken? Is it going to hit me?"
  • With Hesitancy: The car slows down, wobbles slightly, and pauses. The pedestrian understands, "Ah, the car sees me and is being careful. I can cross."

This dataset helps engineers build robots that don't just do tasks, but communicate their internal state. It turns a cold, mechanical machine into a partner that you can "read" like a human.

The Takeaway

The paper is essentially saying: "We asked dancers to teach robots how to be unsure, so that robots can stop looking like glitchy machines and start looking like thoughtful teammates."

They have now made all this data (videos, robot movement logs, and 3D motion files) free for anyone to use. This means researchers everywhere can now build robots that hesitate in a way that humans naturally understand, making our future interactions with robots safer and more intuitive.