Quality over Quantity: Demonstration Curation via Influence Functions for Data-Centric Robot Learning

Here is an explanation of the paper "Quality over Quantity (QoQ)" using simple language and creative analogies.

The Big Problem: Too Much "Bad" Practice

Imagine you are trying to teach a robot how to make a perfect cup of coffee. You give it 1,000 videos of humans making coffee.

The Reality: Some videos show a master barista making a perfect latte. Others show a clumsy person spilling milk, burning the beans, or dropping the cup.
The Old Way: Most robot learning methods say, "Just watch all 1,000 videos and try to copy the average."
The Result: The robot learns to be average. It ends up spilling milk half the time because it was confused by all the bad examples mixed in with the good ones.

In the past, humans had to manually watch these videos and delete the bad ones. This is slow, expensive, and boring.

The Solution: "Quality over Quantity" (QoQ)

The authors of this paper propose a new way to teach robots. Instead of asking, "Is this video perfect?" they ask a smarter question: "If I remove this specific video from the robot's training, does the robot get worse?"

If removing a video makes the robot worse, that video is Gold. If removing it doesn't change anything (or makes the robot better), that video is Trash.

They call this system QoQ. It uses a mathematical tool called Influence Functions to act like a "super-teacher" that can estimate which lessons matter most without retraining the robot once for every video.

How It Works: The Two Magic Tricks

The researchers found that simply using the math tool wasn't enough; it was too noisy. So, they added two clever tricks to make it work for robots.

Trick 1: The "Spotlight" (Maximum Influence)

The Analogy: Imagine you are studying for a math test. You have a practice test (Validation Data) and a textbook (Training Data).

The Old Way: You look at every single question on the practice test and average how much each chapter in the textbook helps you. If Chapter 5 helps with one hard question but confuses you on 10 easy ones, the average might say "Chapter 5 is okay."
The QoQ Way: For each chapter in the textbook, the "Spotlight" finds the one practice question that chapter helps with most and scores the chapter by that best match. A chapter that is excellent for one hard question keeps a high score, even if it does little for the others.

Why it helps: Robots do many different things (grasping, moving, lifting). A video might be terrible at "lifting" but amazing at "grasping." The Spotlight ensures the robot keeps the "grasping" video because it's the best example for that specific move, even if the rest of the video is messy.

Trick 2: The "Whole Story" (Trajectory Curation)

The Analogy: Imagine you are editing a movie.

The Old Way: You look at the movie frame-by-frame. You find 50 perfect frames of a hero jumping and 50 perfect frames of a hero landing. You cut out all the boring walking scenes in between.
The Problem: Now you have a movie where the hero teleports from the ground to the sky. It makes no sense!
The QoQ Way: QoQ says, "Don't just pick the best frames; pick the best whole scenes." If a video clip (trajectory) has a high score, you keep the entire clip, including the walking, the jumping, and the landing.

Why it helps: Robots need to see the full sequence of actions to understand how to move smoothly. By keeping whole videos, the robot learns the flow of movement, not just isolated snapshots.

The Results: From Clumsy to Pro

The team tested this on both computer simulations and real robots (like a robotic arm picking up bananas or opening cabinets).

The Test: They took a messy dataset full of failed attempts and used QoQ to filter out the trash.
The Outcome:
- In simulations, robots trained on QoQ-filtered data succeeded 99% of the time, compared to about 76% for older methods.
- In the real world, the success rate jumped from about 57% to about 87%.
- They even tested it on a massive, messy dataset collected "in the wild" (DROID), and QoQ still managed to find the good lessons hidden in the noise.

The Bottom Line

This paper teaches us that more data isn't always better; better data is.

Think of it like a diet. Eating 10,000 calories of junk food won't make you strong. But eating 1,000 calories of high-quality, nutrient-dense food will. QoQ is the nutritionist for robots, helping them filter out the junk food (bad demonstrations) and feast on the nutrient-dense lessons (high-quality trajectories) so they can learn faster and perform better.

Here is a detailed technical summary of the paper "Quality over Quantity: Demonstration Curation via Influence Functions for Data-Centric Robot Learning."

1. Problem Statement

Learning from demonstrations (LfD) is a dominant paradigm for end-to-end robot control, often relying on large datasets collected via human teleoperation. However, these datasets suffer from significant quality issues:

Noise and Suboptimality: Human errors, operational constraints, and varying skill levels introduce noisy and suboptimal behaviors.
Ineffective Curation: Current data curation methods are largely manual, expensive, and rely on heuristic proxy metrics (e.g., similarity to expert data or mutual information). These proxies often fail to capture which specific training samples actually contribute to improving the final policy's performance on unseen tasks.
Redundancy and Coverage: Naive application of existing data valuation techniques to robot data often results in selecting redundant state-action pairs, leading to poor coverage of the state space.

The core challenge is to develop a systematic, automated method to identify and select high-quality demonstration data that directly correlates with improved policy generalization.

2. Methodology: Quality over Quantity (QoQ)

The authors propose QoQ, a data curation framework that defines data quality based on the direct contribution of a training sample to reducing the loss on a validation set representing desired behavior. The method leverages Influence Functions to estimate this contribution without retraining the model.

The methodology consists of two primary innovations to adapt influence functions for robotic trajectories:

A. Definition of Data Quality

Instead of using static feature similarity, QoQ defines quality as the ability of a training sample $(s, a)$ to reduce the loss on a small set of high-quality validation demonstrations ( $D_{val}$ ). This is estimated using influence functions, which approximate how the model parameters (and thus validation loss) change if a specific training point is up-weighted.
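
A first-order argument shows why a gradient dot product can stand in for this contribution. This is a sketch rather than the paper's own derivation, and the symbols $\theta$ (model parameters), $\eta$ (a small step size), $L$ (per-sample loss), and $L_{val}$ (loss on $D_{val}$) are notation introduced here for illustration. Taking one gradient step on the training pair $(s, a)$ and expanding the validation loss to first order gives:

$$\theta' = \theta - \eta \, \nabla_\theta L(s, a; \theta)$$

$$L_{val}(\theta') \approx L_{val}(\theta) - \eta \, \nabla_\theta L_{val}(\theta)^\top \nabla_\theta L(s, a; \theta)$$

Under this approximation, a training pair whose gradient has a positive dot product with the validation gradient lowers the validation loss when it is trained on, and a negative dot product raises it. Classical influence functions additionally place an inverse Hessian between the two gradients; the sketch above omits it.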

B. Key Technical Components

To make influence functions practical and effective for robot learning, QoQ introduces two critical techniques:

Maximum Influence Scoring (Step 1):
- Problem: Standard influence functions average the gradient product over all validation samples. In robotics, a validation trajectory contains diverse behaviors; averaging can dilute the signal if a specific training sample is only relevant to a specific sub-task within the validation set.
- Solution: QoQ calculates the influence score of a training state-action pair $(s, a)$ by taking the maximum dot product of its normalized gradient $g(s, a)$ with the gradients $g(s', a')$ of all state-action pairs $(s', a')$ in the validation set:
  $QoQ\text{-}score(s, a) = \max_{(s', a') \in D_{val}} g(s', a')^\top g(s, a)$
- Benefit: This focuses on the most relevant validation behavior, reducing noise and ensuring the selected data is highly impactful for specific task requirements.
Trajectory-wise Curation (Step 2):
- Problem: Selecting individual high-scoring state-action pairs often leads to the selection of redundant segments (e.g., only grasping moments) while filtering out necessary context (e.g., reaching motions), resulting in poor state coverage.
- Solution: QoQ aggregates the influence scores of all state-action pairs within a single trajectory $\tau$ (using the mean) and selects the top $N$ entire trajectories based on these aggregated scores.
- Benefit: This ensures the curated dataset maintains diverse state distributions and captures complete, coherent behavior sequences.
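
The two steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names (`qoq_score`, `curate`), the toy two-dimensional "gradients", and the choice to normalize the validation gradients as well as the training gradients are all assumptions made here.

```python
import math


def normalize(g):
    """Scale a gradient vector to unit length (zero vectors are returned as-is)."""
    norm = math.sqrt(sum(x * x for x in g))
    return [x / norm for x in g] if norm > 0 else list(g)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def qoq_score(train_grad, val_grads):
    """Step 1: maximum dot product between one training gradient and
    every validation gradient (both normalized, by assumption)."""
    g = normalize(train_grad)
    return max(dot(normalize(gv), g) for gv in val_grads)


def curate(trajectories, val_grads, top_n):
    """Step 2: average the per-step scores inside each trajectory,
    then keep the top_n whole trajectories.

    trajectories: dict mapping a trajectory name to a list of
                  per-step gradient vectors (hypothetical layout).
    Returns (selected_names, per_trajectory_scores).
    """
    scores = {
        name: sum(qoq_score(g, val_grads) for g in grads) / len(grads)
        for name, grads in trajectories.items()
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_n], scores


# Toy data: two validation "behaviors" and three candidate trajectories.
val_grads = [[1.0, 0.0], [0.0, 1.0]]
trajectories = {
    "good": [[2.0, 0.0], [0.0, 3.0]],     # each step matches one behavior
    "mixed": [[1.0, 1.0], [-1.0, 0.0]],   # one useful step, one opposing step
    "bad": [[-1.0, -1.0], [-2.0, 0.0]],   # opposes both behaviors
}
selected, scores = curate(trajectories, val_grads, top_n=2)
```

In this toy setup the step `[0.0, 3.0]` matches only the second validation gradient. Averaging over the validation set would give it 0.5, while the maximum gives it 1.0, which is the dilution that Step 1 is meant to avoid. Selection happens only at the trajectory level, so every step of a kept trajectory stays in the curated set.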

C. Computational Efficiency

To handle large-scale foundation models (e.g., Vision-Language-Action models with billions of parameters), QoQ employs:

Layer Selection: Computing gradients only on specific network layers (e.g., action heads) rather than the entire network.
Gradient Compression: Using the One-Permutation One-Random-Projection (OPORP) technique to compress gradient vectors while preserving dot-product relationships, significantly reducing storage and computation costs.
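
The compression idea can be illustrated with a rough sketch of an OPORP-style projection: permute the coordinates once, flip signs with one random vector, then sum within `k` equal-sized bins. The helper name `make_oporp`, the requirement that `k` divides the dimension, and the seed handling are assumptions made for this example; this summary does not state how QoQ configures the technique.

```python
import random


def make_oporp(dim, k, seed=0):
    """Build a compressor that maps a length-`dim` vector to `k` bin sums.

    The same compressor (same permutation and signs) must be reused for
    every gradient, otherwise compressed dot products are meaningless.
    """
    if dim % k != 0:
        raise ValueError("this sketch assumes k divides dim")
    rng = random.Random(seed)
    perm = list(range(dim))
    rng.shuffle(perm)                                       # one permutation
    signs = [rng.choice((-1.0, 1.0)) for _ in range(dim)]   # one random sign vector
    bin_size = dim // k

    def compress(vec):
        out = [0.0] * k
        for new_pos, old_pos in enumerate(perm):
            # Coordinate old_pos moves to new_pos, gets a random sign,
            # and is added to the bin that new_pos falls into.
            out[new_pos // bin_size] += signs[new_pos] * vec[old_pos]
        return out

    return compress


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Compress two 8-dimensional toy gradients down to 2 numbers each.
u = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, -2.0, 1.5]
v = [0.5, -1.0, 2.0, 1.0, 0.0, 2.0, 1.0, -0.5]
compress = make_oporp(dim=8, k=2, seed=1)
approx = dot(compress(u), compress(v))   # noisy estimate of dot(u, v)
```

Averaged over the random signs, the compressed dot product equals the original one, but any single draw is noisy and the noise shrinks as `k` grows. With `k` equal to the full dimension, each bin holds one signed coordinate and the dot product is preserved exactly.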

3. Key Contributions

Performance-Based Quality Definition: Shifts the paradigm from heuristic feature similarity to a grounded definition of quality based on the direct impact on policy performance (validation loss reduction).
Adapted Influence Functions: Introduces Maximum Influence Scoring and Trajectory-wise Curation to overcome the noise and redundancy issues inherent in applying standard influence functions to sequential robot data.
Scalability: Demonstrates that influence-based curation is feasible for modern, large-scale robot foundation models through efficient gradient approximation and layer selection.
Validation Set Flexibility: Shows that QoQ can utilize policy rollouts (including failed trajectories) as a validation set, allowing for iterative improvement without requiring a pre-existing "perfect" dataset.

4. Experimental Results

The authors evaluated QoQ on simulated tasks (Robomimic), on real-world robot tasks with a Franka Research 3 arm, and on the in-the-wild DROID dataset.

Simulation (Robomimic - Can Pick-and-Place):
- QoQ achieved a 99.2% success rate, significantly outperforming the best baseline (Flow Retrieval at 76.0%) and the "All Data" baseline (55.4%).
- Curation accuracy (proportion of successful trajectories selected) reached 99.4%.
Real Robot (Banana Grasping & Multi-Object):
- In the banana grasping task, QoQ achieved an 86.7% success rate compared to 56.7% for the best baseline.
- In a multi-object curation task (selecting helpful data for bananas from a mix of objects), QoQ achieved 93.3% success, whereas the baseline (Behavior Retrieval) reached only 20%, which is attributed to distraction by irrelevant object features.
In-the-Wild Data (DROID Dataset):
- QoQ maintained high curation accuracy (78.2%) on the diverse DROID dataset, outperforming baselines that struggled with heterogeneous visual inputs and behaviors.
Ablation Studies:
- Removing Maximum Influence Scoring or Trajectory-wise Curation resulted in significant drops in both curation accuracy and policy success rates, confirming the necessity of both components.
- Computing gradients on only a subset of layers (e.g., Action Head) yielded results consistent with full-parameter computation, validating the efficiency strategy.

5. Significance

This work addresses a critical bottleneck in data-centric robot learning: the reliance on low-quality, noisy human demonstrations. By providing a systematic, mathematically grounded method to curate data based on its actual utility to the policy, QoQ enables:

Efficient Learning: Achieving higher performance with fewer, higher-quality training samples.
Robustness: Better generalization in real-world, diverse environments by filtering out failure modes and irrelevant behaviors.
Scalability: Making advanced data curation feasible for the next generation of large-scale robot foundation models.

The paper establishes that "Quality over Quantity" is not just a slogan but a measurable, actionable strategy that significantly outperforms current state-of-the-art data selection methods in both simulation and real-world deployment.