IROSA: Interactive Robot Skill Adaptation using Natural Language

Imagine you have a highly skilled robot chef. You taught it how to make a perfect sandwich by physically guiding its arm through the motions once or twice. Now, the robot knows the recipe, but it's a bit rigid. It doesn't know how to "slow down" if the bread is fragile, or how to "dodge" a spilled jar of mustard that suddenly appeared on the counter.

Traditionally, to fix this, you'd have to be a computer programmer, write new code, and retrain the robot. That's slow and expensive.

This paper introduces IROSA (Interactive Robot Skill Adaptation using Natural Language), which is like giving that robot chef a super-smart, bilingual assistant who speaks both "Robot" and "Human."

Here is how it works, broken down with simple analogies:

1. The Problem: The "Black Box" vs. The "Toolbox"

Most modern AI robots try to learn everything from scratch, like a student trying to memorize an entire library of books just to answer one question. If you ask them to "move faster," they might guess wrong, crash, or do something unpredictable. This is dangerous in a factory.

IROSA's Solution: Instead of letting the AI guess, the authors built a strict toolbox.

The AI (The Manager): This is a Large Language Model (like the brain of a very smart assistant). It listens to you.
The Tools (The Workers): The AI doesn't have a direct line to the robot's muscles. Instead, it can only ask for specific, pre-approved tools.
- Tool A: "Speed Up/Slow Down."
- Tool B: "Add a stop here."
- Tool C: "Dodge that object."

The AI is like a project manager who can talk to you in English, but it can only give orders to the workers by handing them specific, pre-written instruction cards. It cannot just yell "Do whatever you want!" This keeps the robot safe and predictable.

2. The Magic Trick: "Zero-Shot" Adaptation

Usually, if you want a robot to do something new, you have to feed it thousands of examples to "retrain" it. That's like hiring a new chef and making them practice for a month.

With IROSA, you don't need to retrain.

The Analogy: Imagine the robot has already learned the "dance steps" for the sandwich. The AI doesn't change the dance steps; it just changes the music tempo (speed) or tells the dancer to step around a chair (obstacle avoidance).
Because the robot represents the skill with a mathematical method called KMPs (Kernelized Movement Primitives), it understands the "shape" of the movement. The AI simply adds a "via-point" (a spot the arm must pass through) or stretches the time between steps. It's like editing a video clip: you don't need to re-film the whole movie; you just cut a few seconds or add a transition.
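For readers who want to see the "editing, not re-filming" idea in code, here is a minimal one-dimensional sketch. It uses the mean prediction from the KMP literature, k* (K + lambda * Sigma)^-1 mu, with a squared-exponential kernel. The straight-line reference trajectory, the kernel length, the variances, and all function names are assumptions chosen for illustration; this is not the authors' implementation.

```python
import numpy as np

def rbf_kernel(a, b, length=0.1):
    """Squared-exponential kernel between two 1-D arrays of time stamps."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length) ** 2)

def kmp_mean(t_ref, mu_ref, var_ref, t_query, lam=1.0, length=0.1):
    """KMP-style mean prediction: k* (K + lam * Sigma)^-1 mu (1-D output)."""
    K = rbf_kernel(t_ref, t_ref, length)
    k_star = rbf_kernel(t_query, t_ref, length)
    weights = np.linalg.solve(K + lam * np.diag(var_ref), mu_ref)
    return k_star @ weights

def insert_via_point(t_ref, mu_ref, var_ref, t_via, mu_via, var_via=1e-6):
    """Append one (time, mean, variance) triplet to the reference data.

    A very small variance makes the new point behave like a near-hard constraint.
    """
    return (np.append(t_ref, t_via),
            np.append(mu_ref, mu_via),
            np.append(var_ref, var_via))

# Hypothetical reference trajectory standing in for the learned demonstrations:
# a straight line from 0 to 1 with moderate uncertainty at every step.
t_ref = np.linspace(0.0, 1.0, 21)
mu_ref = t_ref.copy()
var_ref = np.full_like(t_ref, 0.01)

t_mid = np.array([0.5])
before = kmp_mean(t_ref, mu_ref, var_ref, t_mid)[0]

# Ask the trajectory to pass through position 0.8 at time 0.5.
t2, mu2, var2 = insert_via_point(t_ref, mu_ref, var_ref, t_via=0.5, mu_via=0.8)
after = kmp_mean(t2, mu2, var2, t_mid)[0]

print(f"position at t=0.5 before: {before:.3f}, after: {after:.3f}")
```

Nothing is retrained here: the original reference points stay in place, and the single added triplet bends the predicted path toward the requested position.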

3. How It Works in Real Life (The Three Scenarios)

The researchers tested this on a real robot arm doing an industrial task (putting a metal ring into a hole).

Scenario A: "Slow down!"
- You say: "Slow down by 50% before reaching the box."
- What happens: The AI picks the "Speed Modulation" tool. It tells the robot, "Hey, between the moment you pick up the ring and the moment you get to the box, stretch out the time." The robot slows down perfectly without crashing.
Scenario B: "Check the camera!"
- You say: "Check the ring with the camera on the left."
- What happens: The AI picks the "Via-Point Insertion" tool. It calculates where the camera is and tells the robot, "Add a tiny stop at the camera's location before you go to the box." The robot smoothly swings over to look, then continues its job.
Scenario C: "Avoid the blue box!"
- You say: "Please avoid the blue box."
- What happens: The AI picks the "Repulsion Point" tool. It sees the blue box is in the way. It tells the robot, "Imagine a force field pushing you away from that box." The robot automatically curves its path around the box, like a car steering around a pothole.
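Of the three scenarios, the time stretching in Scenario A is the simplest to sketch. The snippet below assumes the trajectory is stored as a list of time stamps and that the segment to slow down is given by waypoint indices; the function name, the index convention, and the meaning of `speed_factor` (0.5 for "half speed") are assumptions for illustration, not the paper's interface.

```python
import numpy as np

def modulate_speed(times, start_idx, end_idx, speed_factor):
    """Rescale the time intervals between waypoints start_idx and end_idx.

    speed_factor = 0.5 means half speed, so each affected interval takes
    twice as long. Waypoint positions are not touched, only their timing.
    """
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    times = np.asarray(times, dtype=float)
    dt = np.diff(times)                     # duration of each interval
    dt[start_idx:end_idx] /= speed_factor   # stretch (or compress) the segment
    # Rebuild absolute time stamps from the modified intervals.
    return np.concatenate(([times[0]], times[0] + np.cumsum(dt)))

# Five waypoints, one second apart.
times = np.linspace(0.0, 4.0, 5)

# Slow down by 50% between waypoint 1 and waypoint 3.
slowed = modulate_speed(times, start_idx=1, end_idx=3, speed_factor=0.5)
print(slowed)
```

Because only the timing changes, the arm follows exactly the same path as before; it just spends longer on the chosen stretch.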

4. Why Is This Better Than Other Methods?

Other methods try to write code on the fly (like asking the AI to write a Python script to move the robot).

The Risk: If the AI writes a typo in the code, the robot might crash. It's like asking a student to write a legal contract; if they make a mistake, the whole thing fails.
IROSA's Advantage: By using pre-defined tools, the AI can't make up crazy new code. It can only use the tools that have already been tested and proven safe. It's like giving a child a set of LEGO bricks that only fit together correctly, rather than letting them try to glue random objects together.

The Bottom Line

This paper presents a way to talk to industrial robots using normal English, without needing to be a programmer. It acts as a safe translator between your human desires ("Go faster," "Don't hit that") and the robot's rigid math.

It's the difference between trying to teach a dog complex calculus (hard and dangerous) versus giving it a set of clear, simple commands it already knows how to execute (safe and effective). This makes it possible for regular factory workers to adjust robots on the fly, making factories more flexible and safer.

Here is a detailed technical summary of the paper "IROSA: Interactive Robot Skill Adaptation using Natural Language":

1. Problem Statement

Industrial robotics increasingly requires flexible systems that can be reconfigured for varying tasks without expert reprogramming. While Large Language Models (LLMs) offer intuitive control via natural language, their direct application to robot control faces significant hurdles:

Safety & Reliability: End-to-end models often lack interpretability and safety guarantees required in industrial settings.
Data Efficiency: Many approaches require extensive training data or fine-tuning for specific tasks.
Real-time Adaptation: Existing modular approaches often rely on offline training, simulation loops, or iterative code generation, which prevents real-time, zero-shot adaptation.
The Gap: There is a need for a framework that combines the semantic understanding of LLMs with the safety and determinism of established control theories, allowing for immediate skill adaptation without retraining.

2. Methodology: The IROSA Framework

The authors propose IROSA (Interactive Robot Skill Adaptation), a tool-based architecture that strictly separates language understanding from robot control.

Core Architecture

Tool-Based Abstraction: Instead of allowing the LLM to generate raw robot actions or code, the LLM is provided with a predefined "toolbox" of validated functions. The LLM's role is limited to selecting the appropriate tool and parameterizing it based on natural language input.
Underlying Control Model (KMPs): The framework utilizes Kernelized Movement Primitives (KMPs), a non-parametric probabilistic imitation learning method. KMPs learn skills from a small number of demonstrations (2–5) and represent trajectories as Gaussian distributions. They allow for principled trajectory modification through the addition of constraints (via-points) and temporal scaling.
Workflow:
1. User Query: The user provides a natural language command (e.g., "slow down before the box").
2. Tool Selection: The LLM analyzes the command and environment context to select a specific tool from the JSON-schema-defined toolbox.
3. Parameterization: The LLM extracts parameters (e.g., speed percentage, target coordinates) based on the tool's description and environment observations.
4. Validation & Execution: The system validates parameters for safety (type, range, workspace bounds). If valid, the tool executes, modifying the KMP's internal trajectory representation.
5. Feedback: The robot executes the adapted skill, and the cycle repeats for further refinement.
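Steps 2 to 4 of this workflow can be illustrated with a small validation layer. The sketch below is a simplified stand-in for a JSON-schema check: the tool names, parameter names, and numeric limits are all hypothetical, and the reply format (`tool` plus `parameters`) is an assumption rather than the format used in the paper.

```python
import json

# Hypothetical toolbox: each tool lists its parameters with allowed (min, max).
TOOLBOX = {
    "modulate_speed": {"speed_percent": (10.0, 200.0)},
    "insert_via_point": {"x": (-0.8, 0.8), "y": (-0.8, 0.8), "z": (0.0, 1.2)},
}

def validate_tool_call(raw_reply):
    """Parse an LLM reply and return (tool_name, params), or raise ValueError."""
    call = json.loads(raw_reply)  # malformed JSON raises a ValueError subclass
    if not isinstance(call, dict):
        raise ValueError("reply must be a JSON object")
    tool = call.get("tool")
    if not isinstance(tool, str) or tool not in TOOLBOX:
        raise ValueError(f"unknown tool: {tool!r}")
    spec = TOOLBOX[tool]
    params = call.get("parameters")
    if not isinstance(params, dict) or set(params) != set(spec):
        raise ValueError(f"expected exactly the parameters {sorted(spec)}")
    checked = {}
    for name, (low, high) in spec.items():
        value = params[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"{name} must be a number")
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside [{low}, {high}]")
        checked[name] = float(value)
    return tool, checked

reply = '{"tool": "modulate_speed", "parameters": {"speed_percent": 50}}'
tool, params = validate_tool_call(reply)
print(tool, params)
```

The point of the design is that a reply naming an unlisted tool, or carrying an out-of-range value, is rejected before anything reaches the controller.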

Key Adaptation Tools

The framework implements three primary adaptation primitives:

Speed Modulation: Adjusts execution speed for specific trajectory segments by scaling time intervals ($\delta t$) by a percentage factor, preserving spatial characteristics.
Via-point Insertion: Adds spatial constraints to steer the trajectory through specific regions (e.g., "approach from above") by inserting new (time, mean, covariance) triplets into the KMP reference distribution.
Repulsion Point Generation: Enables obstacle avoidance by defining a Signed Distance Field (SDF) around obstacles. If the predicted trajectory violates a safety margin, the system inserts via-points to push the trajectory away from the obstacle.
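The repulsion idea can be sketched for the simplest possible obstacle. The code below assumes a spherical obstacle, whose signed distance field has a closed form, and places each new via-point on the shell at distance `radius + margin` from the center. The function names and the choice of a sphere are assumptions for illustration; the paper's obstacle model may differ.

```python
import numpy as np

def sphere_sdf(points, center, radius):
    """Signed distance to a sphere: negative inside, zero on the surface."""
    return np.linalg.norm(points - center, axis=1) - radius

def repulsion_via_points(trajectory, center, radius, margin):
    """Return (index, position) pairs for waypoints closer than `margin`.

    Each violating waypoint is pushed along the SDF gradient (the direction
    pointing away from the sphere center) until it sits exactly at the margin.
    """
    trajectory = np.asarray(trajectory, dtype=float)
    center = np.asarray(center, dtype=float)
    distances = sphere_sdf(trajectory, center, radius)
    via_points = []
    for i in np.flatnonzero(distances < margin):
        offset = trajectory[i] - center
        norm = np.linalg.norm(offset)
        if norm < 1e-9:
            # The gradient is undefined at the exact center: use the last axis.
            direction = np.zeros_like(offset)
            direction[-1] = 1.0
        else:
            direction = offset / norm
        via_points.append((int(i), center + (radius + margin) * direction))
    return via_points

# Straight path that passes 0.1 m beside the center of a 0.2 m sphere.
path = np.column_stack([np.linspace(-1.0, 1.0, 5), np.full(5, 0.1), np.zeros(5)])
via = repulsion_via_points(path, center=[0.0, 0.0, 0.0], radius=0.2, margin=0.1)
for index, position in via:
    print(index, position)
```

In a full system these positions would then be handed to the via-point mechanism, so the obstacle is avoided by the same trajectory-editing machinery used for other corrections.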

3. Key Contributions

Tool-Based Architecture for Zero-Shot Adaptation: A novel framework enabling natural language adaptation of robot skills without fine-tuning the LLM or retraining the control policy. It maintains a strict separation between semantic understanding and deterministic control.
Novel KMP Extensions: New extensions to KMPs for natural language-driven speed modulation and obstacle avoidance via repulsion fields, expanding adaptation beyond traditional via-point constraints.
Experimental Validation: Successful deployment on a 7-DoF torque-controlled DLR SARA robot performing an industrial bearing ring insertion task. The system demonstrated reliable adaptation for speed, trajectory correction, and obstacle avoidance while maintaining safety and interpretability.

4. Experimental Results

The system was evaluated on the 7-DoF robot with a locally hosted LLM, Qwen2.5-VL-72B-Instruct.

Performance Metrics:
- Command Success Rate (CSR): 100% across all experiments (Speed, Trajectory Correction, Obstacle Avoidance).
- Interpretation Success Rate (ISR): 100% for speed and obstacle avoidance; 80% for trajectory correction (occasional unintended speed modulation).
- Task Completion Rate (TCR): 100% for all tasks.
- Response Time: Average adaptation time of 15.4 seconds, versus 72.1 seconds for the OVITA baseline running with a cloud LLM.
Comparison with OVITA (Code Generation Approach):
- When OVITA was run with a local LLM, performance degraded drastically (e.g., TCR dropped to 0% for obstacle avoidance) due to code generation errors and hallucinations.
- IROSA maintained consistent performance with the same local LLM, suggesting that structured tool interfaces are more robust and reliable than code generation for industrial skill adaptation.

5. Significance and Impact

Safety & Trust: By constraining the LLM to validated tools rather than direct control or code generation, IROSA ensures predictable robot behavior, a critical requirement for industrial deployment.
Interpretability: The system provides a clear audit trail: users can see exactly which tool was selected and what parameters were applied, unlike "black box" end-to-end models.
Efficiency: The approach requires no model retraining or simulation loops, enabling immediate deployment in offline industrial environments using local LLMs.
Scalability: The modular design allows for the addition of new tools without architectural changes, though context length limits may apply for smaller LLMs.

In conclusion, IROSA bridges the gap between the flexibility of Large Language Models and the rigorous safety requirements of industrial robotics, offering a practical, explainable, and zero-shot solution for interactive robot skill adaptation.
