From Local Corrections to Generalized Skills: Improving Neuro-Symbolic Policies with MEMO

Imagine you have a very smart robot assistant. This robot is great at understanding what you want it to do. If you say, "Make me toast," it knows the steps: find the toaster, open the door, put the bread in, close the door, and push the lever. It's like a brilliant project manager who can break a big job into a checklist.

The Problem: The "Toolbox" Gap
Here's the catch: the robot is a brilliant manager, but it's a terrible worker. It doesn't actually know how to open that specific toaster door. It has a limited toolbox of pre-programmed moves (like "grab," "move," "push"). If it needs to do something new, like "rotate the handle slightly to the left to open the stuck door," and that move isn't in its toolbox, it fails.

Usually, when a human says, "No, go higher," the robot just remembers that exact sentence for that exact toaster. Next time you ask it to open a different cabinet, it forgets the lesson because the words were slightly different. It's like memorizing a specific answer to a specific math problem but failing to learn the actual formula.

The Solution: MEMO (The Robot's "Master Chef" Cookbook)
The authors of this paper created a system called MEMO (Memory Enhanced Manipulation). Think of MEMO not just as a notebook, but as a living, evolving cookbook that gets smarter every time the robot makes a mistake or succeeds.

Here is how MEMO works, using a simple analogy:

1. The "Recipe Card" Collection (Collecting Feedback)

Every time the robot tries to do something and fails, a human gives a quick correction.

Robot: "I'm trying to open the toaster."
Human: "No, you need to rotate the handle more!"
MEMO's Job: Instead of just writing down "rotate more for toaster," MEMO's brain (an AI language model) rewrites this into a general rule: "When opening a stuck door, apply extra rotation."

It does this for every success, too. If the robot successfully opens a fridge, MEMO saves the "recipe" (the code) it used to do it.

2. The "Master Chef" Review (Clustering)

This is the magic part. Imagine you have 50 different recipe cards from 50 different people, all trying to explain how to open a door. Some say "push hard," some say "pull gently," some say "twist left." If you keep all 50 cards, your kitchen is a mess, and you might get confused.

MEMO acts like a Master Chef who reviews all 50 cards. It groups them together, throws out the contradictory advice, and writes one perfect, generalized recipe card that covers all doors.

Old way: "Open the red toaster door by twisting 15 degrees."
MEMO way: "To open any door, find the handle, grip it, and rotate until it clicks."

This process turns specific, messy human complaints into clean, universal "skills" (like a new function called open_door()).

3. The "Smart Search" (Retrieval)

Now, when the robot faces a new task (say, "Empty the cabinet"), it doesn't just guess. It opens its Cookbook (the Skillbook).

It asks: "Have I ever opened a cabinet before? Have I ever dealt with a stuck handle?"
MEMO searches its library and pulls up the generalized recipe for "opening stuck doors" that it learned from the toaster, the fridge, and the microwave.
The robot then uses this new, generalized recipe to write the code to open the cabinet, even though it has never seen that specific cabinet before.

Why This Matters

In the experiments, the researchers tested this on a real robot arm.

Without MEMO: The robot was like a student who memorized answers but couldn't solve new problems. It failed often when faced with new tasks.
With MEMO: The robot became like a master chef who learned the principles of cooking. It could take a lesson from opening a toaster and apply it to opening a cabinet, a bottle, or a fridge.

The Bottom Line
MEMO teaches robots to stop just "remembering" specific fixes and start "learning" general skills. By collecting human feedback, cleaning it up, and turning it into universal rules, the robot builds a toolbox that grows bigger and smarter every day, allowing it to handle new, complex jobs it has never seen before. It's the difference between a robot that follows a script and a robot that actually learns how to work.

Here is a detailed technical summary of the paper "From Local Corrections to Generalized Skills: Improving Neuro-Symbolic Policies with MEMO".

1. Problem Statement

Neuro-symbolic robotics frameworks combine the high-level reasoning of Large Language Models (LLMs) and Vision-Language Models (VLMs) with low-level control. While these models excel at decomposing complex tasks into semantic subtasks (e.g., "open the toaster"), they face a fundamental bottleneck: the grounding of language into embodied motion.

The Constraint: Current systems rely on a fixed library of "skills" (motion primitives, trajectory snippets, or coded functions). The policy can only execute tasks if the necessary skills exist in this library.
The Failure Mode: If a robot encounters a novel situation requiring a skill it does not possess (e.g., a specific way to open a unique door), the policy fails.
Limitations of Existing Feedback: Previous approaches often use human feedback to correct specific errors for a single task instance (local feedback). They fail to synthesize this feedback into generalized, reusable skills that can be applied across different tasks and contexts.

2. Methodology: MEMO (Memory Enhanced Manipulation)

The authors propose MEMO, a framework that dynamically expands a robot's skill library by aggregating human feedback and successful task executions into a retrieval-augmented skillbook. The system operates through three core phases:

A. Collecting Feedback into a Skillbook

MEMO maintains a vector database ($S$) called the skillbook, which stores:

Human Corrections: Natural language feedback is parsed and paraphrased by an LLM to remove task-specific noise (e.g., converting "grab the handle at (0.5, -0.3)" to "move to the door handle").
Contextual Embeddings: Entries are indexed by action tokens (e.g., "open") and object tokens (e.g., "toaster"), allowing for semantic retrieval.
Implicit Feedback (Success): When a robot successfully completes a subtask, the code used is converted into a parameterized function template (removing hardcoded values) and stored alongside the context.
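The storage step above can be illustrated with a minimal sketch. It is not the authors' implementation: the `SkillbookEntry` fields, the `parameterize` helper, and the `move_to(0.5, -0.3)` code line are all hypothetical, and the regex-based replacement of numeric literals is just one simple way to strip hardcoded values from successful code. The LLM paraphrasing step is assumed to have already produced the `guidance` string.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SkillbookEntry:
    """Hypothetical record layout for one skillbook entry."""
    action: str         # action token used for indexing, e.g. "open"
    obj: str            # object token used for indexing, e.g. "toaster"
    guidance: str       # human correction, already paraphrased by an LLM
    template: str = ""  # parameterized code from a successful run, if any


# Matches standalone numeric literals such as 0.5 or -0.3, but not the
# digits inside identifiers like "gripper2".
_NUMBER = re.compile(r"(?<![\w.])-?\d+(?:\.\d+)?(?![\w.])")


def parameterize(code: str) -> Tuple[str, List[float]]:
    """Replace numeric literals with placeholders p0, p1, ...

    Returns the template and the list of values that were removed.
    """
    values: List[float] = []

    def swap(match: "re.Match[str]") -> str:
        values.append(float(match.group()))
        return f"p{len(values) - 1}"

    return _NUMBER.sub(swap, code), values


# Hypothetical code line from a successful "open toaster" subtask.
template, values = parameterize("move_to(0.5, -0.3)")
entry = SkillbookEntry(
    action="open",
    obj="toaster",
    guidance="move to the door handle",
    template=template,
)
print(entry.template)  # move_to(p0, p1)
```

Keeping the removed values alongside the template is a design choice of this sketch only; the summary does not say whether MEMO retains them.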

B. Retrieval-Augmented Generation (RAG) at Runtime

When the robot faces a new task:

Decomposition: The policy breaks the task into subtasks.
Retrieval: Before generating code for a subtask, the policy queries the skillbook using cosine similarity between the current action/object intent and the stored embeddings.
Reasoning: The retrieved entries (text guidance and code templates) are fed into the LLM as context. The policy then generates new code for the current skill, guided by the retrieved generalized templates rather than just copying them.
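The retrieval step can be sketched as a cosine-similarity ranking over stored embeddings. This is an illustrative sketch under stated assumptions: the three-dimensional vectors are made-up toy values rather than outputs of a real embedding model, and the `retrieve` function, the dictionary layout, and the `k` parameter are hypothetical names, not part of MEMO's published interface.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query, skillbook, k=2):
    """Return the k entries whose embeddings are most similar to the query."""
    ranked = sorted(
        skillbook,
        key=lambda entry: cosine(query, entry["embedding"]),
        reverse=True,
    )
    return ranked[:k]


# Toy skillbook; the embeddings are invented for illustration only.
skillbook = [
    {"key": "open toaster", "embedding": [0.9, 0.1, 0.0],
     "guidance": "rotate the handle until the door releases"},
    {"key": "open fridge", "embedding": [0.8, 0.3, 0.1],
     "guidance": "grip the handle and pull along the hinge arc"},
    {"key": "pour cup", "embedding": [0.0, 0.2, 0.9],
     "guidance": "tilt slowly above the target container"},
]

query = [0.85, 0.2, 0.0]  # stands in for the intent "open cabinet"
top = retrieve(query, skillbook, k=2)

# The retrieved guidance would then be placed in the LLM prompt as context.
context = "\n".join(entry["guidance"] for entry in top)
print([entry["key"] for entry in top])
```

In this toy example the two "open" entries rank above the unrelated "pour" entry, which is the behavior the retrieval step relies on when a new object shares an action with earlier tasks.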

C. Offline Clustering and Generalization

To prevent the skillbook from becoming bloated with repetitive or contradictory entries, MEMO performs an asynchronous background process:

Clustering: Entries with similar embeddings are grouped together.
Synthesis: An LLM processes each cluster, conditioned on the associated successful code templates. It synthesizes a condensed, generalized guidance entry.
Pruning: Contradictory or erroneous feedback is removed if it conflicts with the successful code templates. This results in a compact set of invariant rules and parameterized functions.
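A rough sketch of the grouping and pruning steps follows. The summary does not name the clustering algorithm, so this uses a simple greedy threshold rule as a stand-in, not the authors' method. The `cluster` and `prune` functions, the `threshold` value, the toy embeddings, and the `is_consistent` callback (which stands in for the LLM check against successful code templates) are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster(entries, threshold=0.9):
    """Greedy clustering: each entry joins the first cluster whose first
    member it resembles closely enough, otherwise it starts a new cluster."""
    clusters = []
    for entry in entries:
        for group in clusters:
            if cosine(entry["embedding"], group[0]["embedding"]) >= threshold:
                group.append(entry)
                break
        else:
            clusters.append([entry])
    return clusters


def prune(group, is_consistent):
    """Keep only the feedback that the supplied consistency check accepts."""
    return [entry for entry in group if is_consistent(entry)]


# Toy entries; embeddings and guidance strings are invented for illustration.
entries = [
    {"embedding": [0.9, 0.1, 0.0], "guidance": "rotate the handle to open"},
    {"embedding": [0.8, 0.3, 0.1], "guidance": "push the door to open"},
    {"embedding": [0.0, 0.2, 0.9], "guidance": "tilt slowly when pouring"},
]

clusters = cluster(entries)

# Hypothetical check: pretend the successful template rotates the handle,
# so advice to push the door contradicts it and is dropped.
kept = prune(clusters[0], lambda entry: "push" not in entry["guidance"])
print(len(clusters), [entry["guidance"] for entry in kept])
```

In the described system an LLM would then synthesize one generalized guidance entry per pruned cluster; that call is omitted here because its interface is not specified in the summary.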

3. Key Contributions

The Skillbook Concept: Introduction of a retrieval-augmented database that stores both human corrections and generalized code templates, bridging the gap between local feedback and global skill expansion.
Clustering for Generalization: A novel method to cluster raw feedback around successful code templates. This transforms specific, repetitive corrections into invariant, parameterized skill templates (e.g., a generalized open_door() function).
Cross-Task and Cross-Environment Transfer: A demonstration that feedback collected in simulation can be aggregated and clustered to generate skills that successfully transfer to real-world robots and novel tasks without further fine-tuning.

4. Experimental Results

The authors evaluated MEMO on a 7-DoF Franka Emika Panda robot across 25 tasks (simulated and real-world), including long-horizon planning, contact-rich manipulation, and semantic reasoning.

Zero-Shot Generalization (Simulation):
- MEMO achieved a 78% success rate on held-out evaluation tasks.
- This significantly outperformed baselines: DROC-V (40%) and TrajGen (28%).
- Ablation studies showed that removing the clustering step (MEMO-C) caused performance to drop, as the robot retrieved irrelevant or contradictory feedback, leading to ineffective skill generation.
Real-World Transfer:
- Using a skillbook built entirely from simulated human feedback, MEMO achieved an 88% success rate on real-world tasks.
- It required significantly less human intervention (1.52 feedback instances per task) than baselines such as DROC-V (2.76 instances per task).
- MEMO outperformed the state-of-the-art VLA model $\pi_{0.5}$ (12% success rate), highlighting the advantage of grounding code generation in aggregated human feedback.
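To put the reported numbers in relative terms, the short calculation below uses only the figures quoted above; the ratios themselves are derived here and are not stated in the summary.

```latex
% Relative reduction in human feedback per real-world task (MEMO vs. DROC-V)
\[
\frac{2.76 - 1.52}{2.76} = \frac{1.24}{2.76} \approx 0.45
\]
% Ratio of held-out simulation success rates
\[
\frac{78\%}{40\%} = 1.95 \quad \text{(vs. DROC-V)}, \qquad
\frac{78\%}{28\%} \approx 2.79 \quad \text{(vs. TrajGen)}
\]
```

That is, MEMO needed roughly 45% fewer feedback instances per task than DROC-V, and its simulated success rate was about twice that of DROC-V.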

5. Significance

This work addresses a critical limitation in neuro-symbolic robotics: the static nature of skill libraries. By treating human feedback not just as a one-off correction but as a data source for synthesizing generalized skills, MEMO enables robots to:

Dynamically expand their capabilities without retraining foundation models.
Resolve ambiguity in language by grounding it in parameterized code templates derived from successful executions.
Achieve robust zero-shot performance on novel tasks by leveraging a continuously evolving "memory" of how to perform actions, rather than relying solely on pre-defined primitives.

In essence, MEMO shifts the paradigm from "robot learns from data" to "robot learns from human interaction to build a reusable, evolving toolkit of skills."
