ELMUR: External Layer Memory with Update/Rewrite for Long-Horizon RL Problems

The paper proposes ELMUR, a transformer architecture in which every layer carries its own structured external memory. By reading from and writing to these memory slots, agents can handle long-horizon, partially observable tasks, extending effective horizons well beyond the attention window and outperforming existing baselines on synthetic and real-world manipulation benchmarks.

Egor Cherepanov, Alexey K. Kovalev, Aleksandr I. Panov

Published 2026-03-05

Imagine you are teaching a robot to cook pasta. It stirs the pot, adds a pinch of salt, and then... forgets it did so. Five minutes later, it adds salt again. Then again. Soon, the dish is inedible.

Why does this happen? Because the robot can't "remember" the invisible salt it just added. In the real world, robots often face this "partial observability" problem: they can't see everything happening, and they can't hold onto important clues for long.

This paper introduces ELMUR, a new way to give robots (and AI agents) a superpower: a structured, long-term memory that actually works.

Here is the simple breakdown of how it works, using some everyday analogies.

1. The Problem: The "Short Attention Span" Robot

Most modern AI robots are like students who only study the last 5 minutes of a lecture. If the teacher mentioned a crucial rule 10 minutes ago, the robot has already forgotten it.

  • Standard AI: Has a "context window." It can only look back at the last few seconds of video or data. If the task takes hours (like a long maze or a complex cooking recipe), the robot hits a wall and forgets the beginning.
  • The Result: They fail at long tasks because they can't connect the "start" of the task with the "end."
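The "context window" limit above is easy to demonstrate with a fixed-length buffer: once the task outlasts the buffer, the earliest clue is simply gone. This is a toy illustration, not how any particular model stores context:

```python
from collections import deque

context = deque(maxlen=5)                # a tiny "context window" of 5 steps
context.append("sign says: TURN LEFT")   # the crucial clue at step 0
for step in range(1, 10):                # nine more uneventful steps
    context.append(f"hallway, step {step}")

# The clue has already been pushed out of the window:
print("TURN LEFT" in " ".join(context))  # → False
```

A real transformer's window holds thousands of steps rather than five, but the failure mode is the same: anything older than the window might as well never have happened.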

2. The Solution: ELMUR (The "Layered Notebook" System)

The authors propose a new architecture called ELMUR. Think of a standard Transformer (the brain behind most AI) as a single, giant whiteboard. ELMUR changes this by giving every single layer of the brain its own personal notebook.

Here is how ELMUR works, step-by-step:

A. The Two Tracks: Reading and Writing

Imagine a factory assembly line.

  • The Token Track (The Workers): These are the workers processing what the robot sees right now. They are busy looking at the current video frame.
  • The Memory Track (The Notebooks): Running parallel to the workers are these special notebooks. They don't change every second; they hold onto important facts.
  • The Interaction:
    • Reading (mem2tok): The workers glance at the notebooks to ask, "Did we add salt yet?"
    • Writing (tok2mem): If the workers see something important (like "Salt added!"), they write it into the notebook.
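The reading and writing steps above are both cross-attention, just pointed in opposite directions: tokens query the memory to read, and memory slots query the tokens to write. Here is a minimal single-head sketch in NumPy. It is not the paper's implementation; the real model uses learned projections, multiple heads, and per-layer memory blocks, all omitted here:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over all keys/values."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ values

rng = np.random.default_rng(0)
d_model = 8
tokens = rng.normal(size=(4, d_model))   # current observations (the "workers")
memory = rng.normal(size=(3, d_model))   # persistent slots (the "notebooks")

# mem2tok (reading): tokens are the queries, glancing at the memory slots
tokens = tokens + cross_attention(tokens, memory, memory)

# tok2mem (writing): memory slots are the queries, pulling in what matters
memory = memory + cross_attention(memory, tokens, tokens)
```

The key design point is the asymmetry of roles: the same attention operation serves as a read when tokens ask the questions, and as a write when memory slots do.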

B. The "Least Recently Used" (LRU) Librarian

You can't write in a notebook forever; eventually, the pages run out. ELMUR manages its limited pages with a Least Recently Used (LRU) policy: think of it as a smart librarian deciding which page to reuse.

  • The Analogy: Imagine a hotel with a limited number of rooms (memory slots).
    • If a guest (a new piece of information) arrives and there is an empty room, they move in immediately.
    • If the hotel is full, the manager looks at who checked in the longest time ago and hasn't been visited since. That guest is asked to leave (or their room is blended with the new guest's info).
  • Why this is cool: This ensures the robot keeps the most relevant recent history while discarding old, useless junk. It prevents the robot from getting overwhelmed by too much data.
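The hotel analogy boils down to a small amount of bookkeeping: a fixed set of slots, a "last visited" timestamp per slot, and an eviction rule. A minimal sketch of that policy, with illustrative names not taken from the paper:

```python
from dataclasses import dataclass, field

@dataclass
class LRUMemory:
    """Fixed number of slots; when full, overwrite the least-recently-used one."""
    capacity: int
    slots: dict = field(default_factory=dict)      # slot_id -> stored content
    last_used: dict = field(default_factory=dict)  # slot_id -> last access time
    clock: int = 0

    def _tick(self, slot_id):
        self.clock += 1
        self.last_used[slot_id] = self.clock

    def read(self, slot_id):
        self._tick(slot_id)            # reading counts as a "visit"
        return self.slots[slot_id]

    def write(self, content):
        if len(self.slots) < self.capacity:
            slot_id = len(self.slots)  # empty room: the guest moves in
        else:                          # full: evict the stalest slot
            slot_id = min(self.last_used, key=self.last_used.get)
        self.slots[slot_id] = content
        self._tick(slot_id)
        return slot_id
```

With `capacity=2`, writing "salt added" and "water boiling" fills both rooms; if the agent then re-reads the salt note before a third write arrives, it is "water boiling" (the least recently visited) that gets overwritten.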

C. The "Convex Blending" (The Smooth Transition)

Sometimes, instead of kicking the old guest out immediately, the librarian mixes the new guest's info with the old one. This is called convex blending. It's like slowly fading out an old photo while fading in a new one, ensuring the memory doesn't suddenly vanish or become chaotic.
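Convex blending is a one-line operation: a weighted average whose weights sum to 1, so the result always lies "between" the old memory and the new information. In this sketch the blend weight `alpha` is a fixed constant for illustration; in a trained model it would be a learned, input-dependent gate:

```python
import numpy as np

def blend(old_slot, new_info, alpha=0.3):
    """Convex combination: weights (1 - alpha) and alpha sum to 1,
    so the updated slot never jumps outside the old/new segment."""
    return (1.0 - alpha) * old_slot + alpha * new_info

old = np.array([1.0, 0.0])   # the fading old photo
new = np.array([0.0, 1.0])   # the incoming new photo
print(blend(old, new))       # → [0.7 0.3]: 70% old memory, 30% new
```

Because the update is convex, the memory can never explode or flip sign abruptly; repeated blends fade the old content out gradually, exactly like the photo analogy.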

3. The Results: Superhuman Memory

The researchers tested ELMUR on three types of challenges:

  1. The T-Maze (The Long Hallway): Imagine a robot walking down a hallway that is one million steps long. At the start, it sees a sign saying "Turn Left." It walks for a million steps, then has to turn left.

    • Old Robots: Forgot the sign after step 100.
    • ELMUR: Remembered the sign perfectly and turned left at step 1,000,000. 100% success rate.
  2. POPGym (The Puzzle Box): A collection of 48 different logic puzzles and control games where you have to remember clues from the past to solve the present.

    • ELMUR: Won or tied for first place on 24 out of 48 tasks, beating all other top AI models.
  3. MIKASA-Robo (The Robot Chef): Real-world robotic tasks where the robot has to manipulate objects based on visual cues (like "pick up the red block, then the blue one").

    • ELMUR: Nearly doubled the success rate of the best previous robots. It successfully completed 21 out of 23 tasks, whereas other robots struggled with just a few.

The Big Picture

Think of ELMUR as giving an AI a structured diary instead of just a short-term memory.

  • It doesn't try to remember everything (which is impossible).
  • It doesn't forget important things just because time has passed.
  • It organizes its memory so it can look back thousands of steps to find the clue it needs right now.

In short, ELMUR allows robots to stop being "amnesiacs" and start being strategic planners capable of handling complex, long-term jobs in the real world.