SeedPolicy: Horizon Scaling via Self-Evolving Diffusion Policy for Robot Manipulation

The paper proposes SeedPolicy, a self-evolving diffusion policy enhanced by a novel Self-Evolving Gated Attention (SEGA) module that efficiently compresses long-horizon observations, enabling state-of-the-art performance in robotic manipulation tasks with significantly fewer parameters than existing vision-language-action models.

Youqiang Gui, Yuxuan Zhou, Shen Cheng, Xinyang Yuan, Haoqiang Fan, Peng Cheng, Shuaicheng Liu

Published 2026-03-06

Imagine you are teaching a robot to perform a complex task, like making a sandwich or organizing a messy desk. You show it demonstrations of the task being done correctly, and it learns to copy them. This is called Imitation Learning.

For a long time, the standard approach has been to let the robot act on only a short "clip" of what it saw a moment ago. But here's the problem: if the task is long and complicated (a "long horizon" task), the robot gets confused. It forgets what it did five minutes ago, or it gets stuck because it thinks it's back at the beginning.

This paper introduces a new robot brain called SeedPolicy. Think of it as giving the robot a "super-memory" and a "smart filter" so it can handle long, complicated jobs without getting lost.

Here is how it works, broken down with simple analogies:

1. The Problem: The "Goldfish" Robot

Imagine a robot that only remembers the last 3 seconds of what it sees.

  • The Issue: If you ask it to "put the red block in the box, then the blue block," it might put the red block in, forget it did that, and then try to put the red block in again. Or, if the camera shakes or the background changes, the robot panics because it thinks the world has changed completely.
  • The Result: The longer the task, the worse the robot gets. It's like trying to solve a 1,000-piece puzzle while only being allowed to look at the last three pieces you picked up.
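
To make the "goldfish" problem concrete, here is a toy sketch in Python. It is not from the paper: the episode, the observation labels, and the `windowed_view` helper are all made up for illustration. It shows that a robot looking only at a fixed-size window can receive exactly the same input at two different stages of a task.

```python
from collections import deque

def windowed_view(observations, window):
    """Return what a policy with a fixed-size window sees at each step."""
    buf = deque(maxlen=window)  # old observations fall out automatically
    views = []
    for obs in observations:
        buf.append(obs)
        views.append(tuple(buf))
    return views

# Hypothetical episode: the scene looks "idle" both before and after
# the red block has been placed.
episode = ["idle", "idle", "grasp_red", "place_red", "idle", "idle"]
views = windowed_view(episode, window=2)

# Steps 1 and 5 produce identical inputs, so a window-only policy has no
# way to tell "not started yet" apart from "red block already done".
same_input = views[1] == views[5]
print(same_input)  # True
```

With only two observations in view, "before the red block" and "after the red block" are indistinguishable, which is the kind of confusion described above.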

2. The Solution: SeedPolicy's "Smart Notebook"

The authors created a system called SEGA (Self-Evolving Gated Attention). Let's break that down into two parts:

A. The "Living Notebook" (Self-Evolving Latent State)

Instead of just looking at the last few frames, SeedPolicy keeps a living summary of everything that has happened so far.

  • Analogy: Imagine a detective solving a crime. A bad detective only looks at the crime scene right now. A good detective keeps a notebook where they write down every clue, every suspect they met, and every theory they had. Even if the crime scene changes, the detective can look at their notebook and remember, "Oh right, I already checked the red door."
  • How it helps: SeedPolicy compresses a long history of observations into a tiny, efficient "notebook" (a latent state). This allows the robot to remember the whole story of the task, not just the last few seconds.
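
The "notebook" idea can be sketched in a few lines of Python with NumPy. Treat this as a rough illustration under stated assumptions, not the paper's implementation: the slot count, the feature size, the fixed 0.9/0.1 mixing weights, and the `update_latent` function are all invented here, and a real model would use learned layers instead. The point it shows is that the memory stays the same size no matter how many frames the robot has seen.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def update_latent(latent, frame_tokens):
    """Fold one new frame into a fixed-size latent state.

    latent:       (slots, dim)  the running "notebook"
    frame_tokens: (tokens, dim) features of the newest observation
    """
    dim = latent.shape[-1]
    # Each notebook slot attends over the new frame's tokens.
    scores = latent @ frame_tokens.T / np.sqrt(dim)   # (slots, tokens)
    read = softmax(scores, axis=-1) @ frame_tokens    # (slots, dim)
    # Assumed fixed mixing weights; a learned model would decide these.
    return 0.9 * latent + 0.1 * read

rng = np.random.default_rng(0)
latent = np.zeros((4, 8))                 # 4 slots, 8 features each
for _ in range(100):                      # 100 frames of fake observations
    frame = rng.normal(size=(16, 8))
    latent = update_latent(latent, frame)

print(latent.shape)  # (4, 8)
```

After 100 frames the notebook is still 4 by 8, so the cost of consulting it does not grow with the length of the history.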

B. The "Smart Filter" (Gated Attention)

Now, imagine that notebook is getting messy. It has scribbles about the weather, the color of the walls, and a bird flying by. These are distractions.

  • The Issue: If the robot tries to remember everything, it gets overwhelmed by noise (like a background moving or a shadow).
  • The Fix: SeedPolicy has a Smart Filter (the "Gate").
  • Analogy: Think of a bouncer at an exclusive club. The bouncer looks at every piece of information coming in. "Is this important? Did the robot move the cup? Yes, let it in. Is this just a shadow on the wall? No, keep it out."
  • How it works: The robot uses its own attention mechanism to decide what is "important" and what is "noise." It actively deletes the distractions from its memory, keeping the notebook clean and focused only on the task.
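
Here is a minimal Python sketch of what a gate does, assuming a simple sigmoid gate that blends old memory with new information. The numbers and the `gated_write` function are hypothetical, and the gate values are hard-coded for the demo. In the actual model, the decision about what to let in would come from learned attention, as described above.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_write(memory, candidate, gate_logits):
    """Blend new information into memory, feature by feature.

    A gate near 1 lets the candidate in; a gate near 0 keeps the old memory.
    """
    gate = sigmoid(gate_logits)
    return (1.0 - gate) * memory + gate * candidate

memory = np.array([1.0, 1.0])       # what the notebook currently says
candidate = np.array([5.0, 5.0])    # what the new frame suggests
# Assumed logits: feature 0 is judged relevant (the cup moved),
# feature 1 is judged noise (a shadow on the wall).
gate_logits = np.array([10.0, -10.0])

updated = gated_write(memory, candidate, gate_logits)
print(np.round(updated, 2))
```

The first feature ends up very close to the new value and the second stays very close to the old one, which is the bouncer's job in miniature.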

3. Why This is a Big Deal

  • Scaling Up: Previous robot policies tended to get worse as the horizon grew. According to the paper, SeedPolicy instead benefits from a longer horizon, because its "notebook" has more useful context to draw on.
  • Efficiency: There are other vision-language-action models (like RDT) that are huge and expensive to run. SeedPolicy is more like a smartphone: it has far fewer parameters, so it is smaller, cheaper, and faster, yet the paper reports that it does the job just as well (or better) for these specific robot tasks.
  • Real-World Success: They tested this on a real robot arm. When the robot had to do a loop (pick up a block, put it down, pick it up again), old robots got stuck in a loop of confusion. SeedPolicy remembered, "I already did that part," and kept moving forward.

The Bottom Line

SeedPolicy is like upgrading a robot from having a short-term memory and a cluttered mind to having a photographic memory with a personal assistant who filters out the noise. It allows robots to finally tackle long, complex, multi-step jobs without getting confused or stuck, all while running on hardware that isn't too expensive or heavy.