AutoQD: Automatic Discovery of Diverse Behaviors with Quality-Diversity Optimization

The paper presents AutoQD, a theoretically grounded method that automatically discovers diverse, high-performing policies in continuous control tasks by generating behavioral descriptors through random Fourier feature embeddings of policy occupancy measures, thereby eliminating the need for hand-crafted descriptors in Quality-Diversity optimization.

Saeed Hedayatian, Stefanos Nikolaidis

Published 2026-03-05

Imagine you are a coach trying to train a team of robots to walk, run, or swim. Your goal isn't just to find the one robot that walks perfectly; you want to find a whole team of robots that can do many different things: some walk fast, some hop, some crawl, and some even slide. This is the challenge of Quality-Diversity (QD) optimization.

The problem with previous methods is that the coach had to decide in advance, by hand, which differences between robots count: "Okay, robot A, try walking with your legs wide. Robot B, try hopping on one foot." In QD terms, these hand-crafted measurements are called behavioral descriptors. Relying on them is like describing a painting by listing only the colors you used; you miss the whole picture. If the coach never thought of "sliding," the checklist has no slot for it, and no robot will ever be kept for sliding.

AutoQD is a new method that acts like a super-smart, self-learning coach that doesn't need instructions on what to look for. Here is how it works, using some simple analogies:

1. The "Footprint" Analogy (Occupancy Measures)

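Every time a robot moves, it leaves a trail of "footprints" (state-action pairs: where the robot was and what it did there). The overall pattern of those footprints, meaning how often the robot visits each state-action pair, is called its occupancy measure.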

  • Old way: The coach looks at the footprints and tries to guess, "Is this a hop? Is this a run?" based on a checklist they wrote down.
  • AutoQD way: AutoQD looks at the entire pattern of footprints. It doesn't care about the checklist. It just looks at the "shape" of the robot's journey. If two robots leave very different patterns of footprints, AutoQD knows they are behaving differently, even if it can't name the difference yet.
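To make the "footprint pattern" concrete: in reinforcement learning, the discounted occupancy measure gives the state-action pair visited at step t a weight proportional to γ^t. The short sketch below is our own illustration, not code from the paper. It turns one recorded trajectory into weighted footprints. The function name discounted_occupancy_samples and the default γ = 0.99 are assumptions, and a real system would pool footprints from many episodes.

```python
import numpy as np

def discounted_occupancy_samples(states, actions, gamma=0.99):
    """Turn one trajectory into weighted (state, action) footprints.

    Hypothetical helper: the pair visited at step t gets weight gamma**t,
    and the weights are renormalized to sum to 1 over the finite trajectory.
    """
    states = np.asarray(states, dtype=float)
    actions = np.asarray(actions, dtype=float)
    # Allow 1-D inputs by treating each entry as a 1-D state or action.
    if states.ndim == 1:
        states = states[:, None]
    if actions.ndim == 1:
        actions = actions[:, None]
    # Each footprint is a state vector concatenated with its action vector.
    pairs = np.concatenate([states, actions], axis=1)
    weights = gamma ** np.arange(len(pairs))
    return pairs, weights / weights.sum()

# Toy trajectory: 1-D states and 1-D actions over 4 steps.
pairs, w = discounted_occupancy_samples(
    [0.0, 0.1, 0.3, 0.6], [1.0, 1.0, 0.5, 0.0], gamma=0.9
)
print(pairs.shape)  # (4, 2)
print(np.round(w, 3))
```

Earlier footprints weigh more because discounting emphasizes what the robot does sooner; with γ close to 1 the weights become nearly uniform.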

2. The "Magic Translator" (Random Fourier Features)

The patterns of footprints are incredibly complex and messy, like a giant, tangled ball of yarn. You can't easily compare two balls of yarn to see how different they are.

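AutoQD uses a mathematical tool called Random Fourier Features to act as a Magic Translator: it turns any cloud of footprints into a single list of numbers of fixed length, so two robots can be compared directly.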

  • Imagine taking that tangled ball of yarn and instantly turning it into a smooth, colorful 3D sculpture.
  • If two robots behave similarly, their sculptures look almost identical.
  • If they behave differently, their sculptures look very different.
  • This translation happens automatically. The system doesn't need to know what the behavior is; it just knows that the shapes are distinct.
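The "Magic Translator" can be sketched with the textbook random Fourier feature construction for a Gaussian kernel. Each footprint x is mapped to sqrt(2/D)·cos(Wx + b), and a robot's embedding is the (optionally weighted) average over all of its footprints. Distances between such averages approximate the maximum mean discrepancy (MMD) between the two footprint distributions. The class name RandomFourierEmbedding, the bandwidth sigma, and the feature count D are our own assumptions; the paper's exact kernel and settings may differ.

```python
import numpy as np

class RandomFourierEmbedding:
    """Random Fourier features for a Gaussian (RBF) kernel of bandwidth sigma.

    phi(x) = sqrt(2/D) * cos(W x + b), with W ~ N(0, I / sigma^2) and
    b ~ Uniform[0, 2*pi], so that phi(x) . phi(y) approximates
    exp(-||x - y||^2 / (2 * sigma^2)).
    """

    def __init__(self, input_dim, num_features=512, sigma=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0.0, 1.0 / sigma, size=(num_features, input_dim))
        self.b = rng.uniform(0.0, 2 * np.pi, size=num_features)
        self.scale = np.sqrt(2.0 / num_features)

    def features(self, x):
        # x: (n, input_dim) -> (n, num_features)
        return self.scale * np.cos(x @ self.W.T + self.b)

    def embed(self, pairs, weights=None):
        """Weighted mean of the features: one fixed-length vector per robot."""
        phi = self.features(np.asarray(pairs, dtype=float))
        if weights is None:
            return phi.mean(axis=0)
        weights = np.asarray(weights, dtype=float)
        return weights @ phi / weights.sum()

rng = np.random.default_rng(1)
rff = RandomFourierEmbedding(input_dim=2, num_features=2048, sigma=1.0)
robot_a = rng.normal(0.0, 1.0, size=(500, 2))   # footprints of robot A
robot_a2 = rng.normal(0.0, 1.0, size=(500, 2))  # another run of a similar robot
robot_b = rng.normal(3.0, 1.0, size=(500, 2))   # footprints of a different robot

e_a, e_a2, e_b = rff.embed(robot_a), rff.embed(robot_a2), rff.embed(robot_b)
# Euclidean distance between embeddings approximates the kernel MMD.
print(np.linalg.norm(e_a - e_a2) < np.linalg.norm(e_a - e_b))  # True
```

The appeal of this design is that the random projection (W, b) is drawn once and shared by every robot, so any two embeddings live in the same space and can be compared with an ordinary distance.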

3. The "Compass" (Behavioral Descriptors)

Even with these sculptures in hand, the coach still has a problem: each one is described by a great many numbers, far too many for organizing a team. You can't put a sculpture in a filing cabinet.

AutoQD takes these complex sculptures and squashes them down into a simple 2D map (like a compass with just "North" and "East").

  • It does this by looking at the "best" robots in the team and asking, "What are the most important directions that make these robots unique?"
  • It creates a Compass that points toward the most interesting differences.
  • Now, instead of a messy sculpture, the coach has a simple coordinate: "Robot A is at [North, East]" and "Robot B is at [South, West]."
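The idea of finding "the most important directions among the best robots" can be illustrated with a performance-weighted principal component analysis. Weight each robot's embedding by how well it scores, then keep the top two eigenvectors of the weighted covariance as the compass axes. This is a hypothetical sketch. The function name weighted_pca_projection and the weighting rule (returns shifted so the weakest robot gets almost no weight) are our assumptions, and AutoQD's actual procedure for choosing directions may differ.

```python
import numpy as np

def weighted_pca_projection(embeddings, returns, out_dim=2):
    """Return a function mapping embeddings to out_dim 'compass' coordinates.

    The directions are the top eigenvectors of a covariance in which each
    robot's embedding is weighted by its (shifted, normalized) return.
    """
    X = np.asarray(embeddings, dtype=float)
    r = np.asarray(returns, dtype=float)
    # Assumption: the weakest robot gets (almost) zero weight, better ones more.
    w = r - r.min() + 1e-8
    w = w / w.sum()
    mean = w @ X
    centered = X - mean
    cov = (centered * w[:, None]).T @ centered
    _, eigvecs = np.linalg.eigh(cov)            # eigenvalues in ascending order
    directions = eigvecs[:, ::-1][:, :out_dim]  # largest-variance directions first

    def project(e):
        return (np.asarray(e, dtype=float) - mean) @ directions

    return project

rng = np.random.default_rng(0)
# 200 hypothetical robot embeddings that vary most along the first axis.
embeddings = rng.normal(size=(200, 8)) * np.array([5.0, 2.0, 1, 1, 1, 1, 1, 1])
returns = np.ones(200)  # equal scores: this reduces to ordinary PCA
project = weighted_pca_projection(embeddings, returns)
coords = project(embeddings)
print(coords.shape)  # (200, 2)
```

When the scores differ, high-scoring robots dominate the covariance, so the compass axes track the variety among good robots rather than among the whole population.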

4. The "Archive" (The Collection)

The coach uses this new Compass to fill up a Digital Archive.

  • The archive is like a grid of boxes.
  • The coach puts the best robot they find into the box that matches its compass coordinates.
  • If a new robot lands in an empty box, it gets that box; if the box is already taken, the newcomer replaces the occupant only if it performs better.
  • Over time, the archive fills up with a wide variety of strong robots, spreading across more and more of the "behavior map."
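The filing rule above matches the one used by MAP-Elites-style grid archives: one box per region of the compass map, and a newcomer takes a box only if the box is empty or the newcomer outscores the current occupant. The GridArchive class, its bounds, and the 10 by 10 grid below are illustrative assumptions, not AutoQD's actual configuration.

```python
import numpy as np

class GridArchive:
    """Hypothetical MAP-Elites-style grid: one elite (best robot) per box."""

    def __init__(self, cells_per_dim, lower, upper):
        self.cells = np.asarray(cells_per_dim, dtype=int)
        self.lower = np.asarray(lower, dtype=float)
        self.upper = np.asarray(upper, dtype=float)
        self.elites = {}  # box index tuple -> (fitness, solution)

    def cell_of(self, descriptor):
        d = np.asarray(descriptor, dtype=float)
        frac = (d - self.lower) / (self.upper - self.lower)
        idx = np.floor(frac * self.cells).astype(int)
        # Descriptors outside the bounds are clipped into the edge boxes.
        return tuple(np.clip(idx, 0, self.cells - 1))

    def add(self, descriptor, fitness, solution):
        """Keep the solution if its box is empty or it beats the current elite."""
        cell = self.cell_of(descriptor)
        current = self.elites.get(cell)
        if current is None or fitness > current[0]:
            self.elites[cell] = (fitness, solution)
            return True
        return False

    def coverage(self):
        """Fraction of boxes that hold a robot."""
        return len(self.elites) / int(np.prod(self.cells))

archive = GridArchive([10, 10], lower=[-1, -1], upper=[1, 1])
print(archive.add([0.05, 0.05], 1.0, "robot A"))  # True: empty box
print(archive.add([0.06, 0.04], 0.5, "robot B"))  # False: same box, lower score
print(archive.add([0.9, -0.9], 0.2, "robot C"))   # True: a new box
print(archive.coverage())                          # 0.02
```

Robot C is kept despite its low score because it occupies a region no one else has reached; this is how the archive trades a little quality for a lot of diversity.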

Why is this a big deal?

  • No Hand-Written Checklist: Before, if you wanted a robot to "slide," you had to tell the computer to look for sliding. With AutoQD, the computer just says, "Hey, this robot is doing something very different from the others, let's keep it!", so it can discover sliding on its own.
  • Robustness: Because the archive is full of different ways to solve a problem, if the environment changes (e.g., the floor becomes slippery), the coach doesn't have to start from scratch. They just look at the archive and say, "Oh, Robot C was already good at sliding on wet floors. Let's use that one!"
  • Open-Ended Discovery: It allows robots to discover behaviors humans might never have thought to ask for, like a robot learning to "dance" or "roll" just because those behaviors filled empty spots in the archive.

In Summary

AutoQD is like giving a robot coach a magic camera that automatically takes a photo of each robot's behavior, turns it into a simple map coordinate, and files the best robots into a library. It doesn't need a human to say "look for hopping"; it looks for difference and quality on its own, and so it can discover behaviors that humans might have missed.