LatentMem: Customizing Latent Memory for Multi-Agent Systems

This paper introduces LatentMem, a learnable multi-agent memory framework that addresses memory homogenization and information overload. An experience bank and a memory composer generate customized, token-efficient latent memories, which are further optimized via Latent Memory Policy Optimization (LMPO) to significantly improve multi-agent system performance.

Muxin Fu, Xiangyuan Xue, Yafu Li, Zefeng He, Siyuan Huang, Xiaoye Qu, Yu Cheng, Yang Yang

Published Tue, 10 Ma

Imagine a team of AI agents working together to solve a complex puzzle, like building a software app or planning a robot's path through a maze. In the past, these teams had a major problem: they were all trying to remember the exact same things in the exact same way.

Think of it like a group of friends trying to plan a road trip. If everyone is forced to carry the entire map, every receipt, and every photo from the last 100 trips in their pockets, they get overwhelmed. The driver gets lost in the paperwork, the navigator forgets the turn-by-turn directions, and the mechanic can't find the specific tool they need. This is what happens in current AI systems: too much information, and everyone remembers it identically, regardless of their specific job.

LatentMem is a new framework that fixes this by giving the team a "smart, personalized memory system." Here is how it works, broken down into simple concepts:

1. The "Experience Bank" (The Raw Library)

Imagine a massive, lightweight library where the team stores the raw transcripts of every road trip they've ever taken. It doesn't try to summarize or edit these trips yet; it just stores the raw data efficiently.

  • The Problem it solves: Instead of rewriting the whole story every time, the system just keeps the original logs. It's like having a digital recorder for every conversation.
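
The paper describes the experience bank only at this conceptual level; as an illustration, here is a toy sketch of such an append-only store. All names here (`ExperienceBank`, `Trajectory`, `record`, `retrieve`) are invented for this example, not the paper's API.

```python
from dataclasses import dataclass, field


@dataclass
class Trajectory:
    """One raw interaction log: which agent, on what task, what happened."""
    agent_role: str
    task: str
    transcript: str


@dataclass
class ExperienceBank:
    """Append-only store of raw trajectories; nothing is summarized at write time."""
    _log: list = field(default_factory=list)

    def record(self, traj: Trajectory) -> None:
        self._log.append(traj)  # store verbatim: a cheap, lossless write path

    def retrieve(self, agent_role: str, limit: int = 5) -> list:
        """Fetch the most recent raw entries involving one role."""
        hits = [t for t in self._log if t.agent_role == agent_role]
        return hits[-limit:]


bank = ExperienceBank()
bank.record(Trajectory("driver", "trip-1", "took a left on a rainy road"))
bank.record(Trajectory("mechanic", "trip-1", "engine overheated near mile 40"))
bank.record(Trajectory("driver", "trip-2", "missed the exit, rerouted"))

print(len(bank.retrieve("driver")))  # 2
```

The key design choice the analogy points at: writes are dumb and cheap (just append the transcript), and all the intelligence is deferred to read time.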

2. The "Memory Composer" (The Personalized Summarizer)

This is the magic part. When a specific agent (say, the "Driver") needs to make a decision, it doesn't read the whole library. Instead, it asks the Memory Composer for help.

  • How it works: The Composer looks at the raw library, sees what the Driver needs, and says, "Ah, you're the Driver. Here is a 3-sentence summary of the last 5 times you took a left turn on a rainy road."
  • The Magic: If the "Mechanic" asks the same question, the Composer gives a totally different summary: "Here is the last 3 times the engine overheated."
  • The Result: The memory is customized. The Driver isn't distracted by engine specs, and the Mechanic isn't confused by turn signals. This stops the team from "homogenizing" (all thinking the same way).
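
In the paper, the Memory Composer is a learned model; the following is only a hand-written stand-in that mimics its role-conditioned behavior, so the same raw log yields different memories for different agents. The function name and entry format are assumptions for this sketch.

```python
def compose_memory(raw_entries, agent_role, budget=2):
    """Toy role-conditioned composition: keep only entries relevant to the
    asking agent's role, then truncate to a small entry budget. (The real
    Memory Composer learns this selection; here it is a simple filter.)"""
    relevant = [e["text"] for e in raw_entries if e["role"] == agent_role]
    return " | ".join(relevant[-budget:])


raw = [
    {"role": "driver", "text": "left turn on rainy road went fine"},
    {"role": "mechanic", "text": "engine overheated near mile 40"},
    {"role": "driver", "text": "missed the exit, rerouted"},
]

print(compose_memory(raw, "driver"))
# left turn on rainy road went fine | missed the exit, rerouted
print(compose_memory(raw, "mechanic"))
# engine overheated near mile 40
```

The same library produces two different memories, which is exactly the anti-homogenization point: what each agent remembers depends on who is asking.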

3. "Latent Tokens" (The Secret Handshake)

Usually, AI agents talk to each other using long, chatty sentences. This takes up a lot of space and time (like sending a 10-page email when a text message would do).
LatentMem compresses these memories into Latent Tokens.

  • The Analogy: Imagine instead of writing a paragraph about "how to fix a flat tire," the agent just sends a secret, compressed code (a "latent token") that instantly triggers the knowledge of "fix flat tire" in the other agent's brain.
  • The Benefit: It's incredibly fast and uses very little "bandwidth" (tokens), allowing the team to work much faster without getting bogged down by long texts.
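
Real latent tokens are learned continuous embeddings inside the model; the toy sketch below only illustrates the bandwidth effect, by mapping an arbitrarily long memory string to a fixed, tiny sequence of integer "tokens". The hashing trick here is purely illustrative and loses information, unlike the paper's learned compression.

```python
import hashlib


def to_latent_tokens(memory_text: str, n_tokens: int = 4) -> list:
    """Map a long memory string to a fixed-length code of n_tokens integers.
    (Stand-in for learned latent tokens; shows the size reduction only.)"""
    digest = hashlib.sha256(memory_text.encode()).digest()
    # Take 2 bytes per token, so each token is an integer in [0, 65536)
    return [int.from_bytes(digest[2 * i: 2 * i + 2], "big") for i in range(n_tokens)]


memory = ("how to fix a flat tire: loosen the lug nuts, jack up the car, "
          "swap the spare, tighten in a star pattern, lower the car")
tokens = to_latent_tokens(memory)
print(len(memory.split()), "->", len(tokens))  # 25 -> 4
```

Whatever the length of the original memory, the receiver gets a constant-size code, which is why the agents spend far fewer tokens communicating.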

4. "LMPO" (The Coach that Learns)

How does the Memory Composer know what to summarize? It uses a training method called Latent Memory Policy Optimization (LMPO).

  • The Analogy: Think of LMPO as a coach watching the team play. If the team wins, the coach tells the Composer, "Great job! The summary you gave the Driver was perfect." If they lose, the coach says, "You gave the Mechanic too much info about the weather; next time, focus on the engine."
  • The Result: The system learns to create better, more useful summaries over time without needing a human to rewrite the rules.
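
The coach analogy describes a standard reinforcement-learning loop: the composer's choices are rewarded when the team succeeds. LMPO itself operates on latent memories; the sketch below is only a minimal REINFORCE-style bandit showing the same feedback mechanism, with invented option names and a made-up reward.

```python
import math
import random


def softmax(prefs):
    """Turn raw preference scores into a probability distribution."""
    exps = [math.exp(p) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]


# Toy setup: the composer picks which memory slice to hand the mechanic.
# Only "engine history" actually helps, so wins reinforce that choice.
options = ["engine history", "weather log", "turn-by-turn log"]
prefs = [0.0, 0.0, 0.0]  # the composer's learnable scores
lr = 0.5
random.seed(0)

for _ in range(300):
    probs = softmax(prefs)
    choice = random.choices(range(3), weights=probs)[0]
    reward = 1.0 if choice == 0 else 0.0  # team "wins" only with engine info
    # REINFORCE update: raise the log-prob of the chosen option, scaled by reward
    for i in range(3):
        grad = (1.0 if i == choice else 0.0) - probs[i]
        prefs[i] += lr * reward * grad

final = softmax(prefs)
print(options[max(range(3), key=lambda i: final[i])])  # prints "engine history"
```

No human ever tells the composer "give the mechanic engine info"; the preference emerges purely from the win/lose signal, which is the point of the coach analogy.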

Why is this a big deal?

The paper shows that with LatentMem:

  1. They are smarter: The team solves puzzles (like coding or logic games) up to 19% better than before.
  2. They are faster: They use 50% fewer words (tokens) to get the job done, saving time and money.
  3. They are adaptable: They can walk into a brand new type of game or a new team structure and still perform well, because they know how to customize their own memories on the fly.

In short: LatentMem stops AI teams from drowning in a sea of identical data. Instead, it gives every team member a personalized, compressed "cheat sheet" that is perfectly tailored to their specific role, making the whole team smarter, faster, and more efficient.