AMV-L: Lifecycle-Managed Agent Memory for Tail-Latency Control in Long-Running LLM Systems

This paper introduces AMV-L, a value-driven memory lifecycle framework for long-running LLM agents. It replaces age-based retention with utility-based tiering to bound retrieval workloads, achieving significantly better tail-latency control and throughput than traditional TTL and LRU policies.

Emmanuel Bamidele

Published 2026-03-06

Here is an explanation of the AMV-L paper in simple language, using everyday analogies.

The Problem: The "Cluttered Garage" Effect

Imagine you have a personal assistant (an AI agent) who helps you with your life. Over time, this assistant collects a massive amount of information: your favorite coffee order, the code for a project you worked on last year, a recipe you tried once, and a thousand random facts you mentioned in passing.

The Current Way (TTL):
Most AI systems today manage this memory like a garage with a strict "expiration date" rule. If you put a box in the garage, it stays there for exactly 30 days. After 30 days, it gets thrown out, no questions asked.

  • The Flaw: This keeps the garage from overflowing, but it doesn't stop the search from getting slow. When you ask, "What's my coffee order?", the assistant has to dig through every single box in the garage that hasn't expired yet to find the right one. If you have 10,000 boxes, that search takes forever. Sometimes, the search is fast; other times, the assistant gets stuck digging through a mountain of irrelevant boxes, causing a massive delay (a "tail latency" spike).

The Result: The assistant is reliable for simple tasks but gets overwhelmed and slow when you ask complex questions after months of use.
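The flaw above can be made concrete with a toy sketch. This is not the paper's implementation; the class and field names are illustrative assumptions. The point it demonstrates is that a TTL bounds how *old* entries can get, but not how *many* live entries a retrieval must scan:

```python
class TTLMemory:
    """Toy TTL store (illustrative, not from the paper): entries expire
    after a fixed age, but every still-live entry is scanned on retrieval."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.entries = []  # list of (timestamp, key, value)

    def add(self, key, value, now):
        self.entries.append((now, key, value))

    def search(self, key, now):
        # Drop expired entries, then linearly scan everything that is left.
        self.entries = [(t, k, v) for (t, k, v) in self.entries
                        if now - t < self.ttl]
        scanned, hit = 0, None
        for t, k, v in self.entries:
            scanned += 1
            if k == key:
                hit = v
        # `scanned` grows with the number of live entries -- this is the
        # "digging through every box" cost that causes tail-latency spikes.
        return hit, scanned

memory = TTLMemory(ttl_seconds=3600)
for i in range(10_000):
    memory.add(f"fact-{i}", i, now=0.0)
memory.add("coffee-order", "flat white", now=0.0)

value, scanned = memory.search("coffee-order", now=1.0)
# All 10,001 live entries were scanned to find one useful memory.
```

Nothing has expired yet at `now=1.0`, so the search touches every box in the garage even though only one matters.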


The Solution: AMV-L (The "Smart Librarian")

The paper introduces AMV-L, a new way to manage memory. Instead of just looking at how old a memory is, AMV-L looks at how useful it is.

Think of AMV-L as a super-intelligent librarian who organizes a library not by the date the book was published, but by how often people actually read it and how much they love it.

How It Works: The Three Shelves

The librarian divides the library into three specific zones (Tiers):

  1. The "Hot" Shelf (Front Desk):

    • This is where the most useful, frequently used items live.
    • When you ask a question, the librarian only looks here first.
    • Why it helps: The search area is tiny and fast. You get an answer instantly.
  2. The "Warm" Shelf (Back Room):

    • These are items that are useful but not needed every day. They are kept safe but aren't on the front desk.
    • The librarian only pulls a few of these out if the "Hot" shelf doesn't have the answer.
  3. The "Cold" Shelf (The Basement):

    • These are old, rarely used items. They are stored away so they don't clutter the main search area.
    • If an item stays in the basement too long without being used, it gets thrown away to save space.
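The three-shelf lookup can be sketched in a few lines. The tier names and the simple hot-then-warm fallback rule are assumptions for illustration, not the paper's exact algorithm; the key property shown is that the default search space is the small Hot tier, and the Cold tier is never searched at all:

```python
def tiered_search(query, hot, warm):
    """Look in the small Hot tier first; fall back to Warm only on a miss.
    The Cold tier is storage only and is never part of the search path."""
    if query in hot:
        return hot[query], "hot"
    if query in warm:
        return warm[query], "warm"
    return None, "miss"

# Illustrative contents (invented examples, not from the paper).
hot = {"coffee-order": "flat white"}
warm = {"last-year-project": "repo: billing-service"}

print(tiered_search("coffee-order", hot, warm))       # fast path: Hot only
print(tiered_search("last-year-project", hot, warm))  # slower fallback: Warm
```

Because the librarian looks at the front desk first, the common-case search cost is bounded by the Hot tier's size, no matter how large the library grows.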

The Magic: "Value" vs. "Age"

In the old system (TTL), a memory is only kept if it's "young."
In the new system (AMV-L), a memory is kept if it has Value.

  • Scenario A: You mention your coffee order every day. The "Value" score goes up. The item stays on the Hot Shelf, even if it's been there for a year.
  • Scenario B: You mention a random fact once, and never again. Its "Value" score slowly drops. It moves from the Hot Shelf to the Warm Shelf, then to the Cold Shelf, and eventually gets deleted.
  • The Benefit: The assistant never wastes time searching through the basement (Cold Shelf) or the back room (Warm Shelf) unless absolutely necessary. It focuses its energy only on the "Hot" items.
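The two scenarios above can be sketched with a simple decay-plus-boost score. The specific formula, decay rate, and tier thresholds here are illustrative assumptions, not the paper's scoring function; they just show how frequent use keeps a memory hot while a one-off mention drifts toward cold:

```python
def update_value(score, accessed, decay=0.9, boost=1.0):
    """Each period, decay the score; add a boost if the memory was used.
    (Assumed exponential-decay scheme, for illustration only.)"""
    return score * decay + (boost if accessed else 0.0)

def assign_tier(score, hot_min=2.0, warm_min=0.5):
    """Map a value score to a shelf (thresholds are invented examples)."""
    if score >= hot_min:
        return "hot"
    if score >= warm_min:
        return "warm"
    return "cold"  # candidates for eventual deletion

# Scenario A: mentioned every day -> score stays high, item stays hot.
score_a = 0.0
for _ in range(30):
    score_a = update_value(score_a, accessed=True)

# Scenario B: mentioned once, then never again -> score decays toward cold.
score_b = update_value(0.0, accessed=True)
for _ in range(30):
    score_b = update_value(score_b, accessed=False)

print(assign_tier(score_a), assign_tier(score_b))  # daily item hot, one-off cold
```

Note how age never appears in the tier decision: a year-old coffee order with daily accesses keeps a high score, while a week-old one-off fact has already decayed.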

The Results: Speed and Stability

The researchers tested this new system against the old "Garage" system and a middle-ground system (LRU, which just keeps the most recently used items).

  1. Speed: The new system was 3 times faster at handling requests than the old system.
  2. No More "Freezing": The old system had moments where it would freeze for 2+ seconds because it was searching through too much junk. The new system almost eliminated these freezes (dropping from 13% of requests being slow to 0.007%).
  3. Smarter Answers: Because the system keeps high-value information (like your coffee order) even if it's old, it doesn't forget important things just because they aren't "fresh."

The Big Takeaway

The paper argues that for AI agents to be truly reliable, we can't just treat their memory like a storage closet where things rot after a set time. We need to treat memory like a resource that needs active management.

By separating what is stored (the whole library) from what is searched (only the Hot Shelf), AMV-L ensures that the AI stays fast and responsive, no matter how long you've been using it. It trades a tiny bit of speed on average requests to completely eliminate the "nightmare" slow requests that ruin the user experience.

In short: AMV-L stops the AI from digging through the whole attic to find a single screw; it keeps the screw right on the workbench where it belongs.