Imagine you have a giant, magical backpack that you use every day to solve problems. Inside this backpack, you keep thousands of sticky notes, each containing a piece of advice, a fact, or a strategy you've learned from the past.
For a long time, when you added a new note to the backpack, you just assumed it was good. You might have written "Remember to wear a coat" on it because it was raining that day. But what if the weather changes? What if you are now living in a desert? That note is still in your backpack, but it's no longer helpful. In fact, if you keep pulling it out when you're hot, it might even make you fail.
The problem with current AI "agents" (smart computer programs) is that their backpacks are full of these outdated or useless notes, and they have no good way to decide which ones to throw away. They usually just guess which notes are important when they write them, and then never check if those notes actually helped them succeed later.
This paper introduces a simple but powerful new tool called Memory Worth (MW). Think of it as a scorecard system for every single note in your backpack.
How Memory Worth Works: The "Two-Counter" System
Instead of just guessing if a note is good, MW gives every note two simple counters:
- The "High-Five" Counter: How many times did this note show up right before you succeeded?
- The "Face-Palm" Counter: How many times did this note show up right before you failed?
At the end of the day, you calculate the Memory Worth by dividing the High-Fives by the total of both counters. The result is a number between 0 and 1: the note's success rate.
- If a note has 10 High-Fives and 0 Face-Palms, its score is 1.0 (Perfect!).
- If a note has 0 High-Fives and 10 Face-Palms, its score is 0.0 (Terrible!).
- If a note has 5 of each, the score is 0.5 (Maybe it's okay, maybe not).
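If you like seeing ideas as code, the two-counter bookkeeping above can be sketched in a few lines of Python. The class and method names here are illustrative, not taken from the paper:

```python
from dataclasses import dataclass

@dataclass
class Note:
    """One sticky note with its two counters."""
    text: str
    high_fives: int = 0   # times the note was present right before a success
    face_palms: int = 0   # times the note was present right before a failure

    def record(self, succeeded: bool) -> None:
        """Bump the matching counter after each outcome."""
        if succeeded:
            self.high_fives += 1
        else:
            self.face_palms += 1

    def memory_worth(self) -> float:
        """High-Fives divided by total uses; 0.5 if the note is still untested."""
        total = self.high_fives + self.face_palms
        return self.high_fives / total if total else 0.5

salt = Note("Add Salt")
for _ in range(10):
    salt.record(succeeded=True)
print(salt.memory_worth())  # 1.0 — the "perfect" score from the example above
```

Treating an untested note as 0.5 is one reasonable default (neither trusted nor distrusted); a real system might pick a different prior.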
The Magic of "Association" vs. "Causation"
Here is the most important part: MW doesn't need to know why you succeeded. It doesn't need to be a detective.
Imagine you are a chef. You have a note that says "Add Salt" and another that says "Add Pepper."
- Every time you cook a great steak, you add both salt and pepper.
- Every time you burn a steak, you add both salt and pepper.
If you only look at the "Add Pepper" note, you might think, "Hey, I added pepper, and the steak was great! Pepper must be the secret!" But actually, the salt was the real hero, or maybe the steak was just good because of the fire.
MW is okay with this confusion. It simply says: "Hey, the Pepper note shows up a lot with good results. Let's give it a high score for now." Even if Pepper isn't the cause of the success, it is a reliable signal that you are in a "good cooking situation."
The paper proves mathematically that if you keep track of these scores over thousands of tries, the notes that are truly helpful will naturally rise to the top, and the useless ones will sink to the bottom.
The Three Ways the System Can Get Confused
The authors are very honest about where this system might get tricked. They tested three specific scenarios:
The "Hard Task" Trap: Imagine you have a note about "Using a ladder."
- You use it when fixing a roof (Hard task, low success rate).
- You never use it when fixing a toy (Easy task, high success rate).
- The system might think the ladder note is "bad" because it only shows up when you fail. But actually, the ladder is great; the task was just hard.
- Solution: You need to tell the system, "Only judge the ladder note when we are doing roof jobs."
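One way to honor that "only judge it on roof jobs" advice is to keep a separate pair of counters per task type, so a note is only scored against comparable situations. This is a hypothetical sketch, not the paper's implementation:

```python
from collections import defaultdict

class ContextualNote:
    """Counters kept per task type, so hard tasks don't drag down the score."""
    def __init__(self, text: str):
        self.text = text
        # task_type -> [high_fives, face_palms]
        self.counts = defaultdict(lambda: [0, 0])

    def record(self, task_type: str, succeeded: bool) -> None:
        self.counts[task_type][0 if succeeded else 1] += 1

    def memory_worth(self, task_type: str) -> float:
        wins, losses = self.counts[task_type]
        total = wins + losses
        return wins / total if total else 0.5

ladder = ContextualNote("Use a ladder")
ladder.record("roof", succeeded=False)   # roofs are hard
ladder.record("roof", succeeded=True)
ladder.record("roof", succeeded=True)
# Judged only against other roof jobs, the ladder note looks decent:
print(ladder.memory_worth("roof"))  # roughly 0.67 (2 wins out of 3 roof jobs)
```

Without the per-context split, the ladder's failures on hard roof jobs would be averaged against easy toy repairs where it was never used, making it look worse than it is.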
The "Hitchhiker" Problem: Imagine two notes: "Check the Oil" (Good) and "Check the Air" (Useless).
- Every time you check the oil, you also check the air.
- When the car runs well, both notes get a High-Five.
- The system thinks "Check the Air" is a genius note because it's always riding along with the good one.
- Solution: You need to occasionally check the oil without checking the air, so the system learns that the air check isn't actually doing the work.
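That "occasionally leave one out" fix looks a lot like randomized ablation. A minimal sketch, assuming the system controls which notes it retrieves each time (the function and parameter names are made up for illustration):

```python
import random

def pick_notes(notes, ablation_rate=0.1, rng=random):
    """Usually use every relevant note, but occasionally drop one at random
    so hitchhikers stop sharing credit with genuinely useful notes."""
    chosen = list(notes)
    if len(chosen) > 1 and rng.random() < ablation_rate:
        chosen.remove(rng.choice(chosen))  # leave one note out this round
    return chosen

# Once "Check the Oil" is sometimes used without "Check the Air",
# the two notes' counters can finally diverge.
print(pick_notes(["Check the Oil", "Check the Air"]))
```

Runs where only "Check the Oil" appears still succeed, so its High-Fives keep climbing; runs where only "Check the Air" appears fail more often, and its Face-Palm counter exposes it as a hitchhiker.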
The "Feedback Loop": If the system starts trusting a note too much, it might keep showing it to you. If that note is actually bad, the system keeps failing, but it keeps showing the note because it thinks it's important.
- Good News: The paper shows that if the system is designed right, it actually fixes itself. If a "trusted" note keeps causing failures, its score drops, and the system stops showing it.
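You can see that self-correction in a tiny simulation. Here a once-trusted note (8 High-Fives) starts failing every time it is retrieved; the threshold value and class are hypothetical, chosen just to show the score decaying:

```python
class ScoredNote:
    def __init__(self, text, high_fives=8, face_palms=0):
        self.text = text
        self.high_fives = high_fives
        self.face_palms = face_palms

    @property
    def worth(self):
        return self.high_fives / (self.high_fives + self.face_palms)

note = ScoredNote("old coding trick")   # starts at a perfect 1.0
THRESHOLD = 0.5  # assumed cut-off below which notes stop being retrieved
rounds = 0
while note.worth >= THRESHOLD:
    note.face_palms += 1   # every retrieval now ends in a Face-Palm
    rounds += 1
print(rounds)  # 9 — nine straight failures push the score below 0.5
```

The important point is that the loop always terminates: each failed retrieval adds a Face-Palm, so a bad note's score can only fall, and eventually it drops out of rotation on its own.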
Why This Matters
In the real world, AI agents are constantly learning. They need to know when to forget.
- Old Facts: If an AI learned that "Czechoslovakia is a country" in 1990, that note should eventually get a low score because it's no longer true.
- Bad Habits: If an AI keeps trying a coding trick that used to work but now crashes the program, the "Face-Palm" counter should go up, and the AI should stop using it.
The Bottom Line
This paper proposes a simple, lightweight way to make AI smarter about its own memory. Instead of guessing what's important, it lets the AI learn from its own wins and losses.
It's like giving every note in your backpack a tiny scoreboard. Over time, the scoreboard tells you exactly which notes to keep, which to ignore, and which to throw in the trash. It doesn't require the AI to be a genius; it just requires the AI to be honest about whether things worked out or not.
By using this "Memory Worth" score, we can build AI agents that don't just hoard information, but actually curate a library of wisdom that gets better every single day.