Adaptive Memory Admission Control for LLM Agents

This paper proposes Adaptive Memory Admission Control (A-MAC), a framework that decomposes memory value into five interpretable factors to enable transparent, efficient, and domain-adaptive long-term memory management for LLM agents, achieving superior precision-recall tradeoffs and reduced latency compared to state-of-the-art systems.

Guilin Zhang, Wei Jiang, Xiejiashan Wang, Aisha Behr, Kai Zhao, Jeffrey Friedman, Xu Chu, Amine Anoun

Published 2026-03-06

Imagine you have a brilliant but slightly overwhelmed personal assistant named "Agent." This Agent is great at chatting, solving problems, and using tools. However, it has a major flaw: it has a terrible memory.

Sometimes, it forgets important things you told it yesterday. Other times, it remembers things you never said (hallucinations) or holds onto every single "Hello" and "Thanks" you ever typed, cluttering its brain until it can't find the important stuff.

Current solutions are like two extremes:

  1. The Hoarder: It saves everything. This makes the brain huge, slow, and full of junk.
  2. The Forgetful Genius: It saves nothing unless a very expensive, slow "brain scan" (a complex AI model) tells it to. This is accurate but takes forever and costs a lot of money.

The paper introduces A-MAC (Adaptive Memory Admission Control). Think of A-MAC as a smart, efficient bouncer standing at the door of the Agent's long-term memory. Instead of letting everything in or asking a super-expensive expert to check every single guest, A-MAC uses a quick, five-point checklist to decide who gets in.

Here is how the "Bouncer" works, using five simple rules:

1. The Five-Point Checklist (The "Value Signals")

When a piece of information (a "guest") tries to enter the memory, A-MAC asks five questions:

  • 🔮 Future Utility (Will this be useful later?):
    • Analogy: "If I save this, will I need it for a future task?"
    • How it works: This is the one check that calls an AI model, asking whether the fact helps solve future problems or is just small talk.
  • 🛡️ Factual Confidence (Did they actually say this?):
    • Analogy: "Is this a proven fact, or is the guest making things up?"
    • How it works: It checks the chat history. If the Agent said it, it's a "maybe." If the User said it, it's a "yes." This stops the Agent from remembering its own lies (hallucinations).
  • ✨ Semantic Novelty (Have we heard this before?):
    • Analogy: "Is this a new story, or are they just repeating the same joke?"
    • How it works: It checks if the memory is already in the database. If it's a duplicate, it gets kicked out to save space.
  • ⏳ Temporal Recency (How fresh is this?):
    • Analogy: "Is this news from today or a rumor from last year?"
    • How it works: Old information slowly fades in value (like milk expiring). Recent chats get a higher score.
  • 🏷️ Content Type Prior (What kind of info is this?):
    • Analogy: "Is this a permanent rule (like 'I hate cilantro') or a temporary mood (like 'I'm tired right now')?"
    • How it works: This is the most important rule. The system knows that "User Preferences" are gold and should always be saved, while "Current Weather" or "Mood" is usually junk that should be forgotten.
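As a rough sketch, the five checks could be computed like this. Everything here is an illustrative assumption, not the paper's implementation: the type-prior values, the word-overlap stand-in for embedding similarity, and the one-day decay half-life are all invented for the example.

```python
# Sketch of A-MAC's five value signals (illustrative assumptions throughout).

TYPE_PRIOR = {            # content-type prior: stable facts score high,
    "preference": 1.0,    # transient states score low (assumed values)
    "fact": 0.8,
    "task": 0.6,
    "mood": 0.1,
    "smalltalk": 0.0,
}

def factual_confidence(speaker: str) -> float:
    # Trust what the user said; penalize agent-generated statements so the
    # agent does not persist its own hallucinations as memories.
    return 1.0 if speaker == "user" else 0.4

def semantic_novelty(candidate: set, stored: list) -> float:
    # Stand-in novelty check: 1 minus the largest word-overlap (Jaccard)
    # with any stored memory. A real system would compare embeddings.
    if not stored:
        return 1.0
    overlap = max(len(candidate & m) / len(candidate | m) for m in stored)
    return 1.0 - overlap

def temporal_recency(age_seconds: float, half_life: float = 86400.0) -> float:
    # Exponential decay: with a one-day half-life, yesterday's chat is
    # worth half of today's.
    return 0.5 ** (age_seconds / half_life)
```

The future-utility signal is the only one missing here, because it is the one check that needs a model call rather than a cheap rule.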

2. The Decision Process

The Bouncer doesn't just guess. It gives the guest a score based on these five rules.

  • High Score? The guest gets a VIP pass into long-term memory.
  • Low Score? The guest is politely turned away.

Crucially, the Bouncer is hybrid. It does the heavy lifting (checking facts, novelty, and type) using fast, cheap, simple rules (like a calculator). It only calls in the "expensive expert" (the big AI model) once to check if the information will be useful in the future. This makes the whole process 31% faster than other methods while being smarter.
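The scoring step above can be sketched as a weighted sum compared against a cutoff. The weights, threshold, and example signal values below are illustrative assumptions, not numbers from the paper; the point is that four signals come from cheap rules, so the expensive model is consulted at most once per candidate (for `utility`).

```python
# Hybrid admission decision: weighted sum of five signals vs. a threshold.
# Weights and threshold are assumed for illustration.
WEIGHTS = {"utility": 0.3, "confidence": 0.2, "novelty": 0.2,
           "recency": 0.1, "type_prior": 0.2}
THRESHOLD = 0.5

def admit(signals: dict) -> bool:
    score = sum(WEIGHTS[k] * signals[k] for k in WEIGHTS)
    return score >= THRESHOLD

# A user preference scores high on every signal and gets the VIP pass:
preference = {"utility": 0.9, "confidence": 1.0, "novelty": 1.0,
              "recency": 1.0, "type_prior": 1.0}
# Small talk scores low and is politely turned away:
smalltalk = {"utility": 0.1, "confidence": 1.0, "novelty": 0.3,
             "recency": 1.0, "type_prior": 0.0}
```

With these assumed weights, `admit(preference)` returns `True` and `admit(smalltalk)` returns `False`, matching the VIP-pass-versus-turned-away behavior described above.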

3. Why This Matters (The Results)

The researchers tested this on LoCoMo, a benchmark for long-term conversational memory. Here is what happened:

  • Better Accuracy: A-MAC was much better at keeping the right memories and forgetting the wrong ones. It raised the F1 score (a measure that balances precision and recall) to 0.583, beating the previous best methods.
  • Speed: Because it uses simple rules for most checks, it was 31% faster than the competition.
  • No More Hallucinations: By strictly checking if the information was actually said by the user, it stopped the Agent from remembering things that never happened.

The Big Takeaway

Before A-MAC, building a memory for AI agents was like trying to fill a library by throwing books in a pile and hoping the librarian sorts them later. It was messy, slow, and full of errors.

A-MAC is like hiring a professional librarian with a strict, transparent checklist. It knows exactly what to keep, what to throw away, and why. It ensures the AI's brain stays clean, fast, and reliable, so it can actually remember what you told it last week without getting confused by its own daydreams.

In short: A-MAC teaches AI agents how to be good note-takers, not just good talkers.