MemX: A Local-First Long-Term Memory System for AI Assistants

Imagine you have a brilliant new assistant named Alex. Alex is incredibly smart, can write poetry, debug code, and hold deep conversations. But there's a catch: Alex has the memory of a goldfish.

Every time you close the chat window, Alex forgets everything. You have to re-explain your coffee preferences, your project deadlines, and that weird rule about how you like your emails formatted. You end up asking the same questions over and over, and sometimes, when you ask something Alex should know, he just makes up a confident-sounding lie because he doesn't want to admit he forgot.

MemX is the solution to this problem. It's a "local-first" long-term memory system designed to give AI assistants a permanent, private, and reliable brain.

Here is how MemX works, explained through simple analogies:

1. The "Local-First" Philosophy: Your Own Filing Cabinet

Most AI assistants today store your memories in a giant, shared cloud warehouse owned by a big tech company. If the internet goes down, or the company changes its rules, you lose access.

MemX is different. It's like building a personal filing cabinet right on your own desk.

Privacy: Only you hold the key. No one else can peek inside.
Offline: Even if the internet cuts out, your memories are still right there in the cabinet.
Speed: Since the files are right next to you, you don't have to wait for a delivery truck (the internet) to bring them back.

2. The Search Engine: The "Two-Legged" Detective

When you ask Alex a question, MemX doesn't just guess. It acts like a detective with two different ways of finding clues:

Leg 1: The "Vibe" Check (Vector Search): This looks for meaning. If you ask about "fixing the broken engine," it finds memories about "repairing the car," even if the exact words "engine" and "fix" aren't used. It matches on what the question means, not on how it is worded.
Leg 2: The "Keyword" Check (FTS5): This looks for exact words. If you ask for "Project Alpha," it finds the file labeled "Project Alpha" immediately, even if the "vibe" search got confused by similar-sounding projects.

MemX combines these two legs using a technique called Reciprocal Rank Fusion (RRF). Think of it like a referee who collects the rankings from both the "Vibe" judge and the "Keyword" judge: a memory that either judge ranks highly gets credit, and a memory that both judges rank highly rises to the top. This ensures Alex doesn't miss a clue just because he was looking at it from the wrong angle.
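For readers who want to see the referee at work, here is a minimal Python sketch of standard Reciprocal Rank Fusion. It is an illustration, not MemX's own code (MemX is written in Rust): the function name and the sample memory ids are made up, and `k=60` is the constant the technical summary below reports.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of memory ids into one ranked list.

    Every id earns 1 / (k + rank) from each list it appears in (rank starts at 1),
    so ids that place well in more than one list accumulate the highest totals.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, memory_id in enumerate(ranking, start=1):
            scores[memory_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for a deterministic order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical results from the two legs, best match first.
vector_hits = ["car_repair", "oil_change", "project_alpha"]
keyword_hits = ["project_alpha", "car_repair"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
fused_ids = [memory_id for memory_id, _ in fused]
```

In this toy run, `car_repair` and `project_alpha` both outrank `oil_change` because each appears in both lists, even though `oil_change` placed second in the vector list.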

3. The "Bouncer" at the Door: Stopping Hallucinations

One of the biggest problems with AI is that it often tries to answer even when it has no idea what you're talking about, leading to "hallucinations" (confident lies).

MemX has a strict Bouncer at the door.

If the keyword check comes back empty and the best "vibe" match is still only a weak one (low confidence), the Bouncer says, "Nope, we don't have that."
Instead of making up an answer, the system admits, "I don't know."
This is a feature, not a bug. It's better for an assistant to say "I don't know" than to lie to you.
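The Bouncer's rule is simple enough to write down. The sketch below follows the rule as the technical summary states it (no keyword hits and a best vector similarity under 0.50); the function name, the argument shapes, and the choice to treat "no vector hits at all" as similarity 0.0 are assumptions made for illustration.

```python
def should_reject(keyword_hits, vector_similarities, tau=0.50):
    """Return True when the query should get an empty result.

    keyword_hits: memory ids returned by the keyword leg (possibly empty).
    vector_similarities: similarity scores returned by the vector leg.
    tau: similarity threshold; 0.50 is the value reported for MemX.
    """
    # Assumption: with no vector hits at all, the best similarity counts as 0.0.
    best_similarity = max(vector_similarities, default=0.0)
    return len(keyword_hits) == 0 and best_similarity < tau

# No keyword hit and only weak semantic matches: admit "I don't know".
weak_query = should_reject([], [0.31, 0.42])
# A keyword hit is enough to let the results through.
keyword_query = should_reject(["project_alpha"], [0.31])
# A strong semantic match is also enough on its own.
strong_query = should_reject([], [0.72])
```

Both conditions must hold before the results are thrown away, so an exact keyword match is never rejected just because the semantic match was weak.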

4. The "Re-Ranking" Team: Sorting the Best Memories

Once the detective finds a list of potential memories, MemX doesn't just show them in random order. It runs them through a sorting algorithm based on four factors:

How similar is it? (Does it answer the question?)
How recent is it? (Did we talk about this yesterday or five years ago?)
How often is it used? (Is this a favorite memory or a dusty file?)
How important is it? (Did you mark this as "Critical"?)

Crucial Detail: MemX tracks "Retrieval" (when the AI found the memory to answer a question) separately from "Access" (when you just looked at the file).

Analogy: Imagine a novel you flip through now and then (Access) versus a repair manual that actually solves your car problems whenever you consult it (Retrieval). MemX counts only the second kind of use when ranking, so the manual stays on top no matter how often you browse the novel.
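Here is a rough Python sketch of how such a four-factor score could be computed. The overall shape (a weighted sum, recency as exponential decay since the last retrieval, log-scaled retrieval counts, z-score plus sigmoid normalization) follows the technical summary below; the weights, the 30-day half-life, the field names other than `last_retrieved_at`, and the choice to normalize each factor across the candidate set are all assumptions.

```python
import math
from statistics import mean, pstdev

def zscore_sigmoid(values):
    """Z-score the values across the candidate set, then squash into (0, 1)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.5 for _ in values]  # all candidates tie on this factor
    return [1.0 / (1.0 + math.exp(-(v - mu) / sigma)) for v in values]

def rerank(candidates, now, weights=(0.5, 0.2, 0.2, 0.1), half_life_days=30.0):
    """Order candidate ids by a weighted sum of four normalized factors.

    The weights and half-life are illustrative guesses, not MemX's values.
    Timestamps are assumed to be Unix seconds.
    """
    a_s, a_r, a_f, a_i = weights
    half_life_seconds = half_life_days * 86400
    sem = zscore_sigmoid([c["rrf_score"] for c in candidates])
    rec = zscore_sigmoid([
        math.exp(-math.log(2) * (now - c["last_retrieved_at"]) / half_life_seconds)
        for c in candidates
    ])
    # Only retrieval_count feeds the score; access_count is deliberately ignored.
    freq = zscore_sigmoid([math.log1p(c["retrieval_count"]) for c in candidates])
    imp = zscore_sigmoid([c["importance"] for c in candidates])
    scored = [
        (a_s * s + a_r * r + a_f * f + a_i * i, c["id"])
        for c, s, r, f, i in zip(candidates, sem, rec, freq, imp)
    ]
    return [memory_id for _, memory_id in sorted(scored, key=lambda p: (-p[0], p[1]))]

now = 1_700_000_000
candidates = [
    {"id": "novel", "rrf_score": 0.03, "importance": 0.5, "access_count": 90,
     "retrieval_count": 1, "last_retrieved_at": now - 300 * 86400},
    {"id": "manual", "rrf_score": 0.03, "importance": 0.5, "access_count": 2,
     "retrieval_count": 40, "last_retrieved_at": now - 3 * 86400},
]
order = rerank(candidates, now)
```

The two toy memories tie on similarity and importance, so the manual wins on recency and frequency of retrieval, and the novel's 90 casual views have no effect on the result.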

5. The "Granularity" Lesson: Breaking Things Down

The researchers tested MemX with a massive amount of data (over 200,000 records). They found a surprising secret: How you chop up your memories matters more than how you search for them.

Bad Way: Storing whole conversations as one giant block (like storing a whole movie as one file).
Good Way: Breaking conversations down into tiny, atomic facts (like storing every single scene or line of dialogue as its own file).

When MemX used the "Good Way" (Fact-level), it became about twice as likely to surface the right memory among its top five results. It's like trying to find a specific sentence in a book: it's much easier if you have an index of every sentence, rather than just an index of chapters.

6. The Speed Boost: The "Library Card"

Finally, the paper highlights a technical trick that made the system incredibly fast.

The Old Way: Searching for a word in a text file is like reading every single page of a library book to find one word. It gets slow as the library grows.
The New Way (FTS5): MemX uses a special "Library Card" system (Full-Text Search Index). It's like having a pre-made list of every word and exactly where it is.
The Result: This made keyword search about 1,100 times faster at 100,000 records. It went from taking about 3.3 seconds per search to about 3 milliseconds, far less than a blink of an eye.
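The difference between the two ways is easy to see in a few lines of SQL. The sketch below uses Python's built-in `sqlite3` module rather than MemX's Rust and libSQL stack, and it assumes the underlying SQLite build includes the FTS5 extension (most do). The table names and sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE memories (id INTEGER PRIMARY KEY, content TEXT)")
# The FTS5 virtual table maintains an inverted index over the content column.
conn.execute("CREATE VIRTUAL TABLE memories_fts USING fts5(content)")

rows = [
    (1, "Project Alpha deadline moved to Friday"),
    (2, "User prefers oat milk in coffee"),
    (3, "Project Beta uses the staging database"),
]
conn.executemany("INSERT INTO memories (id, content) VALUES (?, ?)", rows)
conn.executemany("INSERT INTO memories_fts (rowid, content) VALUES (?, ?)", rows)

# The Old Way: LIKE with a leading wildcard has to scan every row.
like_ids = [row[0] for row in conn.execute(
    "SELECT id FROM memories WHERE content LIKE ? ORDER BY id", ("%Alpha%",)
)]

# The New Way: MATCH looks the term up in the index and ranks the hits.
fts_ids = [row[0] for row in conn.execute(
    "SELECT rowid FROM memories_fts WHERE memories_fts MATCH ? ORDER BY rank",
    ("alpha",),
)]
```

Both queries find the same memory here. With three rows the timing difference is invisible; the gap only opens up as the table grows, because the LIKE scan's cost grows with the number of rows while the index lookup does not read rows that lack the term.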

Summary

MemX is a system that gives your AI assistant a private, fast, and honest memory.

It keeps your data on your own computer.
It uses two different search methods to find the truth.
It has a "Bouncer" to stop it from lying when it doesn't know the answer.
It sorts memories by what's actually useful, not just what you looked at recently.
And it breaks big conversations into tiny facts to make finding answers super easy.

It's the difference between an assistant who forgets your name every 5 minutes and one who remembers your entire life story, organized perfectly, ready to help you whenever you need it.

1. Problem Statement

Large Language Models (LLMs) are inherently stateless across sessions, making it difficult for AI assistants to retain user preferences, project conventions, or incident resolutions over time. While Retrieval-Augmented Generation (RAG) is the standard for grounding LLMs in external knowledge, most existing systems are designed for cloud-hosted document corpora rather than incremental, personalized memories accumulated during daily interactions.

Furthermore, existing memory systems often focus on end-to-end agent task completion rather than on retrieval quality in isolation, and they assume centralized cloud infrastructure. This leaves a gap for local-first deployments (where users own their data and operate offline) that require a memory system capable of:

Balancing high recall with the suppression of spurious results (hallucinations) when no relevant memory exists.
Maintaining structural simplicity and explainability.
Operating with low latency and high privacy on local hardware.

2. Methodology: MemX System Design

MemX is a local-first memory system implemented in Rust using libSQL (a SQLite fork) and an OpenAI-compatible embedding API. Its core design philosophy prioritizes stability over maximum recall.

Core Architecture

The system employs a deterministic, hybrid retrieval pipeline (Figure 1):

Dual Recall Paths:
- Vector Recall: Uses dense vector search (DiskANN via libSQL) to capture semantic similarity.
- Keyword Recall: Uses FTS5 (Full-Text Search 5) for exact term matching.
Fusion: The two candidate sets are merged using Reciprocal Rank Fusion (RRF) with $k=60$.
Four-Factor Re-ranking: Candidates are re-scored using a weighted composite function:
- $Score = \alpha_s \cdot f_{sem} + \alpha_r \cdot f_{rec} + \alpha_f \cdot f_{freq} + \alpha_i \cdot f_{imp}$
- Semantic Similarity: From RRF score.
- Recency: Exponential decay based on last_retrieved_at (not just last_accessed_at).
- Frequency: Log-normalized count of successful retrievals.
- Importance: Explicit user/system annotation.
- Note: Scores are normalized via Z-score and sigmoid transformation to ensure comparability.
Low-Confidence Rejection Rule: A critical stability mechanism. If the keyword set is empty AND the maximum vector similarity is below a threshold ($\tau = 0.50$), the system returns an empty set rather than forcing a low-confidence match.
Deduplication: Applies content deduplication and tag-signature deduplication (collapsing results with identical type+tag combinations) to prevent topic clustering.
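As an illustration of the deduplication step, the following Python sketch keeps the highest-ranked result for each distinct content string and each distinct type+tag signature. It is one plausible reading of the description above, not the MemX implementation: the field names, the whitespace and case normalization of content, and the treatment of an empty tag list as a signature of its own are all assumptions.

```python
def deduplicate(results):
    """Drop results that repeat earlier content or an earlier type+tag signature.

    results: ranked list (best first) of dicts with "content", "type", "tags".
    """
    seen_content, seen_signatures, kept = set(), set(), []
    for result in results:
        # Assumption: content is compared after case and whitespace normalization.
        content_key = " ".join(result["content"].lower().split())
        # Tag order is ignored when building the signature.
        signature = (result["type"], tuple(sorted(result["tags"])))
        if content_key in seen_content or signature in seen_signatures:
            continue
        seen_content.add(content_key)
        seen_signatures.add(signature)
        kept.append(result)
    return kept

tagged = [
    {"id": 1, "content": "Deploy failed: disk full", "type": "incident", "tags": ["deploy", "disk"]},
    {"id": 2, "content": "Deploy failed again on node 2", "type": "incident", "tags": ["disk", "deploy"]},
    {"id": 3, "content": "User prefers dark mode", "type": "preference", "tags": ["ui"]},
]
tag_free = [
    {"id": 4, "content": "Alice moved to Berlin", "type": "fact", "tags": []},
    {"id": 5, "content": "Alice adopted a cat", "type": "fact", "tags": []},
]
tagged_ids = [r["id"] for r in deduplicate(tagged)]
tag_free_ids = [r["id"] for r in deduplicate(tag_free)]
```

Under these assumptions the tagged list loses only the near-duplicate incident, while the tag-free list collapses two unrelated facts into one because they share the same empty signature. That behavior would be consistent with the ablation finding reported later that deduplication hurts tag-free atomic facts, though the source does not spell out the mechanism.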

Data Model

Memories Table: Stores content, embeddings, metadata, and distinct counters for access (viewing) vs. retrieval (search results).
Links Table: Supports directed relations (e.g., contradicts, caused_by) for future graph-enhanced retrieval, though multi-hop traversal is not yet active in the search pipeline.
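A minimal sketch of what such a data model could look like, written against Python's built-in `sqlite3` module for brevity. Only `last_retrieved_at`, `last_accessed_at`, and the relation names `contradicts` and `caused_by` come from the description above; every other column name, type, and default, and both helper functions, are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE memories (
    id                INTEGER PRIMARY KEY,
    content           TEXT NOT NULL,
    embedding         BLOB,
    type              TEXT,
    tags              TEXT,
    importance        REAL DEFAULT 0.5,
    access_count      INTEGER DEFAULT 0,   -- views and edits
    last_accessed_at  INTEGER,
    retrieval_count   INTEGER DEFAULT 0,   -- times returned by search
    last_retrieved_at INTEGER
);
CREATE TABLE links (
    source_id INTEGER NOT NULL REFERENCES memories(id),
    target_id INTEGER NOT NULL REFERENCES memories(id),
    relation  TEXT NOT NULL,               -- e.g. contradicts, caused_by
    PRIMARY KEY (source_id, target_id, relation)
);
"""

def record_access(conn, memory_id, now):
    """Bump the access counter only; ranking signals are left untouched."""
    conn.execute(
        "UPDATE memories SET access_count = access_count + 1, "
        "last_accessed_at = ? WHERE id = ?", (now, memory_id))

def record_retrieval(conn, memory_id, now):
    """Bump the retrieval counter that feeds recency and frequency scoring."""
    conn.execute(
        "UPDATE memories SET retrieval_count = retrieval_count + 1, "
        "last_retrieved_at = ? WHERE id = ?", (now, memory_id))

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO memories (id, content) VALUES (1, 'Build broke after upgrade')")
conn.execute("INSERT INTO memories (id, content) VALUES (2, 'Upgraded compiler toolchain')")
conn.execute("INSERT INTO links VALUES (1, 2, 'caused_by')")

record_access(conn, 1, 1000)
record_access(conn, 1, 1001)
record_retrieval(conn, 1, 1002)
counts = conn.execute(
    "SELECT access_count, retrieval_count, last_retrieved_at FROM memories WHERE id = 1"
).fetchone()
```

Keeping two counter pairs means that browsing or editing a memory never changes the values the re-ranker reads.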

3. Key Contributions

Local-First Implementation: A complete, reproducible Rust-based system using libSQL that separates access tracking from retrieval tracking to prevent administrative noise from skewing rankings.
Stability-Oriented Pipeline: Introduces a Low-Confidence Rejection Rule specifically designed to suppress false positives in local scenarios where no answer exists, trading a small amount of recall for significantly higher precision on unanswerable queries.
Reproducible Benchmark Framework: A standalone framework that invokes internal search functions directly (bypassing HTTP) using live embeddings, supporting threshold sweeping and structured JSON reporting.
Granularity & Ablation Analysis: Empirical evidence demonstrating that fact-level chunking (atomic statements) significantly outperforms session-level storage, and that specific pipeline components (like deduplication) have data-dependent effects.

4. Experimental Results

Benchmark Setup

Custom Scenarios: Two Chinese-language suites (43 queries, up to 1,014 records) testing default usage and high-confusion semantic overlaps.
LongMemEval: A large-scale benchmark (500 queries, up to 220,349 records) testing four ability types: Information Extraction, Knowledge Update, Multi-session Reasoning, and Temporal Reasoning.

Key Findings

Retrieval Quality:
- Custom Scenarios: Achieved Hit@1 = 91.3% (Default) and 100% (High-Confusion) with conservative miss suppression.
- LongMemEval: Fact-level granularity doubled performance compared to session-level storage (Hit@5 = 51.6%, MRR = 0.380 vs. 24.6% and 0.183).
- Ability Types: Knowledge Update benefited most from fact-level storage (+44.8 pp Hit@5). Temporal and Multi-session reasoning remained challenging (≤43.6% Hit@5), indicating a need for future temporal indexing.
Latency & Scalability:
- Replacing naive LIKE searches with FTS5 indexing reduced keyword search latency by 1,100× at 100k records (from ~3.3s to ~3ms).
- End-to-end search latency remained under 90 ms even at 220k records.
Ablation Insights:
- Rejection Rule: The low-confidence rule was the sole contributor to suppressing false positives (raising Miss-Empty-Rate from 0% to 66.7% on miss queries) without hurting valid recall.
- Deduplication: Beneficial for template-generated data with tags but harmful for tag-free atomic facts (reducing Hit@5 by 5.0 pp in the LongMemEval fact dataset).
- Access vs. Retrieval: Separating access counts from retrieval counts prevents "admin-heavy" views from inflating the ranking of memories that are rarely actually useful in search.
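For reference, the metrics quoted above can be computed as in the Python sketch below. Hit@k and MRR follow their standard definitions. The Miss-Empty-Rate definition used here (the share of unanswerable queries for which the system returns nothing) is inferred from the metric's name and context, and the toy data is invented purely to exercise the functions.

```python
def hit_at_k(ranked_ids, relevant_ids, k):
    """True if any relevant memory appears in the top k results."""
    return any(memory_id in relevant_ids for memory_id in ranked_ids[:k])

def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / rank of the first relevant result, or 0.0 if none is returned."""
    for rank, memory_id in enumerate(ranked_ids, start=1):
        if memory_id in relevant_ids:
            return 1.0 / rank
    return 0.0

def evaluate(runs, k=5):
    """runs: list of (ranked_ids, relevant_ids); an empty relevant set marks a miss query."""
    answerable = [(ranked, rel) for ranked, rel in runs if rel]
    misses = [ranked for ranked, rel in runs if not rel]
    n_ans, n_miss = len(answerable), len(misses)
    return {
        "hit_at_k": sum(hit_at_k(r, rel, k) for r, rel in answerable) / n_ans if n_ans else 0.0,
        "mrr": sum(reciprocal_rank(r, rel) for r, rel in answerable) / n_ans if n_ans else 0.0,
        # Assumed definition: fraction of miss queries answered with an empty set.
        "miss_empty_rate": sum(len(r) == 0 for r in misses) / n_miss if n_miss else 0.0,
    }

# Toy data: three answerable queries and two miss queries.
runs = [
    (["a", "b", "c"], {"a"}),   # relevant result at rank 1
    (["x", "y", "z"], {"z"}),   # relevant result at rank 3
    (["p", "q"], {"r"}),        # relevant result never returned
    ([], set()),                # miss query, correctly empty
    (["n"], set()),             # miss query, spurious result
]
metrics = evaluate(runs)
```

On this toy data, Hit@5 is 2/3, MRR is 4/9, and the Miss-Empty-Rate is 0.5.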

5. Significance and Future Directions

MemX establishes a solid v1 baseline for local AI assistants, showing that a structurally simple, explainable system can achieve stable, high-quality retrieval without cloud dependency.

Practical Impact: It demonstrates that local-first AI assistants can be both private and performant, provided that retrieval pipelines are optimized for stability (rejecting bad answers) rather than just recall.
Limitations: The system currently struggles with multi-topic queries (coverage gaps) and complex temporal/multi-session reasoning.
Future Work: The authors propose extending the system with temporal indexing, cross-session linking, cross-encoder re-scoring for better rejection, and adaptive deduplication strategies that do not rely on explicit tags.

In summary, MemX shifts the focus from "maximum recall at all costs" to "reliable, stable, and explainable memory," offering a reproducible foundation for the next generation of personal, local AI agents.