MDKeyChunker: Single-Call LLM Enrichment with Rolling… — Plain-Language Explanation

Imagine you are trying to find a specific recipe in a massive, disorganized library.

The Problem with Current Systems (The "Fixed-Size" Approach)
Most current AI search systems (RAG) work like a robot that chops every book in the library into identical, 500-word slices, regardless of what the book is about.

The Issue: If a recipe has a list of ingredients, a paragraph of instructions, and a picture, the robot might cut the picture off from the instructions. It might separate the "Preheat oven" step from the "Mix the batter" step.
The Cost: To understand what each slice is about, the robot has to call a super-smart AI assistant (an LLM) multiple times for every single slice—once to write a title, once to find keywords, once to summarize, etc. This is slow and expensive.
The Confusion: Because each slice is processed alone, the robot might call one slice "Baking Cookies" and a later slice "Cookie Making," not realizing they are the same thing.

The Solution: MDKeyChunker
The paper introduces MDKeyChunker, a smarter way to organize these books. Think of it as a three-step process run by a very organized librarian.

Step 1: The "Smart Scissors" (Structure-Aware Chunking)

Instead of chopping the book into random 500-word pieces, this librarian looks at the structure of the document (like headers, code blocks, tables, and lists).

The Analogy: Imagine the document is a Lego castle. A fixed-size cutter would smash the castle into random piles of bricks, breaking the towers and walls. MDKeyChunker is like a careful builder who only cuts along the natural seams of the castle. If a table is 10 rows long, the whole table stays together. If a code block is 50 lines, it stays intact.
Result: No more broken recipes or split instructions.

Step 2: The "One-Call Super-Interview" (Single-Call Enrichment)

Now, the librarian needs to label these chunks. Usually, you'd interview the AI assistant five different times for five different labels. MDKeyChunker does it in one single interview.

The Analogy: Instead of asking the AI, "What's the title?" then "What are the keywords?" then "What's the summary?" separately, the librarian asks: "Here is a chunk of text. Please give me the title, a summary, keywords, a list of important names, questions it answers, and a specific 'topic tag' all in one go."
The Magic Trick (Rolling Keys): This is the secret sauce. The librarian keeps a rolling notebook (a dictionary) of the "topic tags" used so far.
- If Chunk 1 is about "Admissions," the librarian writes "Admissions" in the notebook.
- When Chunk 5 comes along and talks about the same thing, the AI sees the notebook and says, "Oh, this is still about 'Admissions,' I'll use that same tag instead of inventing a new one like 'Enrollment Process'."
- This prevents the AI from getting confused by synonyms and keeps the whole document connected.

Step 3: The "Puzzle Reassembly" (Key-Based Restructuring)

After labeling, the librarian looks at the "topic tags."

The Analogy: Imagine you have puzzle pieces scattered across the room. Some pieces are far apart in the book, but they both have the tag "Solar System."
The Action: The librarian takes all the pieces with the "Solar System" tag and glues them together into one big, coherent chunk, even if they were originally separated by 30 pages of other text.
Result: You get a "Super-Chunk" that contains all the information about Solar Systems in one place, making it much easier for the search engine to find.

Why Does This Matter?

The paper tested this on 18 documents and 30 questions.

Accuracy: It found the right answers almost perfectly (Recall@5 = 1.000 for some setups).
Efficiency: It replaced the several separate AI calls per slice (one per label) with a single call that fills in every label at once.
Integrity: It never broke a table or a code block in half.

In Summary:
MDKeyChunker is like upgrading from a machine that blindly chops documents into random bits to a smart librarian who respects the document's natural structure, interviews the content efficiently in one go, and reassembles related ideas into ready-to-use bundles. The goal is AI search that is faster and cheaper while staying accurate.

1. Problem Statement

The paper identifies three systematic failure modes in standard Retrieval-Augmented Generation (RAG) pipelines:

Chunk Boundary Fragmentation: Fixed-size splitting (e.g., 256–512 tokens) often ruptures semantic coherence, separating tables from captions, code blocks from explanations, or lists from headers. This degrades retrieval recall and answer quality.
Metadata Extraction Cost: Enriching chunks with metadata (summaries, entities, keywords) typically requires chaining multiple separate LLM calls (one per field), leading to $O(n \cdot m)$ complexity where $n$ is the number of chunks and $m$ is the number of extraction stages. This creates significant latency and cost barriers.
Contextual Isolation: Independent chunk processing leads to synonym proliferation (e.g., "admissions timeline" vs. "application deadlines" for the same topic) because chunks lack inter-chunk context, preventing the system from recognizing related content across the document.
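
To make the $O(n \cdot m)$ cost concrete, here is a small worked count. It assumes a chained baseline that issues exactly one call per field per chunk, and it borrows the 269 chunks and seven fields reported later in this summary; the function name is hypothetical.

```python
def enrichment_calls(n_chunks: int, m_fields: int, single_call: bool) -> int:
    """Count LLM invocations needed to enrich every chunk."""
    # Chained baseline: one call per field per chunk, so n * m.
    # Single-call protocol: one call per chunk, so n.
    return n_chunks if single_call else n_chunks * m_fields


n, m = 269, 7  # chunk count and field count reported in this summary
chained = enrichment_calls(n, m, single_call=False)
single = enrichment_calls(n, m, single_call=True)
print(chained, single, chained // single)  # 1883 269 7
```

Under these assumptions the call count drops by a factor of $m$; token cost per call is not modeled here.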

2. Methodology

MDKeyChunker proposes a unified, three-stage pipeline designed specifically for Markdown documents to address these issues:

Stage 1: Structure-Aware Chunking

Instead of fixed token counts, the system parses Markdown to identify atomic semantic units.

Atomic Units: Headers, code blocks (fenced/indented), tables, lists, and blockquotes are treated as indivisible units.
Splitting Logic: The parser splits only at semantic boundaries (e.g., between headers) while preserving the integrity of complex structures.
Constraints: Chunks are bounded by minimum ($\tau_{min} = 100$ chars) and soft maximum ($\tau_{max} = 1500$ chars) thresholds, but atomic blocks (like large tables) are never split even if they exceed $\tau_{max}$.
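
The chunking rules above can be sketched roughly as follows. This is a simplified illustration, not the library's parser: it assumes blocks are separated by blank lines, treats only fenced code as needing special handling (tables and tight lists stay whole because they contain no blank lines), and the function names `split_blocks` and `chunk` are hypothetical.

```python
import re

TAU_MIN = 100    # minimum chunk size in characters (value quoted above)
TAU_MAX = 1500   # soft maximum chunk size in characters (value quoted above)
FENCE = "`" * 3  # fenced-code delimiter


def split_blocks(markdown: str) -> list[str]:
    """Split Markdown into atomic blocks; a fenced code block is one block."""
    blocks, current, in_fence = [], [], False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            if not in_fence and current:      # flush text before an opening fence
                blocks.append("\n".join(current))
                current = []
            current.append(line)
            if in_fence:                      # closing fence ends the block
                blocks.append("\n".join(current))
                current = []
            in_fence = not in_fence
        elif in_fence:
            current.append(line)              # keep blank lines inside code
        elif not line.strip():
            if current:                       # blank line ends a normal block
                blocks.append("\n".join(current))
                current = []
        else:
            current.append(line)
    if current:
        blocks.append("\n".join(current))
    return blocks


def chunk(markdown: str, tau_min: int = TAU_MIN, tau_max: int = TAU_MAX) -> list[str]:
    """Pack atomic blocks into chunks, cutting only between blocks."""
    chunks, current = [], ""
    for block in split_blocks(markdown):
        starts_section = re.match(r"#{1,6} ", block) is not None
        too_big = len(current) + 2 + len(block) > tau_max
        # Cut before a header only once the current chunk has reached tau_min.
        if current and (too_big or (starts_section and len(current) >= tau_min)):
            chunks.append(current)
            current = ""
        # An oversized atomic block is kept whole: tau_max is a soft limit.
        current = f"{current}\n\n{block}" if current else block
    if current:
        chunks.append(current)
    return chunks
```

A block larger than `tau_max` simply becomes its own chunk, which mirrors the rule that atomic units are never split.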

Stage 2: Single-Call LLM Enrichment with Rolling Keys

This is the core innovation. Instead of multiple calls, a single LLM invocation extracts seven metadata fields simultaneously:

Title, Summary, Keywords, Entities, Questions, Semantic Key, and Related Keys.
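
A minimal sketch of what a single-call request and its validation might look like. The JSON field names, the prompt wording, and the `enrich` helper are assumptions for illustration; the paper's actual prompt is not reproduced here, and `fake_llm` stands in for a real model so the example runs offline.

```python
import json
from typing import Callable

# Hypothetical field names for the seven metadata fields listed above.
FIELDS = ("title", "summary", "keywords", "entities",
          "questions", "semantic_key", "related_keys")


def build_prompt(chunk_text: str, known_keys: list[str]) -> str:
    """Ask for all seven fields in a single request."""
    return "\n".join([
        "Return one JSON object with exactly these fields: " + ", ".join(FIELDS) + ".",
        "If the chunk continues a topic in the known keys, reuse that key as semantic_key.",
        "Known keys: " + json.dumps(known_keys),
        "Chunk:",
        chunk_text,
    ])


def enrich(chunk_text: str, known_keys: list[str], llm: Callable[[str], str]) -> dict:
    """Run one LLM call and check that every field came back."""
    data = json.loads(llm(build_prompt(chunk_text, known_keys)))
    missing = [name for name in FIELDS if name not in data]
    if missing:
        raise ValueError(f"LLM response is missing fields: {missing}")
    return {name: data[name] for name in FIELDS}


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model (assumption, not part of the paper)."""
    return json.dumps({
        "title": "Application deadlines",
        "summary": "Lists the dates by which applications are due.",
        "keywords": ["deadline", "application"],
        "entities": [],
        "questions": ["When are applications due?"],
        "semantic_key": "admissions",
        "related_keys": [],
    })


record = enrich("Applications close in January.", ["admissions"], fake_llm)
print(record["semantic_key"])  # admissions
```

Passing the model as a plain callable keeps the sketch independent of any particular client library.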

Rolling Key Dictionary: The LLM receives a dictionary $K$ of keys extracted from previous chunks.
- If the current chunk continues a prior topic, the LLM is instructed to reuse the existing key from $K$.
- If it introduces a new subtopic, a new key is added.
- The dictionary is capped at 40 entries using an LRU (Least Recently Used) eviction policy to manage context window size.
Benefit: This canonicalizes topics (reducing synonym proliferation) and maintains document-level context without complex scoring formulas.
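
The rolling dictionary with its 40-entry LRU cap could be kept in a structure like the one below. This is a sketch under assumptions: the class name `RollingKeys` and its methods are hypothetical, and it assumes that reusing a key counts as a "use" for eviction purposes.

```python
from collections import OrderedDict


class RollingKeys:
    """Bounded dictionary of semantic keys with least-recently-used eviction."""

    def __init__(self, capacity: int = 40):  # cap quoted in the text above
        self.capacity = capacity
        self._keys = OrderedDict()  # key -> short description, oldest first

    def touch(self, key: str, description: str = "") -> None:
        """Record that a chunk used `key`, adding it if it is new."""
        if key in self._keys:
            self._keys.move_to_end(key)        # reused: mark as most recent
        else:
            self._keys[key] = description
            if len(self._keys) > self.capacity:
                self._keys.popitem(last=False)  # evict least recently used

    def snapshot(self) -> list[str]:
        """Keys to show the LLM with the next chunk, oldest first."""
        return list(self._keys)


keys = RollingKeys(capacity=3)  # tiny capacity to make eviction visible
for key in ["admissions", "tuition", "housing", "admissions", "visas"]:
    keys.touch(key)
print(keys.snapshot())  # ['housing', 'admissions', 'visas']
```

In the demo, "tuition" is evicted rather than "admissions" because "admissions" was reused more recently.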

Stage 3: Key-Based Restructuring (Bin-Packing)

Post-enrichment, the pipeline reorganizes chunks to co-locate related content.

Merging: Chunks sharing the same semantic key are merged into a single retrieval unit using a first-fit bin-packing algorithm.
Constraints: Merges are subject to a maximum size limit ($\tau_{merge} = 3000$ chars).
Orphan Handling: Chunks without keys are preserved but may have section context prepended if they are too small.
Result: Distant fragments discussing the same specific subtopic (e.g., "model types" appearing in two different sections of a document) are merged into a single, coherent retrieval unit.
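
One way to picture the first-fit merge is the sketch below. It assumes each chunk is a dict with a `text` and an optional `key`, that merged texts are joined by a blank line, and that a merged unit sits where its first member appeared; orphan context-prepending is left out. The function name `restructure` is hypothetical.

```python
from collections import defaultdict

TAU_MERGE = 3000  # maximum merged size in characters (value quoted above)


def restructure(chunks: list[dict], tau_merge: int = TAU_MERGE) -> list[dict]:
    """Merge chunks that share a semantic key using first-fit bin packing."""
    units = []                        # output units, in order of first appearance
    bins_by_key = defaultdict(list)   # key -> indices of that key's bins in `units`
    for item in chunks:
        key, text = item.get("key"), item["text"]
        if not key:
            units.append({"key": None, "text": text})  # orphan: kept as is
            continue
        for i in bins_by_key[key]:
            # First fit: take the earliest bin with room for this chunk.
            if len(units[i]["text"]) + 2 + len(text) <= tau_merge:
                units[i]["text"] += "\n\n" + text
                break
        else:
            # No existing bin has room, so open a new one for this key.
            bins_by_key[key].append(len(units))
            units.append({"key": key, "text": text})
    return units


demo = [
    {"key": "model-types", "text": "a" * 1000},
    {"key": "setup", "text": "b" * 500},
    {"text": "c" * 50},
    {"key": "model-types", "text": "d" * 1500},
    {"key": "model-types", "text": "e" * 1000},
]
merged = restructure(demo)
print([(u["key"], len(u["text"])) for u in merged])
# [('model-types', 2502), ('setup', 500), (None, 50), ('model-types', 1000)]
```

The third "model-types" chunk opens a second bin because adding it to the first would exceed the 3000-character limit.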

3. Key Contributions

Single-Call Enrichment Protocol: A novel prompt design that extracts seven distinct metadata fields in one LLM call, reducing inference costs by a factor of $m$ (number of fields) compared to traditional pipelines.
Rolling Key Propagation: A mechanism that maintains topical continuity across chunks using a lightweight dictionary, replacing hand-tuned scoring with LLM-native semantic matching.
Key-Based Restructuring: An algorithm that globally merges semantically related chunks via bin-packing, effectively creating "virtual" sections that transcend the original document layout.
Open-Source Implementation: A fully functional Python library with 76 unit tests, supporting OpenAI-compatible endpoints, and demonstrating zero code-block or table splits in validation.

4. Experimental Results

The system was evaluated on an 18-document Markdown corpus (354 KB) with 30 diverse queries.

Structural Integrity: Zero instances of code-block or table splitting were observed.
Enrichment Efficiency: Achieved a 100% fill rate for all 7 metadata fields. The rolling key mechanism achieved an 89.8% cross-reference rate, confirming that the LLM successfully reused prior keys rather than generating synonyms.
Restructuring Impact: Reduced the total chunk count from 269 to 244 (9.3% reduction) by merging 28 chunks across 13 key groups.
Retrieval Performance:
- Config D (Structure-only + BM25): Achieved perfect Recall@5 = 1.000 and MRR = 0.911.
- Config C (Full Pipeline + Dense Retrieval): Achieved Recall@5 = 0.867 and MRR = 0.744.
- Comparison: While the full dense pipeline (Config C) slightly underperformed the BM25 baseline (Config D) in this specific setup, it significantly outperformed fixed-size dense retrieval (Config A: Recall@5 = 0.933, MRR

MDKeyChunker: Single-Call LLM Enrichment with Rolling Keys and Key-Based Restructuring for High-Accuracy RAG