KohakuRAG: A simple RAG framework with hierarchical document indexing

KohakuRAG is an open-source, hierarchical RAG framework that achieves state-of-the-art performance on the WattBot 2025 Challenge by preserving document structure through a four-level tree representation, enhancing retrieval via LLM-powered query planning, and stabilizing outputs with ensemble voting, thereby outperforming existing methods in precision and citation accuracy.

Shih-Ying Yeh, Yueh-Feng Ku, Ko-Wei Huang, Buu-Khang Tu

Published Tue, 10 Ma

Imagine you are a brilliant but slightly forgetful librarian named LLM (Large Language Model). You know a lot of facts, but you often make things up (hallucinate) or forget details from your training data. To fix this, you are given a massive library of 32 thick technical manuals about AI energy consumption. Your job is to answer 300 specific questions about them, citing exactly which page you found the answer on.

The catch? The questions are tricky. They might use different words than the books (e.g., asking for "PUE" when the book says "Power Usage Effectiveness"), and if you can't find the answer, you must admit you don't know rather than guessing.

The team Kohaku-Lab built a system called KohakuRAG to help you do this well. They won first place in the WattBot 2025 Challenge by solving three main problems with a few clever tricks. Here is how they did it, explained with simple analogies:

1. The Problem: The "Flat Pile" vs. The "Tree House"

The Old Way: Most systems take a book, chop it into flat, fixed-size piles of paper (chunks), and throw them all into one box. When you ask a question, the system grabs the few piles that look most similar to your question.

  • The Issue: You lose the story. You might grab a sentence about "solar panels" without the paragraph explaining why they are efficient. Also, if you need to cite the source, you might point to a random page number that doesn't make sense.

The KohakuRAG Solution: The "Tree House" Index
Instead of a flat pile, they organized the library like a Tree House.

  • The Structure: The index has four levels. The whole book is the tree itself. Sections (chapters) are the branches. Paragraphs are the rooms built on each branch. Sentences are the furniture inside each room.
  • The Magic: They built "elevators" (embeddings) that run from the bottom level (furniture) all the way up to the top level (the whole tree). If you ask about a specific chair (sentence), the system automatically knows which room (paragraph) and which branch (section) it belongs to.
  • Why it helps: When you find the answer, you know exactly where it lives in the tree house. You can point to the specific room, not just a random branch.
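The tree index can be sketched in a few dozen lines of Python. This is a minimal illustration under stated assumptions, not KohakuRAG's actual code: the `Node` class, the `search` helper, the toy 2-D vectors (a real system would use an embedding model), and the rule that a parent's vector is the average of its children's vectors are all hypothetical. The point it shows is that any hit carries its full path (document, section, paragraph, sentence), which is what makes precise citation possible.

```python
from __future__ import annotations

import math
from collections.abc import Iterator
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node in a hypothetical document -> section -> paragraph -> sentence tree."""
    label: str
    level: str
    vector: list[float] | None = None  # only sentences (leaves) start with a vector
    children: list[Node] = field(default_factory=list, repr=False)
    parent: Node | None = field(default=None, repr=False, compare=False)

    def add(self, child: Node) -> Node:
        child.parent = self
        self.children.append(child)
        return child


def build_vectors(node: Node) -> list[float]:
    """Fill parent vectors bottom-up as the mean of their children's vectors (assumed rule)."""
    if node.children:
        child_vecs = [build_vectors(c) for c in node.children]
        node.vector = [sum(col) / len(child_vecs) for col in zip(*child_vecs)]
    if node.vector is None:
        raise ValueError(f"leaf {node.label!r} has no vector")
    return node.vector


def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b)) / (math.hypot(*a) * math.hypot(*b))


def walk(node: Node) -> Iterator[Node]:
    yield node
    for child in node.children:
        yield from walk(child)


def search(root: Node, query: list[float], level: str) -> list[str]:
    """Find the best node at `level` and return its full path, top to bottom, for citation."""
    candidates = [n for n in walk(root) if n.level == level]
    best = max(candidates, key=lambda n: cosine(n.vector, query))
    path: list[str] = []
    node: Node | None = best
    while node is not None:
        path.append(node.label)
        node = node.parent
    return path[::-1]


# Toy tree with made-up 2-D vectors.
doc = Node("report.pdf", "document")
sec = doc.add(Node("Sec 3: Efficiency", "section"))
p1 = sec.add(Node("Para 3.1", "paragraph"))
p1.add(Node("sentence on PUE", "sentence", [0.9, 0.1]))
p1.add(Node("sentence on cooling", "sentence", [0.6, 0.4]))
p2 = sec.add(Node("Para 3.2", "paragraph"))
p2.add(Node("sentence on solar panels", "sentence", [0.1, 0.9]))
build_vectors(doc)

print(search(doc, [1.0, 0.0], "sentence"))
# ['report.pdf', 'Sec 3: Efficiency', 'Para 3.1', 'sentence on PUE']
```

Because every level has its own vector, the same `search` call can also match at the paragraph or section level when a question is broader than a single sentence.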

2. The Problem: The "Lost in Translation" Search

The Old Way: You ask the librarian, "How much energy does a Google data center use?" The librarian looks for that exact phrase. If the book says "Power Usage Effectiveness of Google's cloud facilities," the librarian says, "I can't find it!" because the words don't match.

The KohakuRAG Solution: The "Detective Squad"
Instead of sending one librarian to search, they send a Squad of Detectives.

  • The Planner: Before searching, a smart AI (the Planner) takes your question and sends out 4 different detectives.
    • Detective A asks: "Google data center energy."
    • Detective B asks: "Google PUE metrics."
    • Detective C asks: "How efficient is Google's cloud?"
    • Detective D asks: "Google sustainability report."
  • The Reranking: All detectives bring back piles of papers. The system merges the piles and ranks each page by how many detectives found it. If three detectives found the same page, that page is probably the right one! It's like a popularity vote. This makes it far more likely that the system finds the answer even if you used the wrong words.
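Here is a minimal Python sketch of the detective squad and its popularity vote. The planner is a stub that returns fixed rewrites instead of calling an LLM, the page IDs in `FAKE_RESULTS` are made up, and breaking ties by the best rank a page reached is an assumption; the description above only says that pages found by more detectives rank higher.

```python
from collections import defaultdict

# Hypothetical per-query search results (page IDs); a real system would query its index.
FAKE_RESULTS = {
    "Google data center energy": ["p12", "p40", "p7"],
    "Google PUE metrics": ["p40", "p3"],
    "How efficient is Google's cloud?": ["p9", "p40", "p12"],
    "Google sustainability report": ["p40", "p21"],
}


def plan_queries(question: str) -> list[str]:
    """Stand-in for the LLM planner: returns fixed rewrites instead of calling a model."""
    return list(FAKE_RESULTS)


def retrieve(query: str) -> list[str]:
    """Stand-in for one detective's search; returns ranked page IDs."""
    return FAKE_RESULTS.get(query, [])


def fuse_by_frequency(results: list[list[str]], top_k: int = 3) -> list[str]:
    """Rank pages by how many queries found them; ties go to the best rank seen."""
    hits: dict[str, int] = defaultdict(int)
    best_rank: dict[str, int] = {}
    for ranked_pages in results:
        for rank, page in enumerate(ranked_pages):
            hits[page] += 1
            best_rank[page] = min(rank, best_rank.get(page, rank))
    return sorted(hits, key=lambda p: (-hits[p], best_rank[p]))[:top_k]


question = "How much energy does a Google data center use?"
pages = fuse_by_frequency([retrieve(q) for q in plan_queries(question)])
print(pages)  # ['p40', 'p12', 'p9']
```

Page `p40` wins because all four queries found it, even though only two of them ranked it first.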

3. The Problem: The "Wobbly Answer"

The Old Way: You ask the librarian a question. Sometimes they give you the right answer. Sometimes, because they are a bit nervous or the lighting is bad, they give a slightly different answer, or they say "I don't know" even when the answer is right there.

The KohakuRAG Solution: The "Panel of Judges"
Instead of asking one librarian, they ask 9 different librarians (or the same librarian 9 times with a slight twist).

  • The Vote: They write down their answers.
  • The "Blank" Filter: Sometimes a librarian gets scared and writes "I don't know" (abstention) even if they saw the answer. The system is smart enough to say, "Hey, 8 other people found the answer, so we'll ignore that one scared librarian."
  • The Majority Rule: The system takes the answer that most people agreed on. This makes the final answer very stable and reliable.
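The panel of judges boils down to a majority vote with one extra rule. In this sketch, `is_blank` is a hypothetical marker for "I don't know", the runs are placeholder labels rather than real model outputs, and returning `is_blank` only when every run abstains is an assumption about how the filter behaves.

```python
from collections import Counter

BLANK = "is_blank"  # hypothetical marker for "I don't know"


def vote(answers: list[str], blank: str = BLANK) -> str:
    """Majority vote over repeated runs, ignoring abstentions unless every run abstained."""
    real = [a.strip() for a in answers if a.strip() != blank]
    if not real:
        return blank  # everyone abstained, so the system abstains too
    # most_common keeps first-seen order among equal counts, so ties go to the earliest answer
    return Counter(real).most_common(1)[0][0]


runs = ["A", "A", BLANK, "A", "B", "A", "A", "B", "A"]  # 9 hypothetical runs
print(vote(runs))  # A
print(vote([BLANK] * 9))  # is_blank
```

Filtering blanks before counting matters: without it, a few nervous runs could outvote a real answer that the other runs disagree on in small details.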

The Secret Sauce: "Don't Put the Question First"

The researchers discovered something funny about how AI reads. If you ask the question first and then hand the AI a long list of documents, the AI can lose track of the question by the time it reaches the end (like reading a long email and forgetting the first sentence).

  • The Fix: They put the Documents first, and the Question last. It's like reading the menu before ordering, rather than ordering and then reading the menu. This simple change improved their score by 80% in relative terms.
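Putting the documents first and the question last is simply a matter of how the prompt string is assembled. The function name `build_prompt` and the instruction wording below are hypothetical, not the team's actual prompt.

```python
def build_prompt(question: str, passages: list[str]) -> str:
    """Put the retrieved documents first and the question last (hypothetical wording)."""
    context = "\n\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer using only the documents below and cite them by number. "
        "If they do not contain the answer, reply is_blank.\n\n"
        f"Documents:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_prompt(
    "What is the PUE of the facility?",
    ["(text of retrieved passage 1)", "(text of retrieved passage 2)"],
)
print(prompt)
```

The question sits right before "Answer:", so it is the last thing the model reads before it starts writing.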

The Result

By building a Tree House index, sending a Detective Squad to search, and using a Panel of Judges to vote, KohakuRAG took first place on both the public and private leaderboards, the only team to hold the top spot on both.

They proved that you don't need to be the biggest, most expensive AI to win; you just need a smart way to organize your library, ask the right questions, and double-check your work.