Quantifying the Accuracy and Cost Impact of Design Decisions in Budget-Constrained Agentic LLM Search

This paper presents a controlled study using the Budget-Constrained Agentic Search (BCAS) framework to quantify how search depth, retrieval strategy, and completion budget affect the accuracy and cost of agentic RAG systems. Evaluating six LLMs across three benchmarks, it offers practical configuration guidelines for budget-constrained deployments.

Kyle McCleary, James Ghawaly

Published Wed, 11 Ma

Imagine you are hiring a research assistant (an AI) to answer a difficult question for you. You have two main constraints:

  1. Time/Money: You can only ask them to look up information a few times (Search Budget).
  2. Paper Length: You can only let them write a certain number of words before you stop them (Token Budget).

This paper is like a scientific taste test to figure out: "How do we get the best answer from our assistant without blowing our budget?"

The researchers built a testing ground called BCAS (Budget-Constrained Agentic Search). They didn't just ask the AI to "do its best." Instead, they forced it to work within strict limits, like a student with a limited number of library visits and a strict word count for their essay.

Here is what they discovered, explained through simple analogies:

1. The "Three-Visit" Rule (Search Depth)

The Finding: Accuracy goes up when you let the AI search more, but only up to a point. After about three searches, the extra effort barely helps.

  • The Analogy: Imagine you are trying to find a specific recipe in a giant library.
    • 1 Search: You ask the librarian once. You might get the right book, or you might get a cookbook from the wrong decade.
    • 2-3 Searches: You ask again, maybe refining your question. Now you have a few good options. Your chances of finding the perfect recipe skyrocket.
    • 10 Searches: You keep asking. You now have 50 cookbooks. But you're just re-reading the same recipes. You aren't getting a better answer; you're just getting more of the same. You've hit the "diminishing returns" wall.

Takeaway: Don't let the AI search forever. Give it 3 chances to look things up, then let it write the answer.
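
In pseudocode terms, that takeaway is just a hard cap on the search loop. The sketch below is purely illustrative; the helpers (`search`, `refine_query`, `draft_answer`) are hypothetical stand-ins passed in by the caller, not the BCAS framework's actual API:

```python
MAX_SEARCHES = 3  # the point of diminishing returns observed in the paper

def answer_with_budget(question, search, draft_answer, refine_query):
    """Look things up at most MAX_SEARCHES times, then force a write-up."""
    notes, query = [], question
    for _ in range(MAX_SEARCHES):
        results = search(query)                 # one "library visit"
        notes.extend(results)
        query = refine_query(question, notes)   # sharpen the next lookup
    return draft_answer(question, notes)        # budget spent: write the answer
```

The key design choice is that the cap lives in the loop itself, so the agent can never talk itself into a tenth library visit.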

2. The "Smart Librarian" vs. The "Keyword Matcher" (Retrieval Strategy)

The Finding: The best way to find information isn't just matching keywords (like a basic Google search) or just matching "vibes" (AI understanding). It's a hybrid approach, followed by a re-ranker (a second opinion).

  • The Analogy:
    • Basic Search (BM25): Like a librarian who only finds books with the exact words you typed. If you ask for "fast cars," they ignore a book titled "Speeding Vehicles."
    • AI Search (Dense): Like a librarian who understands concepts. They find "Speeding Vehicles" because they know it's about fast cars.
    • The Hybrid + Re-ranker: This is the Super-Librarian. They use both methods to get a huge list of candidates, then a senior editor (the re-ranker) looks at that list and says, "Actually, this book is the most useful one. Put that at the top."
    • Result: This combination gave the biggest boost in accuracy, especially for tricky questions.
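
The Super-Librarian pipeline can be sketched in a few lines. The scoring functions here are toy placeholders injected by the caller; a real system would plug in BM25 for the keyword pass, embedding similarity for the semantic pass, and a cross-encoder for the re-ranking step:

```python
def hybrid_retrieve(query, docs, bm25_score, dense_score, rerank_score, k=5):
    """Keyword pass + semantic pass nominate candidates; a re-ranker decides."""
    # 1) Each retriever nominates its top-k candidates independently.
    by_bm25  = sorted(docs, key=lambda d: bm25_score(query, d),  reverse=True)[:k]
    by_dense = sorted(docs, key=lambda d: dense_score(query, d), reverse=True)[:k]
    # 2) Merge the two lists into one pool (order-preserving de-duplication).
    candidates = list(dict.fromkeys(by_bm25 + by_dense))
    # 3) The "senior editor": re-score the pooled candidates and sort.
    return sorted(candidates, key=lambda d: rerank_score(query, d), reverse=True)
```

Because the re-ranker only sees the merged shortlist, it can afford to be slower and smarter than either first-pass retriever.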

3. The "Essay Length" Trap (Token Budget)

The Finding: Giving the AI more space to write (more tokens) only helps if the question requires complex thinking (like combining facts from three different books). For simple questions, more writing space just makes the AI ramble.

  • The Analogy:
    • Simple Question: "Who was the first president?"
      • If you give the AI a 1-page limit or a 10-page limit, the answer is still just "George Washington." Giving it more paper doesn't make the answer better; it just makes the AI write a long, boring biography when you only wanted a name.
    • Complex Question: "How did the invention of the printing press influence the spread of democracy in Europe?"
      • Here, the AI needs a lot of space to connect the dots between the printing press, the spread of ideas, and political changes. If you cut the paper short, the AI has to rush and miss the connections.
    • The Twist: Sometimes, giving the AI less writing space forces it to be efficient. It stops rambling and focuses on making more searches instead of writing a long, useless essay.
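
One way to act on this finding is to size the completion budget by question complexity before the agent starts writing. This is an assumption-laden illustration: `count_hops` is a hypothetical complexity classifier, and the 256/2048 caps are made-up defaults, not numbers from the paper:

```python
def token_budget(question, count_hops, small=256, large=2048):
    """Multi-hop questions get room to synthesize; single-fact lookups
    get a tight cap so the model names George Washington and stops."""
    return large if count_hops(question) > 1 else small
```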

4. The "Small Dog" vs. The "Smart Dog" (Model Size)

The Finding: Smaller, cheaper AI models can actually beat huge, expensive models if you give them more search opportunities and better tools.

  • The Analogy:
    • The Big Model (Expensive): Like a genius dog that knows a lot of facts already. It can answer well with just one look-up. But it's expensive to feed.
    • The Small Model (Cheap): Like a smart puppy. It doesn't know as much off the top of its head.
    • The Magic: If you give the puppy a map (planning) and let it run around the neighborhood three times (searches) to find the facts, it can catch up to the genius dog. The "cheap" option becomes the "best value" option because you aren't paying for the expensive model's brain; you're paying for the process of finding the answer.

The "Golden Rule" for Budgets

If you are building an AI system and have a limited budget, the paper suggests this order of operations:

  1. First, buy more "Searches": Let the AI look things up 2 or 3 times. This gives the biggest bang for your buck.
  2. Second, upgrade the "Search Engine": Use a smart hybrid search with a re-ranker (the Super-Librarian).
  3. Third, only then, buy more "Writing Space": Only increase the word limit if you are asking very complex questions that require synthesizing lots of information.

Summary

The paper teaches us that being "Agentic" (smart and autonomous) isn't about letting the AI run wild. It's about strategic constraints. By forcing the AI to search a few times, use smart tools, and write concisely, you get better answers for less money. It's the difference between hiring a frantic intern who writes 50 pages of nonsense, and a focused researcher who checks three sources and gives you the perfect answer.