Robust Batch-Level Query Routing for Large Language Models under Cost and Capacity Constraints

This paper proposes a robust, batch-level query routing framework that jointly optimizes model assignment and instance allocation under cost and capacity constraints. It demonstrates significant improvements in accuracy and throughput over traditional per-query methods, particularly under adversarial batching conditions.

Jelena Markovic-Voronov, Kayhan Behdin, Yuanda Xu, Zhengze Zhou, Zhipeng Wang, Rahul Mazumder

Published 2026-03-31

Imagine you run a busy restaurant kitchen (a Large Language Model system) that has to serve thousands of customers (queries) every hour. You have a menu of different chefs (LLMs) available:

  • The "Star Chefs": Highly skilled, can handle complex dishes (hard questions), but they are expensive and slow.
  • The "Line Cooks": Fast, cheap, and great at simple dishes (easy questions), but they might burn a complex meal.

Your goal is to get the best food quality for your customers while staying within your daily budget and not overworking your kitchen staff (GPU resources).

The Old Way: The "One-by-One" Mistake

Previously, most restaurants used a "Per-Query" rule. Every time a customer ordered, a manager looked at the dish and decided: "Is this a simple salad? Give it to the Line Cook. Is it a complex soufflé? Give it to the Star Chef."

The Problem:
Imagine a sudden rush where 50 customers in a row all order the most expensive, complex soufflés.

  • The manager sends them all to the Star Chefs.
  • Result: The kitchen explodes! The Star Chefs are overwhelmed, the bill skyrockets, and the next 50 customers (who ordered simple salads) have to wait because the kitchen is backed up.
  • The old method couldn't see the "big picture" of the whole rush. It only looked at one order at a time.

The New Solution: The "Batch-Level" Manager

This paper proposes a smarter system called Robust Batch-Level Routing. Instead of looking at one order, the manager looks at the entire tray of 100 orders that just came in and plans the whole shift at once.

Here is how it works, broken down into three simple concepts:

1. The Group Plan (Batch-Level Optimization)

Instead of deciding chef assignments one by one, the manager looks at the whole group of 100 orders.

  • The Math: They use a smart calculator (Integer Linear Programming) to solve a puzzle: "How do we split these 100 orders between the Star Chefs and Line Cooks so that everyone gets good food, the total bill stays under $500, and no chef is overwhelmed?"
  • The Benefit: If 20 complex orders come in, the manager might say, "Okay, we'll send 15 to the Star Chefs and 5 to the Line Cooks (who will try their best), and we'll save the budget for the next group." This prevents the kitchen from crashing and keeps the costs steady.
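The "smart calculator" above can be sketched in a few lines. This is a toy stand-in, not the paper's actual formulation: the quality scores, costs, capacities, and the brute-force search are all illustrative assumptions (a real system would hand the same objective and constraints to an ILP solver rather than enumerate assignments).

```python
from itertools import product

# Illustrative inputs (assumed, not from the paper):
QUALITY = {            # predicted answer quality per (model, query type)
    "star_chef": {"easy": 0.95, "hard": 0.90},
    "line_cook": {"easy": 0.90, "hard": 0.40},
}
COST = {"star_chef": 10.0, "line_cook": 1.0}   # cost per query
CAPACITY = {"star_chef": 2, "line_cook": 4}    # max queries per model

def route_batch(batch, budget):
    """Pick the assignment of queries to models that maximizes total
    predicted quality, subject to the budget and capacity constraints.
    Brute force for tiny batches; an ILP solver plays this role at scale."""
    models = list(COST)
    best, best_quality = None, -1.0
    for assignment in product(models, repeat=len(batch)):
        if sum(COST[m] for m in assignment) > budget:
            continue  # violates the batch-level budget
        if any(assignment.count(m) > CAPACITY[m] for m in models):
            continue  # overloads a model's capacity
        quality = sum(QUALITY[m][q] for m, q in zip(assignment, batch))
        if quality > best_quality:
            best, best_quality = assignment, quality
    return best, best_quality

# Three soufflés and a salad, with only $22 to spend:
plan, score = route_batch(["hard", "hard", "hard", "easy"], budget=22.0)
```

With these numbers the planner sends two hard orders to the Star Chefs (their capacity limit) and lets the Line Cooks absorb the rest, exactly the "we'll send 15 to the Star Chefs and 5 to the Line Cooks" trade-off described above.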

2. The "Safety Net" (Robustness)

Sometimes, the manager's guess about how good a chef is might be wrong. Maybe the "Star Chef" is having a bad day, or the "Line Cook" is actually better than we thought.

  • The Risk: If the manager is too confident and sends a hard dish to a Line Cook who fails, the customer gets a burnt meal.
  • The Fix: The new system uses a "Worst-Case Scenario" approach. It assumes the chefs might perform slightly worse than expected. It plans the schedule based on the lowest likely performance.
  • The Analogy: It's like packing an umbrella even if the weather forecast says "sunny." If it does rain, you're safe. If it doesn't, you just carried a little extra weight. This ensures the system never fails catastrophically, even if the predictions are slightly off.
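One common way to implement this "umbrella" is to optimize against a pessimistic estimate: subtract an uncertainty margin from each predicted quality before planning. The function below is a minimal sketch of that idea (the parameter names and the specific lower-bound form are assumptions, not the paper's exact robust formulation):

```python
def robust_quality(predicted, uncertainty, kappa=1.0):
    """Pessimistic quality estimate: assume the chef performs up to
    `kappa` uncertainty-units worse than predicted (floored at 0).
    Routing then optimizes this lower bound instead of the raw prediction."""
    return max(0.0, predicted - kappa * uncertainty)

# A well-calibrated prediction barely moves; a shaky one is discounted hard.
confident_star = robust_quality(0.90, 0.02)   # ~0.88
shaky_line_cook = robust_quality(0.85, 0.30)  # ~0.55
```

The effect: a model whose skill we are unsure about looks worse on paper, so the planner only risks it on queries where even its worst-case performance is acceptable.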

3. The Kitchen Setup (Offline Instance Allocation)

Before the restaurant even opens for the day, the owner has to decide: "How many Star Chefs and Line Cooks should we hire today?"

  • The Old Way: Hire the same number of chefs every day, regardless of the menu.
  • The New Way: The owner looks at the expected menu for the week. If the week is full of complex dishes, they hire more Star Chefs. If it's mostly simple salads, they hire more Line Cooks.
  • The Benefit: This ensures the kitchen isn't full of expensive chefs standing around doing nothing, or cheap cooks who are too slow for the workload. It matches the resources to the actual demand.
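A simple version of this "hiring" step is to split a fixed instance budget in proportion to the instance-hours each model is forecast to need. This heuristic is an illustrative simplification (the paper solves allocation jointly with routing; the forecast and throughput figures below are made up):

```python
def allocate_instances(forecast, throughput, total_instances):
    """Split `total_instances` across models in proportion to the
    instance-hours each must serve: forecast load / per-instance throughput.
    Uses largest-remainder rounding so the allocation sums exactly."""
    demand = {m: load / throughput[m] for m, load in forecast.items()}
    total = sum(demand.values())
    raw = {m: total_instances * d / total for m, d in demand.items()}
    alloc = {m: int(v) for m, v in raw.items()}
    leftover = total_instances - sum(alloc.values())
    # hand remaining instances to the models with the largest fractional need
    for m, _ in sorted(raw.items(),
                       key=lambda kv: kv[1] - int(kv[1]),
                       reverse=True)[:leftover]:
        alloc[m] += 1
    return alloc

# 200 complex vs 800 simple queries expected; Star Chefs are 8x slower.
alloc = allocate_instances(
    forecast={"star_chef": 200, "line_cook": 800},
    throughput={"star_chef": 10, "line_cook": 80},  # queries/hour/instance
    total_instances=6,
)
```

Even though most queries are simple, the slow Star Chefs get the majority of instances here, because matching capacity to demand means counting instance-hours, not query counts.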

Why Does This Matter?

The paper tested this system on real-world data and found:

  • Better Quality: Customers got better food (higher accuracy) because chefs were matched to entire groups of orders rather than one order at a time.
  • Cheaper: The restaurant didn't overspend on the Star Chefs during simple rushes.
  • More Stable: Even when a "bad batch" of difficult orders arrived (Adversarial Batching), the system didn't crash or go over budget.

In a Nutshell

Think of this paper as teaching a restaurant manager to stop looking at individual orders and start planning the whole shift. By looking at the group, preparing for the worst-case scenario, and hiring the right number of staff beforehand, they serve better food, spend less money, and keep the kitchen running smoothly without chaos.