Idleness is Relative: Exploiting Tool-Call Idle Windows for Offloading in Agentic Systems with MORI

MORI is an agent serving system that treats idleness as a continuous spectrum rather than an on/off state. It uses that view to rank agentic workloads and dynamically partition them between GPU and CPU memory tiers. This improves throughput, reduces time-to-first-token, and overcomes the weakness of traditional eviction policies when tool-call durations vary.

Original authors: Tian Xia, Hanchen Li, Zhifei Li, Xiaokun Chen, Hao Kang, Yifan Qiao, Yi Xu, Ion Stoica

Published 2026-06-02

Original paper licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). This is an AI-generated explanation of the paper below. It is not written or endorsed by the authors. For technical accuracy, refer to the original paper.

Imagine you run a busy restaurant kitchen (the GPU) with a very small, high-speed counter space where chefs work. You also have a large, slower pantry (the CPU) nearby.

In the old days, restaurants only served simple, one-off orders. But now, customers are ordering complex, multi-course "agent" meals. These meals involve the chef cooking a bit, then stepping away to wait for an oven to finish baking, or waiting for a delivery driver to drop off ingredients, then coming back to cook the next step.

The problem is that the chef's workspace (the KV cache, which stores the model's attention keys and values for every token it has already processed) gets cluttered with ingredients and notes for every step of the meal. If the kitchen gets too crowded, you have to throw ingredients away to make room for new orders. If you throw them away, the chef has to start from scratch on returning, which wastes time.

The Problem: The "One-Size-Fits-All" Mistake

Previous systems tried to manage this kitchen with a simple rule: "If you haven't cooked in a while, move your ingredients to the pantry."

But this rule is flawed because it doesn't understand the type of waiting:

  1. The Busy Wait: The chef is waiting 30 seconds for a pot to boil. They will be back at the stove almost immediately. If you move their ingredients to the pantry, you waste time running back and forth.
  2. The Long Wait: The chef is waiting for a delivery truck that might take 10 minutes. They won't be back soon. Keeping their ingredients on the crowded counter just blocks space for other chefs.

Old systems treated all waiting the same. They would sometimes kick out the chef who was about to return (wasting time reloading ingredients) and keep the chef who would be gone for a long time (wasting counter space).
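To make this failure concrete, here is a minimal Python sketch. It compares a recency-only rule, which evicts whoever has been idle longest so far (essentially LRU), with a rule that looks ahead at the remaining wait. This is not MORI's algorithm. It assumes each session comes with a predicted tool-call duration (`expected_wait`), which is a hypothetical input. The `Session` class and both helper functions are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Session:
    name: str
    idle_so_far: float    # seconds since the session last ran on the GPU
    expected_wait: float  # hypothetical: predicted total tool-call duration (s)

    @property
    def remaining_wait(self) -> float:
        # Time until the session is expected to need the GPU again.
        return max(self.expected_wait - self.idle_so_far, 0.0)


def lru_victim(sessions: list[Session]) -> Session:
    # Recency-only rule: evict whoever has been idle the longest so far.
    return max(sessions, key=lambda s: s.idle_so_far)


def remaining_wait_victim(sessions: list[Session]) -> Session:
    # Look-ahead rule: evict whoever is expected to stay idle longest from now on.
    return max(sessions, key=lambda s: s.remaining_wait)


pot = Session("pot", idle_so_far=25.0, expected_wait=30.0)            # back in 5 s
delivery = Session("delivery", idle_so_far=10.0, expected_wait=600.0)  # back in 590 s

print(lru_victim([pot, delivery]).name)             # pot
print(remaining_wait_victim([pot, delivery]).name)  # delivery
```

The recency rule evicts the pot-watcher, who is 5 seconds from returning. It keeps the delivery-waiter, who will be away for almost ten more minutes. That is exactly the mistake described above.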

The Solution: MORI (The Smart Kitchen Manager)

The paper introduces MORI, a new system that acts like a smart kitchen manager. Instead of just looking at when a chef last worked, MORI looks at how idle they are relative to everyone else.

Think of "idleness" not as a switch (On/Off), but as a dimmer switch.

  • Low Idleness (Bright): The chef is busy, cooking fast, and will be back at the stove in seconds. MORI keeps them on the main counter (GPU).
  • High Idleness (Dim): The chef is stuck waiting for a long time. MORI moves them to the pantry (CPU) to free up the counter.
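One way to picture a dimmer whose setting depends on everyone else is a relative rank. The sketch below is a hypothetical illustration, not the paper's formula. It assumes each session has an estimated time until it needs the GPU again (`expected_idle`, in seconds). It scores each session by the fraction of other sessions expected to return sooner, so 0.0 is the brightest and 1.0 is the dimmest. The function name `relative_idleness` is an illustrative choice.

```python
def relative_idleness(expected_idle: dict[str, float]) -> dict[str, float]:
    """Map each session's expected idle time (hypothetical estimate, seconds)
    to a score in [0, 1]: the fraction of *other* sessions expected to
    return sooner. 0.0 = least idle (bright), 1.0 = most idle (dim)."""
    n = len(expected_idle)
    if n <= 1:
        # With no one to compare against, treat the lone session as bright.
        return {sid: 0.0 for sid in expected_idle}
    scores: dict[str, float] = {}
    for sid, t in expected_idle.items():
        sooner = sum(1 for other, u in expected_idle.items() if other != sid and u < t)
        scores[sid] = sooner / (n - 1)
    return scores


print(relative_idleness({"a": 0.5, "b": 30.0, "c": 600.0}))
# {'a': 0.0, 'b': 0.5, 'c': 1.0}
print(relative_idleness({"b": 30.0, "x": 0.1, "y": 0.2}))
# {'b': 1.0, 'x': 0.0, 'y': 0.5}
```

The second call shows why the score is relative. The same 30-second wait counts as the dimmest in a crowd of sub-second sessions, but it is only mid-range next to a 10-minute one.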

How MORI Works (The Three Zones)

MORI organizes the kitchen into three zones based on how "busy" the chefs are:

  1. The Hot Counter (GPU HBM): This is for the busiest chefs. Their ingredients are right there, ready to go.
  2. The Pantry (CPU DRAM): This is for chefs who are waiting for a long time. Their ingredients are stored here. It takes a little time to bring them back to the counter, but it's worth it because the counter is free for others.
  3. The Waiting Room: If the pantry is also full, some chefs have to wait outside, and their ingredients are thrown away. When they get called back, they have to start their recipe from scratch (their KV cache must be recomputed).
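The three zones can be sketched as a single pass over sessions sorted from least to most idle. This is a simplified illustration under stated assumptions. Capacity is counted in whole sessions rather than bytes of KV cache, and the idleness scores are taken as given. `place` is a hypothetical helper, not MORI's placement code.

```python
def place(idleness: dict[str, float], gpu_slots: int, cpu_slots: int) -> dict[str, str]:
    """Assign each session to 'gpu', 'cpu', or 'dropped' (KV cache discarded).

    Assumes capacity is measured in whole sessions, a simplification:
    real KV caches vary in size.
    """
    # Least idle sessions come first, so they claim the GPU slots.
    order = sorted(idleness, key=lambda sid: idleness[sid])
    tiers: dict[str, str] = {}
    for rank, sid in enumerate(order):
        if rank < gpu_slots:
            tiers[sid] = "gpu"      # hot counter: ready to run
        elif rank < gpu_slots + cpu_slots:
            tiers[sid] = "cpu"      # pantry: must be reloaded before running
        else:
            tiers[sid] = "dropped"  # waiting room: recompute from scratch
    return tiers


print(place({"a": 0.1, "b": 0.9, "c": 0.4, "d": 0.7}, gpu_slots=2, cpu_slots=1))
# {'a': 'gpu', 'c': 'gpu', 'd': 'cpu', 'b': 'dropped'}
```

Only the most idle session lands in the waiting room. Whoever does get dropped is therefore the one least likely to be called back soon, which keeps the cost of recomputation low.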

The Magic Trick:
MORI constantly checks the "dimmer switch" for every chef.

  • If the counter is full, it kicks out the chef who is most idle (the one waiting the longest) to the pantry.
  • If a spot opens up on the counter, it brings back the chef who is least idle (the one about to start cooking).
  • It also makes sure that if a chef was in the pantry, they go back to the same kitchen station they came from, so they don't have to run to a different part of the building to find their stuff.
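The three moves above can be sketched as a tiny scheduler. This is a hypothetical illustration, not MORI's implementation. Here "kitchen station" is read as the GPU the session was originally running on, and that mapping is our assumption. Capacity is again counted in whole sessions. `Gpu`, `Scheduler`, `evict_one`, and `promote_one` are illustrative names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Gpu:
    capacity: int  # assumption: measured in whole sessions
    resident: dict[str, float] = field(default_factory=dict)  # session id -> idleness


@dataclass
class Scheduler:
    gpus: list[Gpu]
    # Sessions offloaded to CPU memory: session id -> (idleness, home GPU index).
    offloaded: dict[str, tuple[float, int]] = field(default_factory=dict)

    def evict_one(self, g: int) -> str:
        # Counter full: move the most idle resident of GPU g to CPU memory,
        # remembering which GPU it came from.
        gpu = self.gpus[g]
        victim = max(gpu.resident, key=lambda sid: gpu.resident[sid])
        self.offloaded[victim] = (gpu.resident.pop(victim), g)
        return victim

    def promote_one(self, g: int) -> Optional[str]:
        # A slot opened on GPU g: bring back the least idle offloaded session
        # whose home is GPU g (hypothetical affinity rule).
        gpu = self.gpus[g]
        if len(gpu.resident) >= gpu.capacity:
            return None
        candidates = [sid for sid, (_, home) in self.offloaded.items() if home == g]
        if not candidates:
            return None
        best = min(candidates, key=lambda sid: self.offloaded[sid][0])
        idleness, _ = self.offloaded.pop(best)
        gpu.resident[best] = idleness
        return best


s = Scheduler([Gpu(2, {"a": 0.1, "b": 0.8}), Gpu(2, {"c": 0.3, "d": 0.9})])
print(s.evict_one(0), s.evict_one(1))  # b d
print(s.promote_one(1))                # d
print(s.promote_one(0))                # b
```

When GPU 1 frees a slot, `d` comes back to it even though `b` is less idle, because `b` belongs to GPU 0. That mirrors the "same kitchen station" rule.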

The Results

The researchers tested this system using real-world coding agents (AI programmers) on GPU servers. They found that MORI clearly outperformed the older methods:

  • Faster Service: The kitchen could handle 20% to 71% more orders per hour (higher throughput).
  • Less Waiting: Customers got their first bite of food 18% to 43% faster (lower time-to-first-token).
  • Less Waste: It avoided wasting time reloading ingredients for chefs who were actually about to return.

In a Nutshell

MORI is a smart scheduler that realizes "waiting" isn't all the same. By treating idleness as a spectrum (a sliding scale) rather than a simple yes/no, it keeps the fast, busy work on the super-fast hardware and moves the long-waiting work to the slower, cheaper storage. This keeps the system running smoothly without wasting expensive resources or time.
