MARS: Efficient, Adaptive Co-Scheduling for Heterogeneous Agentic Systems

MARS is an efficient, adaptive co-scheduling system that globally coordinates heterogeneous GPU-CPU resources for autonomous LLM agents by decoupling admission from execution and optimizing state retention, thereby significantly reducing end-to-end latency while maintaining high throughput.

Original authors: Yifei Wang, Hancheng Ye, Yechen Xu, Cong Guo, Chiyue Wei, Qinsi Wang, Dongting Li, Tingjun Chen, Hai "Helen" Li, Danyang Zhuo, Yiran Chen

Published 2026-05-01

Original paper licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). This is an AI-generated explanation of the paper below. It is not written or endorsed by the authors. For technical accuracy, refer to the original paper.

Imagine you run a busy, high-tech kitchen. In the past, this kitchen only cooked simple, one-off dishes (like a single text response). The chefs (GPUs) were the only ones who mattered, and the goal was simply to cook as many dishes per hour as possible.

But today, the kitchen has changed. It now runs autonomous agents: think of them as super-chefs who don't just cook a dish, but work through a multi-step journey to solve a problem. They might cook a bit, then run to the pantry to grab an ingredient (using a tool), check a recipe book (reading a file), come back to the stove, cook a little more, and repeat this loop dozens of times until the job is done.

The paper introduces MARS, a new "Kitchen Manager" designed specifically for this chaotic, multi-step style of cooking. Here is how it works, using simple analogies:

The Problem: The Old Kitchen Was Breaking

The old kitchen managers (existing AI systems) were designed for the "one-off dish" era. When faced with these new, complex agent journeys, they made three big mistakes:

  1. The "Stop-and-Go" Trap: When a chef stops cooking to run to the pantry (CPU tool execution), the old manager would either throw away the chef's notes (KV cache) to save space or keep them forever, clogging the counter. If the notes were thrown away, the chef had to rewrite the whole recipe from scratch on returning (the model had to recompute the cache over the entire context), wasting time that grows with the length of the recipe.
  2. The "Big Order" Blockage: If a huge, complex order (a long context) arrived, it would take over the stove and block all the small, quick orders behind it, even if those small orders were ready to go.
  3. The "Blind Spot": The manager only watched the stove (GPU). They didn't see that the pantry (CPU) was jammed. So, they kept sending chefs to the stove even though the pantry was full, causing a traffic jam where chefs stood idle waiting for ingredients.

The Solution: MARS (The Smart Kitchen Manager)

MARS fixes this by acting as a central coordinator that watches both the stove and the pantry and schedules them together.

1. The "Unified Information Stream" (The Walkie-Talkie)

Instead of guessing, MARS has a live feed connecting the stove and the pantry.

  • How it works: Every time a chef pauses to go to the pantry, or returns with an ingredient, a signal is sent instantly. The manager knows exactly why the chef stopped and when they will be back.
  • The Benefit: The manager never has to guess. They know exactly how much counter space (memory) to keep for the chef's notes and when to clear space for new orders.
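
To make the walkie-talkie idea concrete, here is a minimal sketch of a manager that builds its picture of the kitchen purely from pause and resume signals. Everything in it (the `Event` and `ClusterView` names, the `kv_bytes` field, the two event kinds) is a hypothetical illustration, not the paper's actual interface.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class EventKind(Enum):
    TOOL_START = auto()  # agent leaves the GPU to run a CPU tool
    TOOL_END = auto()    # tool result is back; agent wants the GPU again


@dataclass
class Event:
    agent_id: str
    kind: EventKind
    kv_bytes: int = 0    # hypothetical: size of the agent's KV cache at pause time


@dataclass
class ClusterView:
    """Manager-side view, updated only from the event stream."""
    paused_kv_bytes: dict = field(default_factory=dict)  # agent_id -> bytes held
    tools_in_flight: int = 0

    def apply(self, ev: Event) -> None:
        if ev.kind is EventKind.TOOL_START:
            self.paused_kv_bytes[ev.agent_id] = ev.kv_bytes
            self.tools_in_flight += 1
        elif ev.kind is EventKind.TOOL_END:
            # Only count the return if we actually saw this agent leave.
            if self.paused_kv_bytes.pop(ev.agent_id, None) is not None:
                self.tools_in_flight -= 1

    def held_bytes(self) -> int:
        """Memory currently reserved for paused agents' notes."""
        return sum(self.paused_kv_bytes.values())


view = ClusterView()
view.apply(Event("a1", EventKind.TOOL_START, kv_bytes=4096))
view.apply(Event("a2", EventKind.TOOL_START, kv_bytes=1024))
view.apply(Event("a1", EventKind.TOOL_END))
print(view.tools_in_flight, view.held_bytes())
```

After these three signals the manager knows that one chef is still at the pantry and how much counter space that chef's notes occupy, without polling either the stove or the pantry.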

2. The "External Control Plane" (The Bouncer)

Before a new order even enters the kitchen, MARS checks if the kitchen can actually handle it.

  • How it works: It looks at two things: Is the stove free? Is the pantry free? If the pantry is jammed with tool tasks, MARS says, "No new orders right now," even if the stove is empty. This prevents the kitchen from getting overwhelmed.
  • The Benefit: It stops the kitchen from accepting too many orders at once, which would cause a total collapse where nothing gets finished.
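
The bouncer's rule can be sketched in a few lines. This is an illustrative simplification that assumes capacity can be expressed as a fixed number of slots on each side; the `Load` fields and the `admit` function are hypothetical names, and the real system presumably uses richer signals than slot counts.

```python
from dataclasses import dataclass


@dataclass
class Load:
    gpu_running: int        # requests currently on the stove
    gpu_slots: int          # assumed GPU capacity, in concurrent requests
    cpu_tools_running: int  # tool calls currently in the pantry
    cpu_tool_slots: int     # assumed CPU capacity, in concurrent tool calls


def admit(load: Load) -> bool:
    """Admit a new agent only if BOTH sides have headroom."""
    gpu_ok = load.gpu_running < load.gpu_slots
    cpu_ok = load.cpu_tools_running < load.cpu_tool_slots
    return gpu_ok and cpu_ok


# Stove is empty but the pantry is full: the order waits outside.
print(admit(Load(gpu_running=0, gpu_slots=8, cpu_tools_running=4, cpu_tool_slots=4)))
# Both have room: the order is let in.
print(admit(Load(gpu_running=3, gpu_slots=8, cpu_tools_running=1, cpu_tool_slots=4)))
```

The point of the sketch is the `and`: a GPU-only admission check would have accepted the first order and added to the pantry jam.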

3. The "Internal Agent-Centric Scheduler" (The Smart Dispatcher)

Once orders are inside, MARS decides who cooks next. It doesn't just follow a "First Come, First Served" line.

  • How it works:
    • Priority: If a chef is in the middle of a quick, interactive step, MARS lets them go first. If a chef is stuck on a massive, slow recipe, MARS might pause them briefly to let others finish, preventing the "big order" blockage.
    • The "Warm Resume" Trick: MARS decides intelligently whether to keep a chef's notes (KV cache) on the counter. If the chef is coming back in 2 seconds, MARS keeps the notes (saving time). If they are gone for 20 minutes, MARS clears the notes to make room for others. It only keeps the notes when it actually saves time.
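
Both dispatcher decisions can be sketched as small functions. The cost model below is an assumption made for illustration, not the paper's formula: `pick_next` simply favors the step with the least pending GPU work, and `should_retain_kv` compares the time to recompute the notes against a holding cost, where `memory_pressure` is a hypothetical weight between 0 and 1 for how badly other orders need that counter space.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Step:
    agent_id: str
    pending_tokens: int  # tokens the GPU still has to process for this step
    waited_s: float      # how long the step has been waiting


def pick_next(ready: list[Step]) -> Step:
    """Shortest pending work first; ties go to whoever has waited longest."""
    return min(ready, key=lambda s: (s.pending_tokens, -s.waited_s))


def should_retain_kv(expected_pause_s: float, context_tokens: int,
                     prefill_tokens_per_s: float, memory_pressure: float) -> bool:
    """Keep the notes only if rebuilding them would cost more than holding them."""
    recompute_s = context_tokens / prefill_tokens_per_s
    holding_cost_s = expected_pause_s * memory_pressure  # assumed linear cost
    return recompute_s > holding_cost_s


ready = [Step("big", 32000, 1.0), Step("quick", 200, 0.1)]
print(pick_next(ready).agent_id)

# 8000-token context at an assumed 4000 tokens/s prefill: 2 s to recompute.
print(should_retain_kv(2.0, 8000, 4000.0, 0.5))     # back in 2 seconds
print(should_retain_kv(1200.0, 8000, 4000.0, 0.5))  # gone for 20 minutes
```

A pure shortest-first rule like this one can starve long steps, so a practical dispatcher would also need some form of aging; that part is left out of the sketch.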

The Results: Faster and Smoother

The paper tested MARS in real-world scenarios (like a coding assistant that writes and tests software).

  • Speed: MARS made the agents finish their tasks up to 5.94 times faster than the old systems in controlled tests. In a real-world coding setup, it sped up task completion by 1.87 times.
  • Reliability: While old systems would get so clogged that they stopped finishing tasks entirely (high "throughput" but zero "goodput"), MARS kept the kitchen running smoothly, ensuring that tasks actually got completed on time.
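
The gap between throughput and goodput is easy to see with a little arithmetic. The sketch below assumes one common way of defining the two (tokens generated per second versus tasks finished within a deadline per second); the function names and all numbers are invented for illustration and are not results from the paper.

```python
def token_throughput(tokens_generated: int, window_s: float) -> float:
    """Raw GPU output: tokens produced per second, useful or not."""
    return tokens_generated / window_s


def goodput(task_latencies_s: list, deadline_s: float, window_s: float) -> float:
    """Tasks that finished within the deadline, per second."""
    on_time = sum(1 for t in task_latencies_s if t <= deadline_s)
    return on_time / window_s


window_s = 3600.0
deadline_s = 600.0

# A clogged kitchen: the stove is busy all hour, yet every task misses its deadline.
print(token_throughput(7_200_000, window_s))           # 2000.0 tokens/s
print(goodput([1800.0, 2400.0], deadline_s, window_s)) # 0.0 tasks/s

# A healthier hour: two of three tasks land on time.
print(goodput([120.0, 300.0, 900.0], deadline_s, window_s))
```

The first two lines show how a system can look productive by the stove-only metric while delivering nothing a customer would count as done.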

Summary

Think of MARS as the difference between a kitchen manager who just counts how many pots are on the stove versus one who understands the entire flow of the restaurant. By watching the stove and the pantry simultaneously, and making smart decisions about who cooks next and what notes to keep, MARS ensures that complex, multi-step AI agents can work efficiently without getting stuck in traffic jams.
