DOPD: A Dynamic PD-Disaggregation Architecture for Maximizing Goodput in LLM Inference Serving

DOPD is a dynamic LLM inference serving system that maximizes goodput and meets strict SLOs by adaptively adjusting the ratio of prefill to decoding instances based on real-time workload monitoring. It outperforms existing aggregation approaches such as vLLM and disaggregation approaches such as DistServe.

Junhan Liao, Minxian Xu, Wanyi Zheng, Yan Wang, Kejiang Ye, Rajkumar Buyya, Chengzhong Xu

Published 2026-03-10

The Big Picture: The "Two-Chef" Kitchen Problem

Imagine you run a high-end restaurant (a Large Language Model, or LLM) that serves complex dishes. Every order goes through two distinct stages:

  1. The Prep Stage (Prefill): The chef reads the order, gathers ingredients, and chops everything. This is hard work (compute-intensive) but happens quickly once started.
  2. The Cooking Stage (Decoding): The chef actually cooks the meal, plate by plate, serving it to the customer. This is memory-intensive (you need lots of fridge space for ingredients) and takes a long time.

The Old Way (The Single Chef):
In the past, one chef tried to do both prep and cooking for every order at the same time. If a customer ordered a massive banquet (a long prompt), the chef spent all their time chopping, and the kitchen backed up. If they ordered a tiny snack, the chef wasted time setting up a huge station. It was inefficient.

The "PD-Disaggregation" Way (The Two-Chef System):
To fix this, modern systems split the kitchen into two teams:

  • Team A (Prefill Chefs): Only do the chopping and prep. They are fast and strong.
  • Team B (Decoding Chefs): Only do the cooking and serving. They need lots of fridge space.

This sounds great, but it creates a new problem: The Mismatch.

  • If you have too many Prep Chefs and not enough Cooking Chefs, the Prep Chefs finish their work and sit around waiting for the Cooking Chefs to catch up. Wasted money.
  • If you have too many Cooking Chefs and not enough Prep Chefs, the Cooking Chefs stand around with empty hands, waiting for food to arrive. Wasted money.
  • The Chaos: Customers order wildly different things. Some want a 30-second snack; others want a 3-hour feast. If you set your kitchen staff based on the "average" order, you will always be wrong. Short orders get stuck in a long line, and long orders overwhelm the system.

The Solution: DOPD (The Smart Kitchen Manager)

The authors of this paper created a system called DOPD (Dynamic Optimal Prefill/Decoding). Think of DOPD as a super-smart, predictive Kitchen Manager who never sleeps.

Here is how DOPD solves the problems using three main tricks:

1. The Crystal Ball (Predicting the Future)

Most managers just look at what is happening right now. DOPD looks at what happened recently to guess what will happen next.

  • The Analogy: Imagine a weather forecaster who knows that if it rains at 2 PM, it usually pours at 3 PM. DOPD uses a statistical forecasting model called ARIMA (AutoRegressive Integrated Moving Average) to predict the "weather" of your requests.
  • What it does: It guesses: "In the next few minutes, we will get 50 short orders and 5 long orders."
  • The Result: Instead of waiting for a traffic jam to form, DOPD calls in extra Prep Chefs before the rush starts. This prevents the kitchen from ever getting overwhelmed.
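The paper doesn't give pseudocode for the forecaster, so as a rough illustration, here is the simplest ARIMA special case, an AR(1) model, fit to a history of requests per minute. All function names and numbers below are our own, not DOPD's:

```python
# Illustrative stand-in for DOPD's ARIMA forecaster: a minimal AR(1) model
# fit by least squares. The real system fits a full ARIMA model to the
# request-arrival time series; these names and numbers are invented here.

def fit_ar1(series):
    """Fit x[t] = c + phi * x[t-1] by ordinary least squares."""
    xs, ys = series[:-1], series[1:]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    phi = cov / var
    c = my - phi * mx
    return c, phi

def forecast(series, steps=3):
    """Roll the fitted AR(1) model forward `steps` monitoring windows."""
    c, phi = fit_ar1(series)
    preds, last = [], series[-1]
    for _ in range(steps):
        last = c + phi * last
        preds.append(last)
    return preds

# Requests per minute observed over the last ten monitoring windows.
history = [40, 42, 45, 50, 48, 55, 60, 58, 65, 70]
print(forecast(history))  # rising trend, so the forecast keeps climbing
```

Because the forecast keeps climbing, the scheduler can provision extra prefill instances *before* the rush lands, which is the whole point of the crystal ball.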

2. The Perfect Ratio (Balancing the Team)

DOPD constantly calculates the Golden Ratio of Prep Chefs to Cooking Chefs.

  • The Analogy: If the menu changes from "Salads" (short) to "Steaks" (long), the ratio of chopping knives to frying pans needs to change.
  • What it does: If the system predicts a rush of long requests, it automatically spins up more Prep Chefs. If it's a rush of short requests, it shifts resources to the Cooking side. It ensures that as soon as a Prep Chef finishes chopping, a Cooking Chef is ready to grab the plate. No one is ever standing around idle.
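A toy version of the ratio step (our own simplification, not the paper's exact formulation) makes the idea concrete: given the forecast token rates for each phase, split a fixed GPU pool so that prefill capacity and decode capacity are matched. The throughput numbers are made up for illustration:

```python
# Our simplified sketch of DOPD's ratio balancing: divide `total_gpus`
# between prefill and decode so each side's capacity tracks its forecast
# demand. All parameter values below are invented for illustration.

def split_instances(total_gpus, prefill_tok_rate, decode_tok_rate,
                    prefill_tput_per_gpu, decode_tput_per_gpu):
    """Return (prefill, decode) instance counts proportional to the
    number of GPUs each phase needs to keep up with predicted load."""
    need_p = prefill_tok_rate / prefill_tput_per_gpu   # GPUs prefill needs
    need_d = decode_tok_rate / decode_tput_per_gpu     # GPUs decode needs
    prefill = max(1, round(total_gpus * need_p / (need_p + need_d)))
    prefill = min(prefill, total_gpus - 1)             # keep >= 1 decoder
    return prefill, total_gpus - prefill

# Long-prompt rush: prefill demand dominates, so the pool tilts that way.
print(split_instances(8, prefill_tok_rate=200_000, decode_tok_rate=5_000,
                      prefill_tput_per_gpu=40_000,
                      decode_tput_per_gpu=2_000))  # → (5, 3)
```

Flip the workload to a rush of short prompts (low prefill demand, high decode demand) and the same function shifts almost the whole pool to the decode side, which is exactly the "Salads vs. Steaks" adjustment described above.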

3. The Smart Waiter (Handling Mixed Orders)

This is the most clever part. Sometimes, you get a mix of tiny snacks and huge feasts.

  • The Analogy: Imagine a waiter who knows that if you put a tiny appetizer in the same batch as a giant turkey, the turkey slows down the appetizer.
  • What it does: DOPD has a special rule for Ultra-Short Requests. If a request is tiny (like a 100-word prompt), the system realizes: "Hey, sending this to the Prep Chef takes longer than just cooking it right here!" So, it skips the Prep Chef entirely and cooks it immediately on the Cooking side.
  • The Result: Tiny orders fly through the system instantly, while big orders get the full attention of the Prep team.
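The bypass rule itself is just a threshold check at admission time. A minimal sketch, with a made-up token threshold (the paper's example is a ~100-word prompt; the exact cutoff here is our assumption):

```python
# Sketch of the ultra-short bypass rule. Requests below the threshold skip
# the prefill pool, because handing off the KV cache between instances would
# cost more than just prefilling on the decode instance directly.
# ULTRA_SHORT_TOKENS is an illustrative value, not DOPD's actual cutoff.

ULTRA_SHORT_TOKENS = 128

def route(prompt_tokens):
    """Decide which pool handles the request's prefill phase."""
    if prompt_tokens < ULTRA_SHORT_TOKENS:
        return "decode"   # bypass: prefill + decode on the same instance
    return "prefill"      # normal disaggregated path, then KV handoff

print(route(100))    # → decode  (tiny prompt, bypass)
print(route(2048))   # → prefill (long prompt, full prep pipeline)
```

The design choice mirrors the waiter analogy: the check costs almost nothing per request, and it keeps tiny orders from paying the fixed cost of the two-kitchen handoff.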

Why Does This Matter? (The Results)

The paper tested DOPD against other systems (like vLLM and DistServe) using real-world data from Microsoft Azure. The results were like upgrading from a bicycle to a sports car:

  • 1.5x Faster: The system produced 50% more "good" answers per hour (Goodput).
  • 67% Faster Start: The time it takes to see the first word of an answer (Time-to-First-Token) dropped by two-thirds.
  • 99% Success Rate: Almost every customer got their answer on time (SLO attainment), whereas other systems failed about 20% of the time.
  • Cheaper: Because the system is so efficient, you need fewer expensive GPUs (computer chips) to do the same amount of work.

Summary

DOPD is like a self-driving kitchen that:

  1. Predicts the rush before it happens.
  2. Adjusts the number of chefs instantly to match the demand.
  3. Sorts orders so tiny ones don't get stuck behind huge ones.

It turns a chaotic, inefficient AI service into a smooth, fast, and cost-effective machine, ensuring that when you ask an AI a question, it answers you quickly and reliably, no matter how busy the server is.