SUN: Shared Use of Next-token Prediction for Efficient Multi-LLM Disaggregated Serving

SUN (Shared Use of Next-token Prediction) is a novel approach for efficient multi-LLM disaggregated serving that decomposes Transformer models into task-specific prefill modules and a shared, frozen decode module, thereby enabling cross-model batching to significantly improve GPU utilization and throughput while maintaining accuracy.

Sunghyeon Woo, Ahreum Seo, Jaegwang Lee, Jaeeun Kil, Hanbae Seo, Joonghoon Kim, Baeseong Park, Se Jung Kwon, Dongsoo Lee

Published 2026-03-04

Imagine you run a massive, high-tech bakery that specializes in baking thousands of different types of custom cakes. In the world of Artificial Intelligence, these "cakes" are Large Language Models (LLMs) designed for specific jobs: one is a math genius, another is a coding wizard, and a third is a legal expert.

The Problem: The "Specialized Kitchen" Bottleneck

Currently, most bakeries operate like this:

  • The Prep Station (Prefill): This is where the chef reads the customer's order (the prompt) and gathers all the ingredients. It's fast and requires a lot of chopping (computing power).
  • The Baking Station (Decode): This is where the cake is actually baked, one layer at a time. It's slow, requires constant attention, and uses a lot of oven space (memory).

In the old way of doing things (called Disaggregated Serving), the bakery separates these stations to avoid chaos. But here's the catch: Every single cake type gets its own dedicated Baking Station.

If you have 100 orders for the "Math Cake" and only 2 orders for the "Legal Cake," the Math Baking Station is screaming for help, while the Legal Baking Station sits empty, doing nothing. The oven is underutilized, the electricity bill is huge, and customers waiting for the Math Cake are stuck in a long line. This is the Inter-Model Isolation problem.
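
To put rough numbers on that imbalance, here is a minimal Python sketch. The per-oven capacity of 50 cakes is a made-up figure for illustration, and `dedicated_utilization` is a hypothetical helper, not something from the paper.

```python
# Toy model of dedicated decode workers (one oven per cake type).
# Capacity and order counts are illustrative assumptions, not measurements.

def dedicated_utilization(pending, capacity_per_worker):
    """Return {model: (served, waiting, utilization)} when each model owns one worker."""
    report = {}
    for model, orders in pending.items():
        served = min(orders, capacity_per_worker)
        waiting = orders - served
        report[model] = (served, waiting, served / capacity_per_worker)
    return report

pending = {"math": 100, "legal": 2}
report = dedicated_utilization(pending, capacity_per_worker=50)
for model, (served, waiting, util) in report.items():
    print(f"{model}: served={served} waiting={waiting} utilization={util:.0%}")
```

In this toy setup the Math oven is full with a backlog still waiting, while the Legal oven is almost entirely idle and cannot help.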

The Solution: SUN (Shared Use of Next-token Prediction)

The researchers at NAVER Cloud proposed a brilliant new way to run the bakery called SUN.

Think of SUN as realizing that baking a cake is actually the same process, no matter what flavor it is. Whether you are baking a chocolate cake or a vanilla cake, the oven, the temperature, and the timing are identical. The only difference is the ingredients you put in at the start.

SUN splits the process into two distinct steps:

  1. The Custom Prep (Prefill Module): This part is unique for every cake. The Math Cake needs a special math-flavored batter; the Legal Cake needs a law-flavored batter. In SUN, we keep these prep stations separate and customized.
  2. The Universal Oven (Decode Module): This is the magic. SUN freezes the "baking instructions" into a single, universal module. Once the custom batter is ready, every single cake goes into the same shared oven.
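
The two-step split above can be sketched in a few lines of Python. This is only a toy under stated assumptions: `make_prefill`, `shared_decode`, and the arithmetic inside them are hypothetical stand-ins, and the "cache" is a plain list of numbers standing in for whatever state a real prefill module hands over to decode.

```python
# Toy sketch: task-specific prefill functions feeding one shared decode function.
# The "cache" is a list of numbers standing in for the prompt state passed to decode.

def make_prefill(task_bias):
    """Build a hypothetical task-specific prefill module."""
    def prefill(prompt_tokens):
        # Process the whole prompt in one pass and return the state for decode.
        return [tok + task_bias for tok in prompt_tokens]
    return prefill

def shared_decode(cache, num_new_tokens):
    """One frozen decode routine used by every task; emits tokens one at a time."""
    cache = list(cache)
    generated = []
    for _ in range(num_new_tokens):
        next_token = sum(cache) % 100  # stand-in for next-token prediction
        generated.append(next_token)
        cache.append(next_token)       # decode extends the cache as it goes
    return generated

# Custom prep stations, one universal oven.
prefill_modules = {"math": make_prefill(1), "legal": make_prefill(7)}

prompt = [3, 5, 8]
outputs = {
    task: shared_decode(prefill(prompt), 3)
    for task, prefill in prefill_modules.items()
}
print(outputs)
```

The same prompt produces different outputs per task even though `shared_decode` is identical for both, because all of the task-specific behavior lives in the cache that prefill produces.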

How It Works in Real Life

Imagine a busy kitchen with 4 ovens.

  • Before (Old Way): You assign one oven to Math, one to Coding, one to Law, and one to Writing. If the Math orders spike, that oven is overwhelmed. The Law oven sits cold.
  • After (SUN): You have 4 ovens, but they are all shared.
    • The Math Prep station finishes its batter and shouts, "Ready for baking!"
    • The Coding Prep station finishes its batter and shouts, "Ready!"
    • The Law Prep station finishes its batter and shouts, "Ready!"
    • The Scheduler: Instead of sending the Math batter to the "Math Oven," it sends it to the first available oven. The Law batter goes to the next available oven.
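
A scheduler like this can be sketched as a least-loaded router. The following is a minimal Python illustration, assuming each request comes with a known number of decode steps; `route_shared` is a hypothetical name, and the paper's actual routing policy may differ in its details.

```python
import heapq

def route_shared(requests, num_workers):
    """Assign each (model, decode_steps) request to the currently least-loaded worker."""
    # Heap entries are (total_assigned_steps, worker_id).
    # The model name is deliberately ignored when choosing a worker.
    heap = [(0, w) for w in range(num_workers)]
    heapq.heapify(heap)
    assignment = {w: [] for w in range(num_workers)}
    for model, steps in requests:
        load, worker = heapq.heappop(heap)
        assignment[worker].append(model)
        heapq.heappush(heap, (load + steps, worker))
    loads = {worker: load for load, worker in heap}
    return assignment, loads

# A skewed mix: six Math requests and two Law requests, 10 decode steps each.
requests = [("math", 10)] * 6 + [("law", 10)] * 2
assignment, loads = route_shared(requests, num_workers=2)
print(assignment)
print(loads)
```

Because the router never looks at the cake type, both ovens end up with the same amount of work even though the order mix is lopsided.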

This means you may be able to shut down some of your ovens (say, 2 of the 4) and still bake at close to the same pace, because the remaining ovens stay busy instead of sitting idle, taking Math and Law cakes alike as they arrive.

The Secret Sauce: "Prefill-Only Tuning"

You might ask: "If I use the same oven for everything, won't the Math cake taste like a Law cake?"

That was the big fear. If you simply hand a custom batter to an oven it was never matched with, the result is garbage. The researchers solved this with a clever trick called Prefill-Only Tuning.

They trained the Prep Stations (the custom batters) to be perfectly compatible with the Universal Oven. They didn't change the oven; they just taught the prep chefs how to mix their ingredients so that the universal oven knows exactly how to bake them.

  • Result: The Math cake still tastes like a Math cake, and the Law cake still tastes like a Law cake, even though they all come out of the same oven.
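
The idea of training only the prep side while the oven stays fixed can be illustrated with a deliberately tiny Python example. Everything here is an assumption made for illustration: single numbers stand in for whole modules, and `train_prefill_only` is a hypothetical function, not the paper's training recipe.

```python
# Toy "prefill-only tuning": gradient descent updates only the prefill weight.
# The decode weight is frozen. Scalars stand in for full modules.

def train_prefill_only(samples, prefill_w, decode_w, lr=0.01, epochs=200):
    """Fit prefill_w so that decode_w * (prefill_w * x) matches y; decode_w never changes."""
    for _ in range(epochs):
        for x, y in samples:
            cache = prefill_w * x      # task-specific prefill output
            pred = decode_w * cache    # frozen shared decode
            error = pred - y
            # d(error**2)/d(prefill_w) = 2 * error * decode_w * x
            prefill_w -= lr * 2 * error * decode_w * x
    return prefill_w, decode_w

samples = [(1.0, 6.0), (2.0, 12.0), (3.0, 18.0)]  # target behavior: y = 6x
frozen_decode_w = 2.0
tuned_prefill_w, decode_after = train_prefill_only(
    samples, prefill_w=0.5, decode_w=frozen_decode_w
)
print(round(tuned_prefill_w, 3), decode_after)
```

The prefill weight moves until the combined system hits the target, while the decode weight comes out exactly as it went in. That is the property that lets many tasks share one decode module.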

The Bonus: QSUN (The Energy-Saving Mode)

The researchers also realized that the Universal Oven is the most expensive part to run. So, they created QSUN.

Think of this as putting the oven on "Eco Mode." They lowered the precision of the oven's temperature sensors (quantization). On its own, lowering precision like this can make the cake taste worse. But because they re-trained the Prep Stations (the batters) to work with this "Eco Oven," the cakes still taste about as good.

  • Benefit: The oven runs 45% faster and uses less energy, without sacrificing quality.
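
For readers who want to see what "lowering the precision" means in practice, here is a generic symmetric quantization round trip in Python. The 4-bit width and the single shared scale are assumptions chosen for illustration; this article does not say which bit width or quantization scheme QSUN uses.

```python
def quantize_symmetric(weights, bits=4):
    """Map floats to signed integers in [-qmax, qmax] using one shared scale."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from the stored integers."""
    return [v * scale for v in q]

weights = [0.70, -0.33, 0.12, 0.02]   # made-up example weights
q, scale = quantize_symmetric(weights, bits=4)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, restored, max_error)
```

The restored values are close to the originals but not identical. That rounding error is the "taste" risk, and it is what re-training the prep stations is meant to compensate for.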

Why This Matters

  • Cheaper: You need fewer GPUs (ovens) to serve the same number of people.
  • Faster: No more waiting in line for a specific oven. If one oven is busy, your request goes to whichever oven is free, regardless of cake type.
  • Smarter: It handles "skewed" workloads well. If everyone suddenly wants Math cakes, the shared ovens all pitch in on the Math rush; under the old setup, the Law ovens would have sat idle.

In summary: SUN stops us from building a separate kitchen for every single type of AI model. Instead, it builds one super-efficient, shared kitchen where the "baking" part is universal, and only the "preparation" is custom. This saves money, saves energy, and gets your AI answers faster.
