CacheSolidarity: Preventing Prefix Caching Side Channels in Multi-tenant LLM Serving Systems

CacheSolidarity is a lightweight system that secures multi-tenant LLM serving against Automatic Prefix Caching side-channel attacks by selectively isolating suspicious cache reuse, thereby achieving significantly higher cache efficiency and lower latency compared to existing all-or-nothing isolation defenses.

Panagiotis Georgios Pennas, Konstantinos Papaioannou, Marco Guarnieri, Thaleia Dimitra Doudali

Published Thu, 12 Ma

Here is an explanation of the paper CacheSolidarity using simple language and everyday analogies.

The Problem: The "Fast Lane" That Leaks Secrets

Imagine a massive, high-speed library (the LLM Serving System) where people come to ask questions and get answers. To make things super fast, the librarians use a trick called Automatic Prefix Caching (APC).

Think of this like a reusable template.

  • If User A asks, "Write a story about a dragon named Sparky," the librarian works through the request word by word and saves the work done on its opening ("Write a story about a dragon named...") in a "Fast Lane" cache.
  • If User B comes along and asks, "Write a story about a dragon named Rex," the librarian sees the start is the same. Instead of working through the whole request again, they just grab the saved "Fast Lane" template and only handle the new part, "Rex."

This is great for speed. It saves time and energy.
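To make the template idea concrete, here is a minimal sketch of a prefix cache. It is a toy model, not how any real serving system is implemented: the class name ToyPrefixCache is made up, words stand in for tokens, and "caching" just means remembering which openings have already been processed.

```python
from typing import Set, Tuple


class ToyPrefixCache:
    """Toy model: remembers which prompt openings were already processed."""

    def __init__(self) -> None:
        self._seen: Set[Tuple[str, ...]] = set()

    def process(self, prompt: str) -> Tuple[int, int]:
        """Return (words_reused, words_computed) for this prompt."""
        words = prompt.split()
        reused = 0
        # Look for the longest opening that is already in the cache.
        for n in range(len(words), 0, -1):
            if tuple(words[:n]) in self._seen:
                reused = n
                break
        # Remember every opening of this prompt for later requests.
        for n in range(1, len(words) + 1):
            self._seen.add(tuple(words[:n]))
        return reused, len(words) - reused


cache = ToyPrefixCache()
print(cache.process("Write a story about a dragon named Sparky"))  # (0, 8)
print(cache.process("Write a story about a dragon named Rex"))     # (7, 1)
```

The second request reuses seven words of saved work and only has to handle one new word, which is exactly why it comes back faster.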

But here is the security flaw:
Because the "Fast Lane" is so much quicker than writing from scratch, the time it takes to get the first word of the answer tells you something.

  • Fast response: "Oh, you used the template! You must have asked about a dragon."
  • Slow response: "You didn't use the template. You asked about something totally new."

A sneaky attacker (the Bad Guy) can exploit this. They can guess secrets by sending requests that copy the likely start of the victim's request and end with a guess, such as "...a dragon named Sparky" and then "...a dragon named Rex."

  • If the answer comes back fast, the Bad Guy knows, "Aha! The victim's prompt actually said 'Sparky'!"
  • By guessing word-by-word and watching the clock, the Bad Guy can steal the victim's private secrets (like names, passwords, or medical conditions) without ever seeing the actual data.
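The guessing game above can be sketched as a small simulation. Everything here is an assumption made for illustration: ToyServer is a made-up stand-in for a shared serving system, the two timing constants are invented, and real measurements would be noisy rather than exact.

```python
from typing import Dict, List, Set, Tuple

BASE_MS = 5.0       # assumed fixed overhead per request
COST_PER_WORD = 10.0  # assumed cost of processing one uncached word


class ToyServer:
    """Shared server whose response time depends on cache hits."""

    def __init__(self) -> None:
        self._cached: Set[Tuple[str, ...]] = set()

    def time_to_first_word(self, prompt: str) -> float:
        words = prompt.split()
        reused = 0
        for n in range(len(words), 0, -1):
            if tuple(words[:n]) in self._cached:
                reused = n
                break
        for n in range(1, len(words) + 1):
            self._cached.add(tuple(words[:n]))
        # Only the uncached words cost time.
        return BASE_MS + (len(words) - reused) * COST_PER_WORD


def guess_next_word(server: ToyServer, known_start: str, candidates: List[str]) -> str:
    """Probe each candidate once and return the one that came back fastest."""
    timings: Dict[str, float] = {
        word: server.time_to_first_word(f"{known_start} {word}")
        for word in candidates
    }
    return min(timings, key=lambda word: timings[word])


server = ToyServer()
server.time_to_first_word("The dragon's name is Sparky")  # the victim's request
stolen = guess_next_word(server, "The dragon's name is", ["Rex", "Sparky", "Ember"])
print(stolen)  # Sparky
```

Each candidate is probed only once because a probe caches the attacker's own guess; a second probe of the same wrong guess would also be fast and tell the attacker nothing.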

The Old Solutions: The "Sledgehammer" Approach

Previously, to stop this, security experts used a "sledgehammer" approach. They said: "This is too dangerous. Let's just ban the Fast Lane entirely for everyone."

  • The Result: No one shares templates anymore. Everyone has to write from scratch.
  • The Downside: The library becomes incredibly slow. Honest, innocent users suffer because the system is too cautious. It's like banning all cars from a highway because one person might speed.
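One way such an all-or-nothing defense might look is a cache that never lets one user hit another user's entries. The sketch below assumes this is done by making the user part of the cache key; that is an illustrative guess, and the class name PerUserPrefixCache is made up.

```python
from typing import Set, Tuple


class PerUserPrefixCache:
    """Toy cache where saved work is only ever reused by the same user."""

    def __init__(self) -> None:
        # Each entry is (user, opening words): the user is part of the key.
        self._cached: Set[Tuple[str, Tuple[str, ...]]] = set()

    def words_reused(self, user: str, prompt: str) -> int:
        words = prompt.split()
        reused = 0
        for n in range(len(words), 0, -1):
            if (user, tuple(words[:n])) in self._cached:
                reused = n
                break
        for n in range(1, len(words) + 1):
            self._cached.add((user, tuple(words[:n])))
        return reused


cache = PerUserPrefixCache()
print(cache.words_reused("alice", "Write a story about a dragon"))  # 0
print(cache.words_reused("bob", "Write a story about a dragon"))    # 0
```

Bob gets no benefit from Alice's identical request, so the timing leak is gone, but so is all sharing between honest users.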

The New Solution: CacheSolidarity

The authors of this paper built CacheSolidarity. Instead of banning the Fast Lane for everyone, they built a smart security guard that only stops the Bad Guy when they try to cheat.

Here is how it works, using a recipe-card analogy:

1. The "Owner" Badge

When a user (User A) creates a new template (a "prefix"), the system puts a digital Owner Badge on it.

  • Analogy: Imagine User A writes a recipe and puts a name tag on the card: "Created by Alice."

2. The "Suspicious" Flag

If User B (a stranger) tries to use Alice's recipe, the system checks the badge.

  • Scenario A (Benign): User B is just using the same recipe Alice wrote. The system says, "Okay, you can use the first part, but since you aren't Alice, we stop sharing the secret ingredients right here."
  • Scenario B (The Attack): User B tries to guess a secret word in the recipe. The system sees User B is trying to reuse a part of the recipe that belongs to someone else. It immediately slaps a "SUSPICIOUS" Flag on that specific part of the recipe.

3. The "Selective Isolation"

Once a part is flagged as suspicious:

  • For the Owner (Alice): She can still use her own recipe perfectly. No slowdown.
  • For the Stranger (User B): The system says, "Stop! You can't use the shared part anymore. You have to write your own version from scratch."

The Magic: The system only isolates the specific suspicious part of the conversation. It doesn't lock out the whole user. It's like a bouncer who only kicks out the person trying to sneak into the VIP section, while letting everyone else enjoy the party.
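The badge, the flag, and the selective isolation can be sketched together in a few lines. This is a simplified illustration of the three steps as described here, not the authors' implementation. The names OwnerAwareCache, Entry, and flag are hypothetical, and the sketch deliberately leaves the decision of when reuse counts as suspicious to the caller.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Prefix = Tuple[str, ...]


@dataclass
class Entry:
    owner: str                # the "Owner Badge"
    suspicious: bool = False  # the "SUSPICIOUS" flag


class OwnerAwareCache:
    def __init__(self) -> None:
        self._entries: Dict[Prefix, Entry] = {}

    def insert(self, user: str, prompt: str) -> None:
        """Cache every opening of the prompt; the first creator keeps the badge."""
        words = prompt.split()
        for n in range(1, len(words) + 1):
            self._entries.setdefault(tuple(words[:n]), Entry(owner=user))

    def flag(self, prefix: str) -> None:
        """Mark one cached opening as suspicious (raises KeyError if not cached)."""
        self._entries[tuple(prefix.split())].suspicious = True

    def words_reused(self, user: str, prompt: str) -> int:
        """Count how many opening words this user may reuse from the cache."""
        words = prompt.split()
        reused = 0
        for n in range(1, len(words) + 1):
            entry = self._entries.get(tuple(words[:n]))
            if entry is None:
                break  # nothing cached from here on
            if entry.suspicious and entry.owner != user:
                break  # selective isolation: strangers stop at the flagged part
            reused = n
        return reused


cache = OwnerAwareCache()
cache.insert("alice", "my secret ingredient is saffron")
cache.flag("my secret ingredient is saffron")
print(cache.words_reused("alice", "my secret ingredient is saffron"))  # 5
print(cache.words_reused("bob", "my secret ingredient is saffron"))    # 4
```

Alice still reuses all five words, while Bob is served the unflagged opening and has to redo only the flagged part himself.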

4. The "Smart Switch" (The Activator)

The authors also noticed something important: The "Fast Lane" timing leak only works when the library is quiet. If the library is super busy (high traffic), the noise of the crowd masks the difference between a fast and slow response.

So, CacheSolidarity has a Smart Switch:

  • Busy Time: The system knows the timing leak is hidden by the noise. It turns off the strict security checks to keep things running super fast.
  • Quiet Time: The system knows the timing leak is visible. It turns the security checks ON to catch the Bad Guy.
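A minimal sketch of such a switch, assuming that "busy" is judged by the number of requests currently being served. Both that measure and the threshold value are assumptions made for illustration, and the name Activator is used here only as a label.

```python
from dataclasses import dataclass


@dataclass
class Activator:
    # Hypothetical cutoff: at or above this many requests in flight,
    # the system is treated as busy.
    busy_threshold: int = 32

    def checks_enabled(self, requests_in_flight: int) -> bool:
        """Security checks are ON in quiet times and OFF in busy times."""
        return requests_in_flight < self.busy_threshold


switch = Activator(busy_threshold=4)
print(switch.checks_enabled(1))   # True: quiet, the leak is visible
print(switch.checks_enabled(10))  # False: busy, noise hides the leak
```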

Why This Matters

  • For the Good Guys: They get the speed of the "Fast Lane" (caching) most of the time. They don't suffer from the slow "Sledgehammer" approach.
  • For the Bad Guys: They can't steal secrets anymore because the system stops them the moment they try to guess a shared secret.
  • For the System: It's lightweight. It doesn't need to read the content of the messages to know if they are dangerous; it just looks at who is using whose template.

Summary in One Sentence

CacheSolidarity is like a smart librarian who lets everyone share a common starting point to save time, but instantly locks the door if a stranger tries to peek at a secret part of someone else's story, all while keeping the library running at close to full speed.