Skirting Additive Error Barriers for Private Turnstile Streams

This paper demonstrates that the previously established Ω(T1/4)\Omega(T^{1/4}) additive error lower bound for differentially private continual release of distinct elements and F2F_2 moments in turnstile streams can be circumvented by allowing algorithms to output estimates with both polylogarithmic multiplicative and additive errors while using only polylogarithmic space.

Anders Aamand, Justin Y. Chen, Sandeep Silwal

Published 2026-03-05
📖 6 min read🧠 Deep dive

Imagine you are the manager of a busy, high-security airport. Every day, thousands of people (data items) arrive and leave. Your job is to keep a running tally of how many unique travelers are currently in the terminal, or perhaps calculate a "chaos score" based on how often people move around.

However, there's a catch: Privacy. You cannot reveal who is in the terminal. If you change your count just because one person walked in or out, you might accidentally reveal that specific person's presence. This is the world of Differential Privacy.

For a long time, experts believed that to keep this privacy, your count would have to be very "fuzzy." If 1,000 people were in the terminal, your private count might be off by hundreds. It was like trying to count a crowd through a thick fog; you could see the general shape, but the exact number was impossible.

This paper says: "We can clear the fog, but we have to change how we look at the numbers."

Here is the simple breakdown of their discovery, using some creative analogies.

1. The Old Problem: The "Fuzzy" Count

Previously, researchers thought the only way to protect privacy was to add a massive amount of "static" (noise) to your count.

  • The Analogy: Imagine you are trying to count the number of apples in a basket, but every time you look, a giant, invisible hand shakes the basket and adds or removes a random pile of apples.
  • The Result: If you have 100 apples, the noise might be 50. Your count could be anywhere between 50 and 150. This is called Additive Error.
  • The Limit: A recent study showed that for long streams of data, this noise had to be huge (growing with the size of the stream). It seemed impossible to get a precise count without breaking privacy.

2. The New Trick: The "Zoom Lens" (Multiplicative Error)

The authors realized that while we can't get a perfectly precise number, we can get a number that is proportionally correct.

  • The Analogy: Instead of trying to count every single apple perfectly, imagine you have a magic zoom lens.
    • If there are 10 apples, the lens might say "It's between 8 and 12." (Small error).
    • If there are 1,000,000 apples, the lens might say "It's between 900,000 and 1,100,000." (Big absolute error, but small percentage error).
  • The Shift: The paper introduces Multiplicative Error. This means the error scales with the size of the crowd. If the crowd is huge, the error is allowed to be larger in raw numbers, but it remains a small percentage of the total.

By accepting this "percentage-based" fuzziness, they were able to drastically reduce the "static" (noise) to almost nothing.

3. How They Did It: The "Bucket Sort" Strategy

To achieve this, they used two clever tricks, like sorting mail into different bins.

Trick A: The "Least Significant Bit" (MinHash)

Imagine you have a giant room with thousands of people. You want to know how many unique people are there without asking their names.

  • The Method: You ask everyone to flip a coin. If it's heads, they go to Room A. If tails, they go to Room B. Then, within those rooms, you ask them to flip again.
  • The Privacy Hack: Instead of counting people directly (which is risky), you count how many people end up in the smallest room that still has people in it.
  • Why it works: If you have 1,000 people, you expect to find a room with very few people deep in the chain of coin flips. By tracking these "buckets" privately, you can estimate the total crowd size without ever knowing exactly who is where. The paper shows that even with privacy noise, this bucket method gives a surprisingly accurate estimate.

Trick B: The "Shrinking Room" (Domain Reduction)

Imagine you have a huge map of a city, but you only care about the number of unique houses.

  • The Method: You take a giant map and shrink it down to a tiny postcard. Many different houses on the big map will now overlap (collide) on the tiny postcard.
  • The Privacy Hack: You count how many "spots" on the postcard have a house. If the postcard is the right size, the number of occupied spots tells you roughly how many unique houses were on the big map.
  • The Result: This allows them to turn a massive, hard-to-count problem into a small, easy-to-count problem that fits inside a tiny, private memory space.

4. The "Chaos Score" (F2 Moment)

The paper also tackled a harder problem: calculating the "F2 moment."

  • The Analogy: This isn't just counting people; it's calculating how "concentrated" the crowd is. If 1,000 people are all standing in one corner, the chaos score is huge. If they are spread out evenly, the score is lower.
  • The Old View: Privacy experts said, "You can't calculate this privately without a massive error."
  • The New View: By using a mathematical trick called Johnson-Lindenstrauss (which is like projecting a 3D object onto a 2D wall without losing its shape), they could shrink the data down, count the shadows, and get a very accurate "chaos score" with almost no privacy noise.

The Big Takeaway

The Trade-Off:
In the past, we thought we had to choose between Privacy and Accuracy.

  • Old Way: High Privacy = Terrible Accuracy (Huge noise).
  • New Way: High Privacy = Good Accuracy, if we allow the error to be a small percentage of the total.

Why it matters:
This means we can now monitor sensitive data streams (like network traffic, financial transactions, or user activity) in real-time with tiny amounts of memory and very high accuracy, without ever compromising the privacy of the individuals involved. It turns a "fuzzy guess" into a "reliable estimate."

In short: They found a way to see the forest clearly, even through the privacy fog, by realizing that knowing the forest is "roughly 10% bigger than last year" is often good enough, and that knowledge doesn't require revealing the location of every single tree.