ERASE -- A Real-World Aligned Benchmark for Unlearning in Recommender Systems

This paper introduces ERASE, a large-scale benchmark that aligns machine unlearning evaluation with real-world recommender system constraints. It covers diverse tasks, datasets, and algorithms to systematically analyze the effectiveness and robustness of current unlearning methods.

Pierre Lubitzsch, Maarten de Rijke, Sebastian Schelter

Published Tue, 10 Ma

Imagine you have a very smart, personalized recipe book (a Recommender System) that learns what you like to eat based on everything you've ever cooked or ordered. Over time, it gets really good at suggesting meals just for you.

But what happens if you decide to go vegan, or if you realize you accidentally added a toxic ingredient to your history? You want the book to completely forget those specific ingredients and the meals you made with them, as if they never happened. This is the problem of "Machine Unlearning."

The paper introduces ERASE, a new, massive testing ground designed to see which "eraser" tools work best in the real world. Here's the breakdown using simple analogies:

1. The Problem: The "Big Eraser" vs. The "Tiny Eraser"

Previous tests for machine unlearning were like trying to clean a house by knocking down the whole wall just to remove one picture. They assumed you wanted to delete huge chunks of data (like 5% of all your history) all at once.

ERASE realizes that in real life, you don't delete your whole history at once. You might delete one sensitive photo today, block a spammer tomorrow, and remove a bad review next week.

  • The Old Way: "Delete the whole kitchen!"
  • The ERASE Way: "Just wipe that one sticky spot on the counter, quickly, without making a mess."
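The contrast between the two evaluation styles can be sketched in a few lines of Python. This is a hypothetical illustration (the function names and toy data are mine, not from the paper's code): a batch deletion removes a large chunk at once, while a sequential workload applies small requests one at a time and checks the model after every step.

```python
# Toy "training set" of (user, item) interactions.
interactions = {
    ("alice", "wine"), ("alice", "bread"), ("bob", "milk"),
    ("carol", "eggs"), ("alice", "beer"),
}

def unlearn_batch(data, requests):
    """Old-style evaluation: delete a large chunk in one shot."""
    return data - set(requests)

def unlearn_sequential(data, requests):
    """ERASE-style evaluation: small requests arrive over time,
    and the system must stay usable after every single one."""
    checkpoints = []
    for req in requests:
        data = data - {req}
        checkpoints.append(len(data))  # state inspected after each request
    return data, checkpoints

requests = [("alice", "wine"), ("alice", "beer")]
batch_result = unlearn_batch(interactions, requests)
seq_result, checkpoints = unlearn_sequential(interactions, requests)

# Same end state, but the sequential run is evaluated at every step.
assert batch_result == seq_result
```

The point of the sketch: both workloads end in the same place, but only the sequential one exposes whether quality degrades along the way, which is exactly what ERASE measures.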

2. The Three Kitchen Scenarios

To make sure their test is realistic, ERASE checks three different types of "cooking" (recommendation tasks):

  • Collaborative Filtering (CF): Like a friend saying, "You liked this movie, so you'll probably like that one." (Based on what others like).
  • Session-Based (SBR): Like a music playlist that changes based on what you are listening to right now.
  • Next-Basket (NBR): Like a grocery store predicting what you'll buy next based on your current cart.
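To make the first of these concrete, here is a minimal, hypothetical sketch of item-to-item collaborative filtering based on co-occurrence counts (a deliberately simple stand-in, not a method evaluated in the paper): items that often appear together with something you liked get recommended.

```python
from collections import Counter

# Toy interaction data: each set is one user's liked items.
baskets = [
    {"movie_a", "movie_b"},
    {"movie_a", "movie_b", "movie_c"},
    {"movie_b", "movie_c"},
]

def recommend(liked_item, baskets, k=1):
    """Suggest the items that co-occur most often with `liked_item`."""
    counts = Counter()
    for basket in baskets:
        if liked_item in basket:
            for other in basket - {liked_item}:
                counts[other] += 1
    return [item for item, _ in counts.most_common(k)]

suggestion = recommend("movie_a", baskets)
```

Here `movie_b` appears alongside `movie_a` in two baskets, so it is suggested first: "you liked this movie, so you'll probably like that one."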

3. The Two "Dirty Spots" (Scenarios)

ERASE tests the erasers in two specific, realistic situations:

  • The "Sensitive Ingredient" Scenario: Imagine you are recovering from an addiction and want the system to forget you ever bought alcohol. The test checks: Did the system stop suggesting alcohol to you?
  • The "Spam Chef" Scenario: Imagine a troll tries to trick the system by buying thousands of random items to push a specific product to the top. The test checks: Can the system remove the troll's influence and return to normal?

4. The Contestants (The Erasers)

The paper tested 7 different "eraser" algorithms (tools to remove data). Think of them as different cleaning crews:

  • The General Cleaners: Tools designed to clean any type of machine learning model, not just recommenders.
  • The Specialist Cleaners: Tools built specifically for recommendation systems (like the SCIF method).

5. The Results: Who Cleaned Best?

After running thousands of tests (generating over 600GB of data!), here is what they found:

  • The Specialist Wins (Mostly): The SCIF algorithm (a specialist cleaner) was the most reliable. It cleaned the "sticky spots" effectively without ruining the rest of the recipe book. It worked well across different scenarios.
  • The General Cleaners Struggle: The "General Cleaners" often got confused. When asked to clean repeatedly (like deleting a photo, then a song, then a book), they started to degrade the quality of the recommendations. They were like a janitor who, in trying to clean one spot, accidentally scrubbed the whole floor too hard and made it slippery.
  • The "Speed vs. Quality" Trade-off:
    • Retraining: The "Gold Standard." This is like throwing away the whole recipe book and writing a new one from scratch without the bad ingredients. It's perfect, but it takes days to do.
    • Unlearning: This is the "quick fix." It takes minutes.
    • The Catch: Most current "quick fixes" are either too slow (not much faster than rewriting the book) or they leave the book messy (the system still remembers the bad stuff).
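The retraining-vs-unlearning trade-off can be shown with a toy model where an exact shortcut actually exists (which is rarely the case for real recommenders, hence "The Catch"). The "model" here is just the mean rating, which admits an exact decremental update; the names are illustrative, not the paper's algorithms.

```python
def train(ratings):
    """The 'recipe book': here, just the average rating."""
    return sum(ratings) / len(ratings)

def unlearn_mean(model, n, removed):
    """Remove one rating from a trained mean without touching the data."""
    return (model * n - removed) / (n - 1)

ratings = [4.0, 5.0, 1.0, 3.0]
model = train(ratings)

# Gold standard: retrain from scratch without the bad rating.
retrained = train([r for r in ratings if r != 1.0])

# Quick fix: update the existing model in place.
unlearned = unlearn_mean(model, len(ratings), 1.0)
```

For a mean, the quick fix matches retraining exactly and instantly. For deep recommendation models no such closed-form update exists, so practical unlearning methods approximate it, and the paper's results measure how much speed or quality that approximation costs.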

6. The Big Takeaway

The paper concludes that while we have some good tools, we don't have a perfect "magic eraser" yet.

  • For small, specific deletions (like removing a sensitive item), the specialist tools work great.
  • For repeated deletions (like cleaning up spam over time), the tools often break down or lose their effectiveness.
  • The Future: We need tools that are as fast as a "quick wipe" but as thorough as "rewriting the book."

Why Does This Matter?

This isn't just about computers; it's about your rights.

  • Privacy: You have the legal right (like GDPR in Europe) to be "forgotten." ERASE helps engineers build systems that actually respect that right without crashing or giving you bad recommendations.
  • Safety: If a hacker tries to poison a system, ERASE helps us figure out how to clean the poison out quickly before it hurts anyone.

In short: ERASE is a giant, realistic "stress test" for the tools we use to delete data from AI. It shows us that while we are getting better at it, we still need to invent better "erasers" that are fast, accurate, and don't ruin the whole picture.