Give Users the Wheel: Towards Promptable Recommendation Paradigm

Imagine you have a very smart, loyal personal shopper who knows your taste in movies better than anyone else. They've watched everything you've ever clicked on, bought, or rated. If you ask them, "What should I watch tonight?" they will immediately suggest the latest action thriller because, well, that's what you usually watch.

But what if you say, "Actually, I'm in the mood for a silly cartoon to watch with my kids tonight"?

In the old world of recommendation systems, your personal shopper would likely ignore you. They are so focused on your "history" (your past habits) that they can't pivot. They might say, "But you love action movies! Here's another one!"

This paper introduces a new system called DPR (Decoupled Promptable Sequential Recommendation). Think of it as handing the user the steering wheel of the recommendation car. Instead of following only the GPS route plotted from where you've been, the system now listens to your voice commands and changes the destination on the spot.

Here is how it works, broken down with simple analogies:

1. The Problem: The "Stubborn Shopper" vs. The "Slow Librarian"

Currently, there are two ways to handle this:

The Old Way (Sequential Models): These are like the Stubborn Shopper. They are incredibly fast and know your history perfectly, but they are blind to your current mood. If you want something different, they can't adapt.
The New Way (Large Language Models/LLMs): These are like Super-Librarians who can understand complex sentences like "I want a movie that feels like a rainy Sunday." However, they are slow, expensive to run, and often forget the specific details of your past purchases (like which specific actor you love).

The paper asks: Can we have the speed and memory of the Shopper, but the listening skills of the Librarian?

2. The Solution: The "Dual-Path" Brain

The authors built a system that acts like a hybrid brain. It keeps the fast, efficient "Shopper" part (which knows your history) but adds a special "Control Panel" that listens to your natural language.

Here are the three magic ingredients they used:

A. The "Fusion Module" (The Translator)

Imagine the Shopper speaks "Math" (numbers and IDs) and the Librarian speaks "English" (words). They can't talk to each other.
The Fusion Module is a translator. It takes your sentence ("I want a comedy") and instantly converts it into a "mathematical nudge" that the Shopper understands. It doesn't replace the Shopper; it just whispers in its ear, "Hey, shift the focus slightly toward comedy."

B. The "Mixture of Experts" (The Two-Track System)

This is the most clever part. The paper realized that wanting something and not wanting something are totally different mental tasks.

Positive Steering: "I want a comedy." (This is like adding sugar to your coffee).
Negative Steering: "No horror movies." (This is like removing the coffee beans).

If you try to do both with the same set of instructions, the system gets confused (like trying to add and subtract at the same time).
So, DPR uses a Two-Tower System:

Tower 1 is an expert at Adding (finding what you want).
Tower 2 is an expert at Subtracting (hiding what you don't want).
They work in parallel, so the system never gets confused about whether to push a movie up or push it down the list.

C. The "Three-Stage Training" (The School Curriculum)

You can't just throw a student into a PhD program; they need to learn step-by-step. The system is trained in three stages:

Stage 1 (The Basics): It learns to be a great Shopper first, memorizing your history perfectly.
Stage 2 (The Categories): It learns to understand broad categories (like "Action" or "Comedy").
Stage 3 (The Nuance): Finally, it learns the deep, subtle meanings of your words (like "a movie that feels like a rainy Sunday").

This ensures the system doesn't forget how to recommend movies just because it's learning to listen to you.

3. Why This Matters

The results are impressive. In tests:

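Speed: It keeps the single-pass speed of the old systems (no waiting for a slow AI to think).
Accuracy: When you ask for something specific, its ranking quality improves by roughly 70% over the best filter-based baseline (71.84% in NDCG@10 on MovieLens-1M with the SASRec backbone).
Flexibility: It handles both "I want X" and "I don't want Y" requests, using a separate path for each so the two signals don't interfere.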

The Bottom Line

This paper addresses the "Dilemma of the Recommendation System." Before, you had to choose between a system that knew your history but ignored your voice, or a system that understood your voice but was slow and forgot your history.

DPR gives you the wheel. It lets you drive the recommendation engine, telling it exactly where to go right now, while still remembering the roads you've traveled before. It's the difference between a GPS that stubbornly drives you back to your office when you ask for a park, and a GPS that says, "Got it, rerouting to the park!"

Here is a detailed technical summary of the paper "Give Users the Wheel: Towards Promptable Recommendation Paradigm".

1. Problem Statement

Current sequential recommendation systems (SR) face a critical disconnect: they excel at mining implicit historical behavioral patterns but are structurally blind to explicit user intent expressed in real-time via natural language.

The Limitation of Conventional Models: Models like SASRec or GRU4Rec rely on historical sequences. If a user with a history of action movies suddenly asks for "children's movies," these models fail to adapt, continuing to recommend based on inertia.
The Limitation of Existing LLM Integrations:
- LLM-as-Recommender: Replacing the backbone with an LLM sacrifices the efficiency and collaborative filtering precision of ID-based retrieval, leading to high inference latency.
- Reranking Paradigm: Using LLMs to rerank candidates retrieved by a conventional model is fundamentally bottlenecked. If the initial retrieval fails to include relevant items due to an intent shift, the LLM has no valid candidates to optimize.
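The reranking bottleneck can be illustrated with a toy sketch. The item names, scores, and the helper functions `retrieve_top_k` and `rerank` are hypothetical and not taken from the paper; the only point is that a reranker reorders what retrieval hands it and cannot add items that retrieval missed.

```python
def retrieve_top_k(history_scores, k):
    """First stage: keep the k items the history-based model scores highest."""
    return sorted(history_scores, key=history_scores.get, reverse=True)[:k]


def rerank(candidates, prompt_relevance):
    """Second stage: reorder the candidates by how well they match the prompt."""
    return sorted(candidates, key=lambda item: prompt_relevance.get(item, 0.0), reverse=True)


# Hypothetical scores for a user whose history is dominated by action movies.
history_scores = {
    "action_1": 0.90, "action_2": 0.80, "action_3": 0.70,
    "kids_1": 0.10, "kids_2": 0.05,
}
# Hypothetical relevance to the prompt "a cartoon to watch with my kids".
prompt_relevance = {"kids_1": 1.0, "kids_2": 0.9}

candidates = retrieve_top_k(history_scores, k=3)
reranked = rerank(candidates, prompt_relevance)
print(reranked)  # only action titles: the kids' movies were never retrieved
```

However good the second stage is, its output is a permutation of the first stage's candidates, which is why the intent shift has to influence retrieval itself.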

The Goal: To create a system that allows users to "steer" recommendations via natural language prompts (positive or negative) without abandoning the efficiency of collaborative filtering or the precision of ID-based retrieval.

2. Methodology: Decoupled Promptable Sequential Recommendation (DPR)

The authors propose DPR, a model-agnostic framework that natively integrates natural language prompts into the collaborative retrieval process.

A. Core Architecture

DPR consists of three main components:

Sequential Encoder: A standard backbone (e.g., SASRec, GRU4Rec) that extracts the user's intrinsic interest representation ($h_u$) from historical behavior. It is computed without the prompt, so it stays "uncontaminated" until the fusion step.
Prompt Embedder: Encodes the natural language instruction ($p$) into a semantic vector ($c_p$) using a pre-trained encoder (e.g., Sentence-BERT), followed by an MLP projector that matches the dimension of $h_u$.
Signal Fusion Module (The Core Innovation):
- Mixture-of-Experts (MoE) Tower: Instead of a single fusion block, DPR uses two parallel, independent paths:
  - Positive Fusion Block ($f^+$): Handles "positive steering" (e.g., "I want comedy").
  - Negative Fusion Block ($f^-$): Handles "negative suppression" (e.g., "No horror").
- Mechanism: Both blocks use Multi-Head Cross-Attention (MHCA) where the user representation is the Query and the prompt vector is the Key/Value.
- Residual Connection: The fused prompt signal $z_c$ (the output of the fusion block) is added to the original representation via a residual connection ($h_{res} = h_u + z_c$) to preserve the stability of the user's historical preferences.
- Rationale: Positive steering (feature injection) and negative suppression (feature rejection) are conflicting optimization goals. Separating them prevents gradient conflicts and allows the model to specialize in amplifying desired features or suppressing restricted ones.
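The fusion step described above can be sketched in plain Python. This is a minimal illustration under several assumptions, not the authors' implementation: it uses single-head rather than multi-head attention, random matrices stand in for learned weights, the prompt is given as a short list of vectors (with a single prompt vector the softmax weight is trivially 1), and the path is chosen by a hard `polarity` flag. The class names `FusionBlock` and `DualPathFusion` are hypothetical.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rng, dim, scale=0.1):
    """Stand-in for learned weights (hypothetical initialization)."""
    return [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(dim)]


class FusionBlock:
    """Single-head cross-attention: user vector is the query, prompt vectors are keys/values."""

    def __init__(self, dim, rng):
        self.dim = dim
        self.w_q = random_matrix(rng, dim)
        self.w_k = random_matrix(rng, dim)
        self.w_v = random_matrix(rng, dim)

    def __call__(self, h_u, prompt_vectors):
        query = matvec(self.w_q, h_u)
        keys = [matvec(self.w_k, p) for p in prompt_vectors]
        values = [matvec(self.w_v, p) for p in prompt_vectors]
        scale = math.sqrt(self.dim)
        weights = softmax([sum(q * k for q, k in zip(query, key)) / scale for key in keys])
        # Attention output z_c: weighted sum of the value vectors.
        return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(self.dim)]


class DualPathFusion:
    """Two independent fusion blocks, one per prompt polarity (assumed hard routing)."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        self.blocks = {"positive": FusionBlock(dim, rng), "negative": FusionBlock(dim, rng)}

    def __call__(self, h_u, prompt_vectors, polarity):
        z_c = self.blocks[polarity](h_u, prompt_vectors)
        # Residual connection: h_res = h_u + z_c keeps the history signal intact.
        return [h + z for h, z in zip(h_u, z_c)]


fusion = DualPathFusion(dim=4)
h_u = [0.5, -0.2, 0.1, 0.3]                      # toy user representation
prompt_vectors = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]  # toy prompt encoding
h_pos = fusion(h_u, prompt_vectors, "positive")
h_neg = fusion(h_u, prompt_vectors, "negative")
```

Because the two blocks share no parameters, the same prompt vectors move $h_u$ in different directions depending on the path, which is the decoupling the rationale argues for.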

B. Training Strategy

DPR employs a Three-Stage Training Strategy to ensure robustness and semantic alignment:

Stage 1 (Pre-training): Standard pre-training of the sequential encoder on next-item prediction to capture fundamental behavioral patterns.
Stage 2 (Coarse Alignment): Fine-tuning the model to align user representations with broad category (genre) embeddings. This acts as a scaffold.
Stage 3 (Deep Semantic Alignment): The core stage, in which the model is fine-tuned using Semantic Augmentation.
- Augmentation: LLMs generate fine-grained, multi-dimensional tags (Narrative, Atmosphere, Appeal) for items, bridging the gap between specific titles and abstract genres.
- Lexical Decoupling: Training and testing sets use lexically distinct but semantically equivalent tags to ensure the model learns latent semantics rather than memorizing keywords.
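One plausible way to realize lexical decoupling is to split the phrasings of each tag concept into disjoint training and test vocabularies. The summary does not describe the paper's exact procedure, so the function `split_tag_phrasings`, the concept keys, and the example phrasings below are all hypothetical.

```python
import random


def split_tag_phrasings(synonym_groups, seed=0):
    """Split each concept's phrasings into disjoint train and test sets."""
    rng = random.Random(seed)
    train_tags, test_tags = {}, {}
    for concept, phrasings in synonym_groups.items():
        if len(phrasings) < 2:
            raise ValueError(f"{concept!r} needs at least two phrasings to split")
        shuffled = list(phrasings)
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        train_tags[concept] = shuffled[:half]   # seen during training
        test_tags[concept] = shuffled[half:]    # held out for evaluation
    return train_tags, test_tags


# Hypothetical LLM-generated phrasings, grouped by the concept they express.
synonym_groups = {
    "atmosphere/cozy": ["cozy", "snug", "warm and comforting", "homey"],
    "narrative/revenge": ["revenge story", "payback plot", "tale of vengeance"],
}
train_tags, test_tags = split_tag_phrasings(synonym_groups)
```

A model that scores well on the held-out phrasings cannot have done so by memorizing the training keywords, since no phrasing appears on both sides of the split.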

C. Unified Loss Function

The training objective combines standard sequential loss with a prompt-specific loss.

Positive Steering: Maximizes the likelihood of a single target item sampled from a "compliance list" (items matching the prompt).
Negative Suppression: Operates as a multi-target optimization over the entire compliance list, encouraging the model to redistribute probability mass away from restricted items via softmax competition.
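The two prompt-specific terms can be sketched as softmax cross-entropy losses over item scores. This is one reading of the description above, not the paper's exact formulation: it assumes that for negative suppression the compliance list holds the items that do not violate the restriction, and that the multi-target term averages the log-probabilities of those items. The function names are hypothetical.

```python
import math


def log_softmax(scores):
    """Numerically stable log-softmax over a list of item scores."""
    peak = max(scores)
    log_z = peak + math.log(sum(math.exp(s - peak) for s in scores))
    return [s - log_z for s in scores]


def positive_steering_loss(scores, target):
    """Cross-entropy on one target item sampled from the compliance list."""
    return -log_softmax(scores)[target]


def negative_suppression_loss(scores, compliant):
    """Average cross-entropy over every compliant item (assumed multi-target form)."""
    log_probs = log_softmax(scores)
    return -sum(log_probs[i] for i in compliant) / len(compliant)


# Toy example: item 0 is restricted, items 1-3 comply with the prompt.
compliant = [1, 2, 3]
scores_before = [2.0, 1.0, 0.5, 0.0]
scores_after = [-2.0, 1.0, 0.5, 0.0]   # same scores, restricted item pushed down

loss_before = negative_suppression_loss(scores_before, compliant)
loss_after = negative_suppression_loss(scores_after, compliant)
```

Because the softmax normalizes over all items, lowering the restricted item's score raises the probability of every compliant item, so `loss_after` is smaller than `loss_before`. This is the "softmax competition" effect: the restricted item is never named in the loss, yet probability mass is moved away from it.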

3. Key Contributions

Definition of Promptable Recommendation: Formalized a new paradigm where natural language prompts dynamically steer the retrieval space while retaining collaborative signals.
DPR Framework: A model-agnostic architecture featuring:
- A Fusion Module to align semantic and collaborative signals.
- A Dual-Path MoE Tower to disentangle positive and negative control signals, resolving optimization conflicts.
- A Three-Stage Training Strategy with semantic augmentation to handle diverse modalities and ensure generalization.
Empirical Superiority: Demonstrated that DPR outperforms both traditional baselines and LLM-based approaches in prompt-guided tasks while maintaining competitive performance in standard scenarios.

4. Experimental Results

Experiments were conducted on the MovieLens-1M (ML-1M) and MIND datasets.

Performance vs. Baselines:
- Positive Steering: DPR significantly outperformed the strongest "Filter" baselines. On ML-1M (SASRec), DPR achieved a 71.84% relative improvement in NDCG@10 over the best filter baseline.
- Negative Suppression: DPR consistently outperformed filter-based methods, particularly with the GRU4Rec backbone (+15.37% on ML-1M).
- Comparison with LLMs: DPR vastly outperformed zero-shot LLMs (e.g., Llama-2, Qwen) and even fine-tuned LLM-based recommenders (e.g., RecGPT). For instance, in positive tasks, DPR achieved a Recall@10 of 0.7300 compared to 0.3626 for the best LLM baseline.
- Efficiency: Unlike LLM reranking, which degrades as the candidate pool grows (due to noise), DPR maintains robust performance in a single end-to-end inference step.
Ablation Studies:
- Stage Design: Removing the intermediate coarse-grained alignment (Stage 2) caused significant performance drops (e.g., -7.85% in NDCG@10), supporting the need for the curriculum learning approach.
- Loss Design: Both positive and negative loss terms are critical; removing either causes catastrophic failure in the respective task.
- Architecture: The Two-Tower (MoE) design is essential. Using a Single-Tower (shared parameters) resulted in a ~27-35% performance drop, confirming that positive and negative steering require distinct parameter spaces.
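For readers unfamiliar with the metrics quoted in the results above, the sketch below gives the standard binary-relevance definitions of Recall@k and NDCG@k, plus the relative-improvement arithmetic. The paper's exact evaluation protocol is not stated in this summary, so treat these as the textbook definitions rather than the authors' code; the function names are illustrative.

```python
import math


def recall_at_k(ranked, relevant, k=10):
    """Fraction of the relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked, relevant, k=10):
    """Binary-relevance NDCG: hits are discounted by log2 of their 1-based rank plus one."""
    dcg = sum(
        1.0 / math.log2(position + 2)
        for position, item in enumerate(ranked[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(position + 2) for position in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


def relative_improvement(new, baseline):
    """Relative gain of `new` over `baseline`, as a fraction."""
    return (new - baseline) / baseline


# Recall@10 figures quoted above: 0.7300 for DPR vs. 0.3626 for the best LLM baseline.
gain = relative_improvement(0.7300, 0.3626)
print(f"{gain:.1%}")
```

Applied to the quoted Recall@10 figures, the relative gain is about 101%, i.e. DPR roughly doubles the best LLM baseline on that metric.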

5. Significance

This paper challenges the prevailing trade-off between the efficiency of collaborative filtering and the controllability of Large Language Models.

Paradigm Shift: It moves away from "LLM as the recommender" or "LLM as a post-processor" toward "LLM as a controller" that modulates the latent space of efficient, ID-based models.
Practical Impact: DPR offers a scalable solution for real-world recommendation systems where users frequently change their minds or have specific, context-dependent needs (e.g., "watch with kids," "no horror") without incurring the latency costs of generative LLMs.
Technical Insight: The work highlights that conflicting optimization goals (adding vs. removing features) in recommendation systems are best handled via architectural decoupling (MoE) rather than shared parameter spaces.