DeepInterestGR: Mining Deep Multi-Interest Using Multi-Modal LLMs for Generative Recommendation

Imagine you are at a massive, endless library where the librarian (the recommendation system) tries to guess your next favorite book.

The Old Way (The "Shallow" Problem):
Currently, most librarians only look at the title and the back cover of the books you've borrowed. If you borrowed a book called "Running 101," they assume you like "running." But they don't know why. Do you run for fitness? For stress relief? To train for a marathon? Or just because you hate sitting still?

Because they only see the surface, their suggestions are often generic. They might suggest another running book, but you actually wanted a book about meditation because running was just your way to clear your mind. This is what the paper calls the "Shallow Interest" problem: the system sees the what, but misses the why.

The New Solution: DeepInterestGR
The authors propose a new system called DeepInterestGR. Instead of just reading the title, this system hires a team of super-smart detectives (Large Language Models, or LLMs) to dig deeper into your history.

Here is how it works, broken down into three simple steps:

1. The Detective Team (Multi-LLM Interest Mining)

Imagine you have a team of four different detectives: one is a logic expert, one is a creative artist, one is a data scientist, and one is a psychologist.

The Job: When you interact with an item (like buying "noise-canceling headphones"), these detectives don't just see "headphones." They use their reasoning powers to ask: "Why did they buy this?"
The Insight: They might conclude: "This user values silence for deep focus," or "They are a frequent traveler who hates airplane noise."
The Teamwork: Since different detectives have different strengths, the system combines their notes. If at least two of them independently agree that the user likes "focus," that becomes a strong signal. This is called Multi-LLM Interest Mining.

2. The Secret Codebook (Interest-Enhanced Discretization)

Now, the system has a list of your deep interests (e.g., "focus," "travel," "aesthetic beauty"). But computers can't just read a list of words; they need a code.

The Translation: The system translates these deep insights into a special Secret Code (called Semantic IDs).
The Magic: Instead of grouping items only by their titles, it groups them by their "soul." A pair of "noise-canceling headphones" and a "quiet study tent" might get similar codes because both satisfy the deep interest of "focus." This lets the system recommend things you didn't even know you wanted, as long as they fit your hidden profile.

3. The Coach with a Whistle (Reinforcement Learning with Reward)

Finally, the system needs to learn from its mistakes. In the old days, the system only got a "Good job!" if it guessed the exact item you clicked on.

The New Coach: This system has a special coach (the Interest-Aware Reward).
How it works: The coach still rewards the system for guessing the exact item, but it also hands out an extra bonus whenever a recommendation fits one of your well-defined deep interests. If the system suggests a "quiet study tent" to someone who values focus, that suggestion earns credit for matching the deep interest (focus), not just the surface item.
The Result: The system learns to stop guessing random titles and to start guessing based on your true personality and motivations.

Why Does This Matter?

The paper tested this on real-world data (like Amazon reviews for beauty products, sports gear, and musical instruments).

The Result: The new system was roughly 9% to 15% better (relative improvement on standard ranking metrics) than the strongest existing systems.
The Analogy: It's the difference between a vending machine that only knows you like "Soda" (Surface) versus a personal chef who knows you like "Soda" because you need a quick energy boost after a long run (Deep Interest). The chef can then suggest a protein bar or a new running shoe, and you'll love it.

In short: DeepInterestGR stops treating you like a list of clicks and starts treating you like a human with complex, hidden motivations. It uses AI detectives to find out who you really are, not just what you bought.

1. Problem Statement: The "Shallow Interest" Limitation

The paper identifies a critical bottleneck in existing Generative Recommendation frameworks (e.g., TIGER, LC-Rec). While these models successfully reformulate item prediction as an autoregressive generation of Semantic IDs (SIDs), they suffer from a "Shallow Interest" problem:

Surface-Level Encoding: Current methods encode items solely using surface-level textual features (titles, descriptions) or simple collaborative signals.
Lack of Semantic Depth: They fail to capture the latent, semantically rich motivations behind user interactions (e.g., a user buying "noise-canceling headphones" might be driven by a "focus-oriented work style" or "frequent travel" rather than just the product attributes).
Consequences: This leads to limited personalization depth, poor interpretability (the "black box" nature of recommendations), and suboptimal supervision for Reinforcement Learning (RL), since existing RL stages rely on rule-based or sparse behavioral rewards.

2. Methodology: The DeepInterestGR Framework

DeepInterestGR proposes a novel framework that integrates deep interest mining into the generative recommendation pipeline. It consists of three core innovations and a two-stage training pipeline.

A. Core Innovations

Multi-LLM Interest Mining (MLIM):
- Mechanism: Instead of relying on a single encoder, the system leverages multiple frontier Large Language Models (LLMs), e.g., GPT-5.1, Gemini-3-Pro, Kimi-K2-Thinking, and Grok-4, along with their multi-modal variants.
- Process: Using Chain-of-Thought (CoT) prompting, these LLMs analyze item metadata (text + images) to extract interpretable, deep interest descriptions (e.g., "aesthetic preference," "fitness lifestyle").
- Ensemble Strategy: Outputs from multiple LLMs are aggregated. Consensus interests appearing in $\ge 2$ models are merged via embedding similarity and ranked by frequency and confidence to create a robust user/item interest profile.
Reward-Labeled Deep Interest (RLDI):
- Purpose: To ensure the mined interests are high-quality and actionable for Reinforcement Learning.
- Process: A lightweight binary classifier (based on Qwen-Chat) assigns a reward label ($1$ for positive/specific, $0$ for negative/vague) to each mined interest.
- Validation: This approach achieved 87.2% agreement with human judgments on a sampled dataset.
Interest-Enhanced Item Discretization (IEID):
- Mechanism: The curated deep interest text is encoded into semantic embeddings using Qwen-Embedding-4B.
- Quantization: These deep interest embeddings are quantized into Semantic ID (SID) tokens using Residual Quantization VAE (RQ-VAE).
- Benefit: This enriches the item representation, ensuring that items with similar latent interests are mapped to nearby regions in the SID space, rather than just similar surface text.
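
The IEID quantization step rests on residual quantization, the mechanism inside RQ-VAE. The sketch below is a minimal, non-authoritative illustration under stated assumptions: the codebooks are tiny hand-written toys rather than codebooks learned jointly with an encoder and decoder, the input vector stands in for a Qwen-Embedding-4B interest embedding, and the function names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def residual_quantize(vec, codebooks):
    """Greedy residual quantization: map a vector to a tuple of codeword indices.

    At each level, choose the codeword nearest to the current residual,
    then subtract it so the next level encodes what is still unexplained.
    The resulting index tuple plays the role of a Semantic ID.
    """
    residual = list(vec)
    sid = []
    for codebook in codebooks:
        idx = min(range(len(codebook)),
                  key=lambda i: squared_distance(residual, codebook[i]))
        sid.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tuple(sid)


def reconstruct(sid, codebooks):
    """Approximate the original vector by summing the selected codewords."""
    out = [0.0] * len(codebooks[0][0])
    for level, idx in enumerate(sid):
        out = [o + c for o, c in zip(out, codebooks[level][idx])]
    return out


# Hypothetical 2-level, 2-D toy codebooks: level 0 is coarse, level 1 refines.
toy_codebooks = [
    [[0.0, 0.0], [4.0, 4.0]],
    [[0.0, 0.0], [1.0, -1.0]],
]
interest_embedding = [5.0, 3.0]  # stand-in for a deep-interest embedding

sid = residual_quantize(interest_embedding, toy_codebooks)
print(sid)                              # → (1, 1)
print(reconstruct(sid, toy_codebooks))  # → [5.0, 3.0]
```

Because each level only encodes what earlier levels left unexplained, vectors that lie close together tend to share their leading tokens, which is the property that lets items with similar deep interests land in nearby regions of the SID space.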

B. Two-Stage Training Pipeline

Supervised Fine-Tuning (SFT):
- The generative model is trained to predict target SID sequences based on user history.
- The objective minimizes negative log-likelihood, aligning the model with both deep interest signals (encoded in SIDs) and collaborative filtering patterns.
Reinforcement Learning (RL) with Interest-Aware Reward:
- Algorithm: Uses Group Relative Policy Optimization (GRPO).
- Reward Function: Combines a base hit reward (accuracy) with an Interest-Aware Reward. If the predicted item is associated with a "positive" interest label (from RLDI), an additional bonus is added.
- Goal: This guides the policy toward generating semantically coherent and personalized item sequences.
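
The reward described above can be sketched together with the group-relative advantage that gives GRPO its name. This is a simplified illustration, not the paper's implementation: the hit reward is assumed to be 1.0 for an exact SID match, the bonus weight `BONUS = 0.5` and the `labels` dictionary of RLDI labels are hypothetical, and the clipped policy-ratio objective and KL penalty of full GRPO are omitted.

```python
from statistics import mean, pstdev

BONUS = 0.5  # hypothetical weight of the interest-aware term


def interest_aware_reward(pred_sid, target_sid, interest_label):
    """Base hit reward plus a bonus if the predicted item has a positive (1) RLDI label."""
    hit = 1.0 if pred_sid == target_sid else 0.0
    bonus = BONUS if interest_label.get(pred_sid, 0) == 1 else 0.0
    return hit + bonus


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: standardize each reward within its sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical RLDI labels keyed by Semantic ID tuples (1 = positive, 0 = vague).
labels = {(3, 7, 1): 1, (3, 7, 4): 1, (9, 2, 0): 0}
target = (3, 7, 1)
# A group of four SIDs sampled from the policy for the same user history.
group = [(3, 7, 1), (3, 7, 4), (9, 2, 0), (5, 5, 5)]

rewards = [interest_aware_reward(s, target, labels) for s in group]
print(rewards)  # → [1.5, 0.5, 0.0, 0.0]
advantages = group_relative_advantages(rewards)
```

The sampled item that fits a positive interest without matching the target ends up with a higher advantage than the off-interest samples, so the policy is nudged toward interest-consistent items even when it misses the exact target. Implementations differ on whether the group standard deviation is the population or sample version; this sketch uses the population version.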

3. Key Contributions

Problem Identification: Formally defined the "Shallow Interest" problem in generative recommendation and proposed a solution to capture latent user intents.
Novel Framework: Introduced DeepInterestGR, unifying multi-modal LLM reasoning with generative SID-based recommendation.
Three-Component Architecture:
- MLIM: Multi-LLM ensemble for extracting deep, multi-modal interests.
- RLDI: A quality control mechanism for interest labels to supervise RL.
- IEID: A discretization method that embeds semantic interests directly into item tokens.
Semantic Supervision: Designed an Interest-Aware Reward mechanism that moves beyond rule-based rewards, using mined semantic interests to guide policy optimization.

4. Experimental Results

The framework was evaluated on three Amazon Review benchmarks: Beauty, Sports, and Musical Instruments.

Performance: DeepInterestGR consistently outperformed state-of-the-art baselines (including SASRec, BERT4Rec, TIGER, LC-Rec, and MiniOneRec).
- Metrics: Achieved relative improvements of 9.2% to 15.1% in HR@K and NDCG@K.
- Comparison: It significantly surpassed the "MiniOneRec" baseline (which uses the same backbone but lacks deep interest mining), proving the value of the proposed interest extraction.
Ablation Studies:
- MLIM was the most critical component; removing it caused the largest performance drop (~11.8% in HR@5).
- RL optimization provided a significant boost (~16.4% improvement over SFT-only), validating the effectiveness of the Interest-Aware Reward.
- Multi-Modal: Adding visual features via multi-modal LLMs improved performance by ~5% (crucial for the Beauty domain).
Cross-Domain Generalization: DeepInterestGR showed superior transferability, improving cross-domain performance by 24.8% (HR@10) and 27.3% (NDCG@10) compared to baselines. This suggests that deep interests (e.g., "trend-following") are more universal than surface-level product categories.
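
For readers unfamiliar with the metrics, HR@K and NDCG@K can be computed as follows. This sketch assumes one held-out target item per user, a common leave-one-out setup for these Amazon benchmarks; the paper's exact evaluation protocol is not restated here, and the rankings below are hypothetical.

```python
import math


def hit_rate_at_k(ranked, target, k):
    """HR@K for one user: 1.0 if the held-out target is in the top-k list, else 0.0."""
    return 1.0 if target in ranked[:k] else 0.0


def ndcg_at_k(ranked, target, k):
    """NDCG@K with a single relevant item: 1 / log2(rank + 1), since the ideal DCG is 1."""
    top_k = ranked[:k]
    if target not in top_k:
        return 0.0
    rank = top_k.index(target) + 1  # 1-based position in the list
    return 1.0 / math.log2(rank + 1)


def evaluate(rankings, targets, k):
    """Average HR@K and NDCG@K over users (one ranked list and one target per user)."""
    n = len(targets)
    hr = sum(hit_rate_at_k(r, t, k) for r, t in zip(rankings, targets)) / n
    ndcg = sum(ndcg_at_k(r, t, k) for r, t in zip(rankings, targets)) / n
    return hr, ndcg


def relative_improvement(new, base):
    """Relative gain of `new` over `base` (0.092 corresponds to +9.2%)."""
    return (new - base) / base


# Hypothetical rankings for two users; the held-out targets are "c" and "z".
rankings = [["a", "b", "c", "d", "e"], ["x", "y", "w", "v", "u"]]
targets = ["c", "z"]
print(evaluate(rankings, targets, k=5))  # → (0.5, 0.25)
```

`relative_improvement` shows how figures such as the 9.2% to 15.1% gains are expressed: as a fraction of the baseline score, not as absolute percentage points.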

5. Significance

Paradigm Shift: The paper demonstrates that generative recommendation can scale effectively not just by increasing model size, but by enriching the semantic quality of the input data through LLM reasoning.
Interpretability: By explicitly mining and labeling "deep interests," the recommendation process becomes more transparent and explainable.
Scalability & Transferability: The results indicate that capturing latent user motivations allows models to generalize better across domains, addressing a major limitation of traditional embedding-heavy or surface-text-based systems.
RL Integration: It establishes a new standard for using LLMs to generate semantic supervision signals for Reinforcement Learning in recommendation systems, moving beyond simple click-based rewards.

More like this

Convolutional Surrogate for 3D Discrete Fracture-Matrix Tensor Upscaling

Generating Counterfactual Patient Timelines from Real-World Data

LiME: Lightweight Mixture of Experts for Efficient Multimodal Multi-task Learning

SIEVE: Sample-Efficient Parametric Learning from Natural Language

Not All Denoising Steps Are Equal: Model Scheduling for Faster Masked Diffusion Language Models