Modeling User Preferences as Distributions for Optimal Transport-Based Cross-Domain Recommendation under Non-Overlapping Settings

Imagine you are a travel guide trying to help a tourist (the user) navigate a new, confusing city (the "Target Domain") where they have no map and barely know anyone. This is the "Cold Start" problem in recommendation systems: the system doesn't know what the user likes because they are new or have very little history.

Usually, guides try to help by looking at the tourist's past visits to other cities (the "Source Domain"). But here's the catch: in the real world, you often can't link the tourist's past identity to their current one due to privacy rules. You can't say, "Oh, this is the same person who loved jazz in New York, so they'll love jazz here." The names and IDs are different.

This paper, DUP-OT, proposes a clever new way to solve this without needing to link specific people across cities.

The Old Way: The Rigid Checklist

Most recommendation systems treat a person's taste like a fixed checklist.

Example: "This user likes 30% Rock, 20% Jazz, and 50% Pop."
The Problem: This is too rigid. It's like saying a person is just a single point on a map. If the new city is slightly different, that checklist doesn't fit well. Also, if you can't link the tourist to their past self, you can't copy-paste their checklist.

The New Way: The "Flavor Cloud" (DUP-OT)

The authors suggest we stop thinking of taste as a checklist and start thinking of it as a cloud of flavors (a Gaussian Mixture Model).

Imagine a user's taste isn't a single dot, but a cloud of possibilities.

Maybe they are 70% "Rock Cloud" and 30% "Jazz Cloud," but those clouds have a shape and a spread. They might like "Rock" generally, but specifically "90s Rock" or "Indie Rock."
Why this helps: Even if you can't link the specific tourist, you can look at the shape of the clouds in the old city and the shape of the clouds in the new city.

The Magic Bridge: Optimal Transport

So, how do we move the "flavor cloud" from the old city to the new one without knowing who is who?

The paper uses a mathematical tool called Optimal Transport. Think of this as a logistics company moving furniture.

The Scenario: You have a warehouse in City A (Source) full of furniture (User Preferences) and a warehouse in City B (Target) that needs furniture. You don't know which specific chair belongs to which person, but you know the types of furniture in both warehouses.
The Solution: The logistics company calculates the most efficient way to move the types of furniture from City A to City B to fill the gaps.
In the Paper: The system looks at the "Rock Clouds" in the source domain and the "Rock Clouds" in the target domain. It calculates the most efficient "transport plan" to align them. It essentially says, "The 'Indie Rock' cloud in the source domain matches best with the 'Alternative Rock' cloud in the target domain," and shifts the user's preferences accordingly.

The Three-Step Recipe

The authors built a system (DUP-OT) that works in three simple stages:

The Translator (Preprocessing):
First, they take all the reviews people wrote (text) and translate them into a common language (embeddings). They use a shared "dictionary" so that a review about a "movie" in the Source Domain and a review about a "game" in the Target Domain can be understood in the same way.
The Shape Shifter (GMM Modeling):
Instead of making a checklist for every user, they build a "flavor cloud" for the whole city. They figure out the main "flavor clusters" (e.g., Action, Drama, Comedy) that exist in the data. Then, for each user, they just figure out how much of each flavor cluster they like. This is much lighter and more flexible than a rigid vector.
The Bridge Builder (Optimal Transport):
This is the magic step. They use the logistics math to align the "flavor clusters" of the Source City with the Target City. Once aligned, they can take a user's "flavor profile" from the Source City and "transport" it to the Target City, giving the new system a head start on what the user might like.

Why It Matters

The authors tested this on Amazon data (like moving from "Digital Music" users to "Electronics" users).

The Result: Even without knowing who the users were in the new city, this method had lower rating-prediction error than systems that relied on the new city's data alone.
The Superpower: It was especially good at avoiding disastrous mistakes. If a user is new, a bad system might recommend something they absolutely hate (a huge error). DUP-OT's "cloud" approach is more cautious and robust, so even when it's not perfect, it is less likely to be terrible.

In a Nutshell

Instead of trying to find a specific person's ID card to copy their history, DUP-OT looks at the general shape of their tastes (the cloud), figures out how those shapes map to the new environment using a smart logistics algorithm (Optimal Transport), and uses that to make a much smarter guess about what they will like next. It's like helping a tourist by understanding their style of travel, rather than needing their passport.

1. Problem Statement

The paper addresses two critical limitations in current Cross-Domain Recommendation (CDR) systems:

The Non-Overlapping Setting: Most existing CDR methods rely on overlapping users or items between the source (data-rich) and target (data-sparse) domains to establish connections. In real-world scenarios, strict privacy constraints or system silos often prevent the use of such overlapping entities during training, rendering standard CDR methods ineffective.
Limitations of Discrete Representations: Traditional CDR models represent user preferences as fixed discrete vectors. This approach fails to capture the fine-grained, multi-aspect nature of user interests (e.g., within the same category, a user might like action-driven movies but dislike romance-driven ones), leading to suboptimal performance, especially for cold-start users.

Goal: To develop a CDR framework that transfers knowledge from a source to a target domain without any overlapping users or items during training, while modeling user preferences as rich probability distributions rather than single vectors.

2. Methodology: DUP-OT Framework

The authors propose DUP-OT (Distributional User Preferences with Optimal Transport), a three-stage framework:

Stage 1: Shared Preprocessing

Input: Review text associated with user-item interactions.
Process:
- A unified pre-trained Sentence Encoder (e.g., all-MiniLM-L6-v2) encodes reviews into high-dimensional embeddings. A time-aware weighting scheme is applied to prioritize recent reviews.
- A Shared Autoencoder is trained on both domains to reduce dimensionality and create a unified, domain-consistent latent space. This ensures that item and user embeddings from different domains exist in the same feature space, facilitating alignment.
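The time-aware weighting step can be illustrated with a minimal sketch. The summary does not give the weighting formula, so the exponential decay and the `half_life_days` parameter below are assumptions chosen only for illustration, and the two-dimensional vectors stand in for real sentence embeddings.

```python
import numpy as np

def time_weighted_embedding(review_embs, ages_days, half_life_days=365.0):
    """Average review embeddings so that newer reviews count more.

    review_embs:    (n_reviews, dim) array of sentence embeddings.
    ages_days:      (n_reviews,) age of each review in days (0 = newest).
    half_life_days: hypothetical decay parameter, not taken from the paper.
    """
    review_embs = np.asarray(review_embs, dtype=float)
    ages = np.asarray(ages_days, dtype=float)
    weights = 0.5 ** (ages / half_life_days)  # weight halves every half-life
    weights /= weights.sum()                  # normalize weights to sum to 1
    return weights @ review_embs

# Toy example: one fresh review and one that is a year old.
embs = np.array([[1.0, 0.0], [0.0, 1.0]])
user_vec = time_weighted_embedding(embs, ages_days=[0.0, 365.0])
print(user_vec)  # the fresh review gets twice the weight of the old one
```

The resulting vector would then be passed through the shared autoencoder, which is not shown here.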

Stage 2: User Preference Modeling (GMM)

Instead of learning a single vector per user, DUP-OT models preferences as a Gaussian Mixture Model (GMM).

Domain-Level Components: To reduce computational complexity, the model assumes all users in a domain share a common set of Gaussian components (means $\mu$ and covariances $\Sigma$). These are fitted on the item embeddings of that domain with BayesianGaussianMixture, which in scikit-learn uses variational inference (an EM-style procedure) rather than plain EM.
User-Specific Weights: Each user is represented not by a new set of Gaussians, but by a personalized mixture weight vector ($w$) over the shared domain components.
Architecture:
- A Weight Learner (MLP) maps user embeddings to these mixture weights.
- A Rating Predictor (MLP) estimates ratings based on the Mahalanobis distance between an item embedding and the user's weighted Gaussian components.
- Loss: Standard Mean Squared Error (MSE) on ratings.
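As a rough sketch of the quantities involved in this stage, the snippet below computes squared Mahalanobis distances from one item embedding to a set of shared components and combines them with a user's mixture weights. The component parameters are hand-picked rather than fitted, the softmax over `user_logits` stands in for the Weight Learner's output, and weighting each distance by $w$ is an assumption about how the two are combined; the summary does not specify the Rating Predictor's exact input.

```python
import numpy as np

def softmax(z):
    """Turn arbitrary scores into mixture weights that sum to 1."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def mahalanobis_sq(x, means, covs):
    """Squared Mahalanobis distance from x to each of K Gaussian components.

    x: (d,), means: (K, d), covs: (K, d, d).
    """
    diffs = x - means                                          # (K, d)
    solved = np.linalg.solve(covs, diffs[..., None])[..., 0]   # Sigma_k^{-1} diff_k
    return np.einsum("kd,kd->k", diffs, solved)

# Hand-picked domain-level components (in practice these would be fitted).
means = np.array([[0.0, 0.0], [3.0, 3.0]])
covs = np.stack([np.eye(2), 2.0 * np.eye(2)])

user_logits = np.array([0.0, 0.0])   # placeholder for the Weight Learner output
w = softmax(user_logits)             # user-specific mixture weights
item = np.array([1.0, 0.0])
d2 = mahalanobis_sq(item, means, covs)

# Hypothetical input features for a Rating Predictor MLP (not shown).
features = w * d2
print(w, d2, features)
```

Because only the $K$ weights are user-specific, adding a user does not require fitting any new Gaussians.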

Stage 3: Cross-Domain Alignment via Optimal Transport (OT)

This stage bridges the source and target domains without overlapping entities.

Component Alignment: Since users share components within a domain, the model aligns the Gaussian components of the source domain to the target domain using Optimal Transport.
Cost Function: The transport cost is calculated using the Wasserstein-2 distance between Gaussian distributions (incorporating both means and covariances).
Transport Matrix ($T$): The Sinkhorn algorithm solves for the optimal transport matrix $T$ that maps source components to target components.
Weight Transfer: User weights from the source domain ($w^s$) are transported to the target domain ($w^t$) via matrix multiplication: $w^t = w^s T$.
Inference Strategy: For target domain users, the final prediction uses a linear fusion of the transferred distribution and the target-domain native distribution (if available), or relies solely on the transferred distribution for cold-start users.
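The alignment steps above can be sketched with NumPy alone. For two Gaussians the squared Wasserstein-2 distance has the closed form $\|\mu_1 - \mu_2\|^2 + \mathrm{Tr}\big(\Sigma_1 + \Sigma_2 - 2(\Sigma_2^{1/2}\Sigma_1\Sigma_2^{1/2})^{1/2}\big)$, which serves as the cost here. Several choices below are assumptions rather than details from the paper: uniform marginals over components, rescaling the cost by its maximum, the `reg` value, and row-normalizing $T$ so that the transferred weights sum to one.

```python
import numpy as np

def psd_sqrt(mat):
    """Square root of a symmetric positive semi-definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)
    return (vecs * np.sqrt(vals)) @ vecs.T

def w2_squared(m1, S1, m2, S2):
    """Closed-form squared Wasserstein-2 distance between two Gaussians."""
    root2 = psd_sqrt(S2)
    cross = psd_sqrt(root2 @ S1 @ root2)
    return float(np.sum((m1 - m2) ** 2) + np.trace(S1 + S2 - 2.0 * cross))

def sinkhorn(cost, a, b, reg=0.05, n_iters=500):
    """Entropy-regularized transport plan with row sums a and column sums b."""
    K = np.exp(-cost / reg)
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

# Toy components: each source component sits near one target component.
src_means = np.array([[0.0, 0.0], [5.0, 5.0]])
tgt_means = np.array([[0.5, 0.0], [5.0, 5.5]])
src_covs = np.stack([np.eye(2), np.eye(2)])
tgt_covs = np.stack([np.eye(2), np.eye(2)])

cost = np.array([[w2_squared(ms, Ss, mt, St)
                  for mt, St in zip(tgt_means, tgt_covs)]
                 for ms, Ss in zip(src_means, src_covs)])
cost = cost / cost.max()              # assumption: rescale for numerical stability

a = np.full(2, 0.5)                   # assumption: uniform mass per component
b = np.full(2, 0.5)
T = sinkhorn(cost, a, b)

# Assumption: row-normalize T so transferred weights remain a distribution.
T_rows = T / T.sum(axis=1, keepdims=True)
w_s = np.array([0.8, 0.2])            # a source user's mixture weights
w_t = w_s @ T_rows                    # transferred target-domain weights
print(T, w_t)
```

Since $T$ only has as many entries as there are pairs of components, the transport problem stays small regardless of how many users each domain has.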

3. Key Contributions

Non-Overlapping CDR Framework: Proposed a novel solution for CDR that operates strictly without overlapping users or items, addressing a realistic but under-explored constraint.
Distributional Preference Modeling: Introduced the use of Gaussian Mixture Models (GMMs) to represent user preferences, capturing multi-faceted interests better than discrete vectors.
Optimal Transport Integration: Developed a method to align GMM components across domains using Optimal Transport, enabling effective knowledge transfer without direct entity matching.
Efficiency: Designed the system so that OT is performed only at the component level (domain-level), not the user level, keeping computational overhead negligible compared to standard MLP training.

4. Experimental Results

Datasets: Amazon Review 5-core datasets (Digital Music, Movies & TV, Video Games as sources; Electronics as target).
Baselines:
- Single-Domain: LightGCN, NeuMF.
- Cross-Domain: TDAR (Text-enhanced Domain Adaptation Recommendation).
Key Findings:
- RQ1 (Cross-Domain Benefit): DUP-OT with source data (DUP-OT w/ source) consistently achieved lower RMSE than the version without source data, indicating that cross-domain transfer mitigates large prediction errors.
- RQ2 (Distribution vs. Vector): DUP-OT (even without source data) outperformed single-domain baselines (LightGCN, NeuMF) in the target domain. This suggests that modeling preferences as distributions yields more expressive representations than discrete vectors.
- RQ3 (vs. TDAR): DUP-OT achieved lower RMSE than the state-of-the-art cross-domain baseline TDAR, though TDAR had slightly lower MAE.
  - Interpretation: Because RMSE squares each error, it penalizes large misses more heavily than MAE does. Lower RMSE therefore indicates DUP-OT is more robust against extreme mispredictions (outliers), which is crucial for cold-start users. TDAR's lower MAE suggests its typical error is slightly smaller, while its higher RMSE points to occasional severe errors in sparse scenarios.
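A small arithmetic example shows how one model can have lower MAE while another has lower RMSE. The error values below are invented for illustration and are not results from the paper.

```python
import math

def mae(errors):
    """Mean absolute error of a list of prediction errors."""
    return sum(abs(e) for e in errors) / len(errors)

def rmse(errors):
    """Root mean squared error of a list of prediction errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

even_errors = [1.0, 1.0, 1.0, 1.0]    # consistently moderate errors
spiky_errors = [0.0, 0.0, 0.0, 3.0]   # mostly exact, one severe miss

print(mae(even_errors), rmse(even_errors))    # 1.0 1.0
print(mae(spiky_errors), rmse(spiky_errors))  # 0.75 1.5
```

The second error profile wins on MAE but loses on RMSE because its single large miss is squared before averaging.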

5. Significance and Conclusion

The paper demonstrates that modeling user preferences as probability distributions combined with Optimal Transport is a powerful paradigm for Cross-Domain Recommendation, particularly in non-overlapping settings.

Practical Impact: The method is highly relevant for real-world applications where data silos and privacy regulations prevent the sharing of user IDs or item lists between platforms.
Cold-Start Mitigation: By leveraging the distributional nature of preferences, the model provides more stable predictions for users with little interaction history, reducing the risk of catastrophic prediction errors (reflected in its lower RMSE).
Future Directions: The authors suggest exploring adaptive fusion strategies for combining transferred and native distributions, extending the model to implicit feedback, and incorporating structure-aware transport costs (e.g., Gromov-Wasserstein distance).
