Learning Unified Representations from Heterogeneous Data for Robust Heart Rate Modeling

Imagine you are trying to teach a robot to predict how fast your heart will beat while you run or cycle. Sounds simple, right? But in the real world, it's a nightmare of confusion.

Here is the problem: Data is messy.

Different Gadgets: One runner wears a Garmin, another a Huawei, and a third a Coros. Each gadget measures different things. The Garmin might track "cadence" (how many steps you take per minute), while the Huawei tracks "power." It's like trying to bake a cake using a recipe that lists "cups of flour" for one person and "grams of sugar" for another.
Different Bodies: Even if two people run the exact same route at the exact same speed, their hearts beat differently. One might be a marathon pro, the other a casual jogger. Their bodies react uniquely.

Most existing AI models try to force all this messy data into a single, rigid box. They say, "Okay, we'll only look at the data every device has," which means throwing away valuable information. Or they assume everyone's body works the same way, which leads to bad predictions.

The Solution: A "Universal Translator" for Heartbeats

The authors of this paper built a new AI framework (a smart computer program) that acts like a Universal Translator. Instead of forcing everyone to speak the same language, it learns to understand the meaning behind the data, regardless of the source.

Here is how they did it, using three clever tricks:

1. The "Blindfolded Chef" (Random Feature Dropout)

The Problem: The AI was getting too dependent on specific sensors. If a Garmin stopped working, the AI panicked because it only knew how to read Garmin data.
The Fix: The researchers taught the AI to cook with a blindfold on. During training, they randomly "dropped" (hid) certain features from the data. Sometimes they hid one sensor reading, sometimes another, while always keeping a few essentials such as speed and altitude.
The Analogy: Imagine a chef who usually relies on a specific spice. To make them a better cook, you hide that spice every few days. Eventually, the chef learns to make the dish taste amazing using whatever ingredients are available. This makes the AI robust; it can handle data from any device, even if that device is missing a few sensors.

2. The "Memory Lane" (History-Aware Attention)

The Problem: Your heart rate today depends on how you felt last week. If you've been training hard, your heart recovers faster. If you've been sick, it beats faster. Most models forget this history.
The Fix: The AI has a "memory lane." It looks at your past workouts and uses a special spotlight (Attention) to figure out which past events matter most.
The Analogy: Think of a personal trainer who remembers your entire fitness journey. When you say, "I want to run 5 miles," the trainer doesn't just look at the 5 miles; they look at your training history to guess exactly how hard your heart will work. The AI does this by comparing your most recent workout against your earlier ones and giving more weight to the sessions that turn out to be most relevant.

3. The "Group Photo" (Contrastive Learning)

The Problem: The AI needs to know that you are different from your friend, and that running is different from cycling.
The Fix: They used a technique called "Contrastive Learning." Imagine a group photo where the AI has to sort people into groups. It learns to push "Runners" away from "Cyclists" and "You" away from "Your Friend," while keeping all your own running sessions close together.
The Analogy: It's like organizing a library. The AI learns to put all books by the same author on the same shelf and separate them from other authors. This creates a clear map where the AI can instantly recognize your unique physiological "fingerprint."

The New Playground: PARROTAO

To prove their idea works, the authors couldn't just use old, clean data. They built a new, messy dataset called PARROTAO.

The Analogy: If previous tests were like driving on a perfectly paved highway, PARROTAO is like driving off-road through mud, rocks, and sand with different types of cars. It's a realistic test of whether the AI can actually survive in the real world.

The Results

When they tested this new "Universal Translator":

It was much more accurate: It reduced prediction errors by about 17% on standard data and 10% on their messy new data.
It works everywhere: It handled different devices and different people better than any previous method.
Real-world use: They showed it could help athletes pick the best running route based on predicted heart strain, or even fill in the gaps when a smartwatch sensor fails and misses data.

The Bottom Line

This paper is about teaching AI to be flexible. Instead of forcing the real world to fit the computer's rules, the computer learned to adapt to the messy, diverse, and unique reality of human bodies and gadgets. It's a big step toward smart health monitors that actually work for everyone, not just the lucky few with the perfect equipment.

1. Problem Statement

The paper addresses the critical challenge of data heterogeneity in real-world heart rate (HR) prediction, which hinders the deployment of personalized health monitoring systems. The authors identify two primary dimensions of this heterogeneity:

Source Heterogeneity: Wearable devices from different manufacturers (e.g., Garmin, Coros, Huawei) provide varying feature sets (sensor channels) and temporal resolutions. Existing models often discard device-specific signals or force data into a uniform intersection of features, leading to information loss.
User Heterogeneity: Individuals exhibit distinct physiological responses to the same activity. Furthermore, a single user's HR patterns vary across different activities (running vs. cycling) and over time. Most existing methods fail to effectively model these long-term, user-specific physiological traits.

The goal is to learn a unified latent representation that is robust to missing or varying sensor features (source) and adaptive to individual physiological profiles (user), enabling accurate HR prediction across diverse real-world scenarios.

2. Methodology

The authors propose a unified framework that learns a robust representation space through a multi-stage architecture (illustrated in Fig. 2 of the paper). The core components are:

A. Random Feature Dropout (Addressing Source Heterogeneity)

To make the model agnostic to specific device feature sets, a random feature dropout strategy is applied during training to both current and historical inputs.

Mechanism: A binary mask is generated for feature channels, randomly dropping features with a probability $p$ that follows a curriculum strategy (increasing from $p_{min}$ to $p_{max}$ over epochs).
Constraints: Essential features (e.g., speed, altitude) are protected from dropping, and a minimum number of features ($K$) is retained per sample to ensure stability.
Goal: Forces the model to learn from diverse feature subsets rather than relying on specific sensor combinations, enhancing generalization across devices.
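The masking step described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the linear schedule, the default values of `p_min` and `p_max`, zero-filling of dropped channels, the `temperature` channel, and all function names are hypothetical choices made for the example.

```python
import random


def curriculum_drop_prob(epoch, num_epochs, p_min=0.1, p_max=0.5):
    """Drop probability p for this epoch, rising linearly from p_min to p_max.

    The linear shape and the default values are assumptions for illustration.
    """
    if num_epochs <= 1:
        return p_max
    frac = min(max(epoch / (num_epochs - 1), 0.0), 1.0)
    return p_min + frac * (p_max - p_min)


def sample_feature_mask(features, protected, p, min_keep, rng):
    """Return {feature_name: 1 (kept) or 0 (dropped)} for one training sample."""
    mask = {}
    for name in features:
        # Protected (essential) channels are never dropped.
        dropped = name not in protected and rng.random() < p
        mask[name] = 0 if dropped else 1

    # Enforce the minimum of K retained channels by restoring random dropped ones.
    dropped_names = [name for name in features if mask[name] == 0]
    rng.shuffle(dropped_names)
    while sum(mask.values()) < min_keep and dropped_names:
        mask[dropped_names.pop()] = 1
    return mask


def apply_mask(step, mask):
    """Zero out dropped channels in one time step (zero-filling is an assumption)."""
    return {name: value * mask[name] for name, value in step.items()}


# Usage: one masked time step at epoch 5 of 20.
rng = random.Random(0)
features = ["speed", "altitude", "cadence", "power", "temperature"]
p = curriculum_drop_prob(epoch=5, num_epochs=20)
mask = sample_feature_mask(features, protected={"speed", "altitude"},
                           p=p, min_keep=3, rng=rng)
step = {"speed": 3.2, "altitude": 41.0, "cadence": 172.0,
        "power": 250.0, "temperature": 18.5}
print(mask)
print(apply_mask(step, mask))
```

Because the mask is resampled for every training sample, the model sees many different feature subsets, which is the property the dropout strategy relies on.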

B. History-Aware Attention Module (Addressing User Heterogeneity)

To capture long-term physiological traits and user-specific patterns, the framework processes historical workout data ($H_u$) before predicting the current session.

Intra-Workout Encoding: Uses Bi-LSTMs to encode temporal dynamics within individual past sessions, incorporating time intervals and sensor features.
Inter-Workout Modeling: A Gated Recurrent Unit (GRU) processes the sequence of past workout summaries to capture physiological evolution.
Attention Mechanism: A multi-head attention module uses the most recent workout context as a query to weigh the relevance of all historical sessions. This produces a context embedding ($u_u$) that encapsulates the user's long-term fitness state.
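The attention step can be illustrated with a simplified sketch. It assumes that each past workout has already been reduced to a fixed-length summary vector (standing in for the Bi-LSTM and GRU outputs), and it uses a single head of scaled dot-product attention with no learned projections, whereas the paper describes a multi-head module. The function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def history_attention(workout_summaries):
    """Single-head scaled dot-product attention over past workouts.

    workout_summaries: list of equal-length vectors, oldest first.
    The most recent summary acts as the query; all summaries act as
    both keys and values (no learned projections in this sketch).
    Returns (context_vector, attention_weights).
    """
    query = workout_summaries[-1]
    dim = len(query)
    scale = math.sqrt(dim)
    scores = [
        sum(q * k for q, k in zip(query, key)) / scale
        for key in workout_summaries
    ]
    weights = softmax(scores)
    context = [
        sum(w * value[i] for w, value in zip(weights, workout_summaries))
        for i in range(dim)
    ]
    return context, weights


# Usage: three past workouts summarized as 2-dimensional vectors.
history = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
context, weights = history_attention(history)
print(weights)
print(context)
```

In this toy example the first and third workouts resemble the query and so receive more weight than the second. The weighting follows similarity to the query, not how recent a session is.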

C. Contrastive Representation Learning

To ensure the learned embeddings are discriminative, the model employs an InfoNCE contrastive loss.

Mechanism: The final user embedding ($z_u$) is formed by concatenating the current session features with the history-aware context embedding.
Objective: The loss function pulls embeddings of the same user (or same activity type) closer together while pushing different users/activities apart. This creates a structured latent space where similar physiological states are clustered, improving the model's ability to distinguish between users and activities.
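To make the objective concrete, the sketch below computes a label-aware InfoNCE-style loss over a small batch of embeddings. The exact formulation used in the paper is not given here, so this follows the common form in which every other sample with the same label counts as a positive (often called a supervised contrastive loss). Cosine similarity, the temperature value, and the function names are assumptions for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors (small epsilon avoids division by zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-12)


def info_nce(embeddings, labels, temperature=0.1):
    """Label-aware InfoNCE-style loss, averaged over anchors that have a positive.

    For each anchor i, samples sharing its label are positives and all
    other samples are negatives. The temperature is a hypothetical value.
    """
    n = len(embeddings)
    losses = []
    for i in range(n):
        logits = {
            j: cosine(embeddings[i], embeddings[j]) / temperature
            for j in range(n) if j != i
        }
        positives = [j for j in logits if labels[j] == labels[i]]
        if not positives:
            continue  # no positive pair for this anchor in the batch
        # log-sum-exp over all other samples, computed stably
        top = max(logits.values())
        log_denom = top + math.log(sum(math.exp(v - top) for v in logits.values()))
        log_probs = [logits[j] - log_denom for j in positives]
        losses.append(-sum(log_probs) / len(positives))
    return sum(losses) / len(losses) if losses else 0.0


# Usage: two users, two sessions each.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(info_nce(embeddings, ["user_a", "user_a", "user_b", "user_b"]))
print(info_nce(embeddings, ["user_a", "user_b", "user_a", "user_b"]))
```

The first call uses labels that match the geometric clusters and yields the lower loss. The second assigns labels across clusters and is penalized, which is the pressure that pulls same-user embeddings together during training.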

D. Training Objective

The model is trained end-to-end by minimizing a combined loss function:
$\mathcal{L} = \mathcal{L}_{MSE} + \lambda \mathcal{L}_{CL}$
where $\mathcal{L}_{MSE}$ ensures accurate heart rate forecasting, $\mathcal{L}_{CL}$ enforces the discriminative structure of the representation space, and $\lambda$ weights the contrastive term relative to the forecasting error.
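As a small worked example of the combined objective, the sketch below adds a precomputed contrastive loss to the forecasting MSE. The value of `lam` and the sample numbers are hypothetical; the source does not report the weight used.

```python
def mse(predicted, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)


def total_loss(hr_predicted, hr_target, contrastive_loss, lam=0.1):
    """L = L_MSE + lam * L_CL, with lam a hypothetical weighting."""
    return mse(hr_predicted, hr_target) + lam * contrastive_loss


# Usage: three predicted heart rates (bpm) and a contrastive loss of 0.8.
loss = total_loss([150.0, 152.0, 155.0], [148.0, 153.0, 155.0],
                  contrastive_loss=0.8, lam=0.1)
print(loss)
```

Here the squared errors are 4, 1 and 0, so the MSE term is 5/3, and the contrastive term contributes 0.1 × 0.8 = 0.08.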

3. Key Contributions

Unified Framework for Heterogeneity: A novel architecture that jointly handles source heterogeneity (via random feature dropout) and user heterogeneity (via history-aware attention and contrastive learning) without requiring uniform input features.
PARROTAO Dataset: The creation and public release of PARROTAO, a large-scale, multi-device, multi-activity benchmark dataset. Unlike existing datasets, PARROTAO preserves device-specific feature sets and cross-user variations, providing a rigorous testbed for real-world heterogeneity.
State-of-the-Art Performance: The proposed method significantly outperforms existing baselines (including physiology-based models, ODE-neural hybrids, and deep learning architectures like Transformers and TCNs) on both the public FitRec dataset and the new PARROTAO dataset.
Downstream Applications: Demonstrated practical utility in personalized route recommendation (predicting physiological costs of routes) and heart rate imputation (reconstructing missing sensor data).

4. Experimental Results

The model was evaluated on FitRec (public, relatively uniform) and PARROTAO (heterogeneous, multi-device).

Predictive Performance:
- FitRec: Reduced Mean Squared Error (MSE) by 17.5% compared to the strongest baseline.
- PARROTAO: Reduced MSE by 10.4% compared to the strongest baseline.
- The model ranked first in the majority of individual sport categories on both datasets.
Ablation Studies:
- Removing Contrastive Learning caused the largest performance drop (MSE increased by ~20% on FitRec, ~36% on PARROTAO), highlighting its critical role in representation quality.
- Removing History-Aware Attention significantly degraded performance, especially on PARROTAO, confirming the importance of modeling long-term user traits.
- Feature Dropout acted as an effective regularizer, preventing overfitting to specific device features.
Representation Analysis:
- t-SNE visualizations and quantitative metrics (kNN accuracy, NMI, DBI) confirmed that the learned embeddings effectively separate different users and sports.
- Feature importance analysis revealed that feature relevance varies significantly by sport and user (e.g., "Power" is critical for some users, "Speed" for others), validating the need for adaptive modeling.
Downstream Tasks:
- Route Recommendation: Successfully predicted physiological demands for different terrain profiles, matching actual user data.
- Heart Rate Imputation: Outperformed classical methods (Kalman filter, linear interpolation) and the FitRec baseline in reconstructing missing HR data, achieving an MSE of 7.54 on PARROTAO vs. 34.81 for linear interpolation.
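For context on the imputation comparison, the sketch below shows what a linear interpolation baseline and its evaluation might look like. It is an illustration under stated assumptions, not the paper's evaluation code: timestamps are assumed to be sorted and unique, missing readings are marked with `None`, gaps at the edges copy the nearest observed value, and the function names are hypothetical.

```python
def linear_interpolate(times, values):
    """Fill None entries by linear interpolation between the nearest observed points.

    Gaps at the start or end copy the nearest observed value (an assumption).
    """
    known = [(t, v) for t, v in zip(times, values) if v is not None]
    if not known:
        raise ValueError("need at least one observed value")
    filled = []
    for t, v in zip(times, values):
        if v is not None:
            filled.append(v)
            continue
        before = [kv for kv in known if kv[0] < t]
        after = [kv for kv in known if kv[0] > t]
        if not before:
            filled.append(after[0][1])
        elif not after:
            filled.append(before[-1][1])
        else:
            (t0, v0), (t1, v1) = before[-1], after[0]
            filled.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
    return filled


def masked_mse(filled, truth, observed):
    """MSE computed only over positions that were missing in `observed`."""
    errors = [(f - g) ** 2 for f, g, o in zip(filled, truth, observed) if o is None]
    return sum(errors) / len(errors)


def relative_reduction(baseline_mse, model_mse):
    """Fraction by which the model lowers the baseline error."""
    return (baseline_mse - model_mse) / baseline_mse


# Usage: two heart rate readings (bpm) are missing from a five-point series.
times = [0, 1, 2, 3, 4]
observed = [100.0, None, None, 130.0, 140.0]
truth = [100.0, 112.0, 118.0, 130.0, 140.0]
filled = linear_interpolate(times, observed)
print(filled, masked_mse(filled, truth, observed))
print(relative_reduction(34.81, 7.54))
```

Applied to the two figures reported above (34.81 for linear interpolation, 7.54 for the proposed model), `relative_reduction` gives a drop of roughly 78%.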

5. Significance

This work bridges the gap between controlled laboratory models and real-world wearable deployment. By explicitly addressing data heterogeneity, the proposed framework enables:

Cross-Platform Deployment: Systems can now integrate data from fragmented device markets without discarding unique sensor capabilities.
True Personalization: The model adapts to individual physiological baselines and historical trends, moving beyond generic population averages.
Robustness: The ability to handle missing or varying sensor channels makes the system viable for real-world scenarios where sensor data is often incomplete or inconsistent.

The introduction of the PARROTAO dataset sets a new standard for evaluating heart rate models, shifting the focus from homogeneous benchmarks to realistic, heterogeneous challenges.