Embedded Inter-Subject Variability in Adversarial Learning for Inertial Sensor-Based Human Activity Recognition

This paper proposes a deep adversarial framework that explicitly embeds inter-subject variability into training, learning subject-invariant feature representations that generalize better to unseen individuals in inertial sensor-based Human Activity Recognition.

Francisco M. Calatrava-Nicolás, Shoko Miyauchi, Vitor Fortes Rey, Paul Lukowicz, Todor Stoyanov, Oscar Martinez Mozos

Published 2026-03-06

The Big Problem: Everyone Moves Differently

Imagine you are teaching a robot to recognize when you are walking. You show the robot videos of yourself walking. The robot learns your specific stride, your speed, and how you swing your arms.

Now, you ask the robot to recognize your friend walking. The robot gets confused! Your friend walks faster, has longer legs, or swings their arms differently. Because the robot was only trained on you, it fails to recognize your friend.

In the world of technology, this is called Human Activity Recognition (HAR). Scientists use smartwatches and sensors to track what people are doing (walking, running, sitting). The biggest headache for these systems is Inter-Subject Variability. That's a fancy way of saying: "People are all unique, and they do the same things in different ways."

The Old Solutions (And Why They Failed)

Scientists tried to fix this in a few ways:

  1. The "Show Me Everyone" Approach: They tried to collect data from thousands of people. But this is expensive, takes forever, and raises privacy concerns (who wants a robot memorizing exactly how they walk?).
  2. The "Privacy Police" Approach: Some tried to use "Adversarial Learning." Think of this as a game of hide-and-seek. The system tries to learn the activity (walking) while a "censor" tries to guess who is walking. The system tries to trick the censor so it can't tell the difference between people.
    • The Flaw: The censor had to pick one identity out of everyone in the training set, so the game got harder as more people were added, and it often failed to hide everyone's identity equally well.
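Adversarial "censor" setups like this are commonly trained with a gradient-reversal trick: the censor's learning signal is flipped in sign before it reaches the shared feature extractor, so the extractor is pushed to make identity *harder* to guess. Here is a minimal, dependency-free sketch of that idea; the function name and numbers are illustrative, not from the paper.

```python
def feature_grad(activity_grad, subject_grad, lam=1.0):
    """Combine the gradients flowing into the shared feature extractor.

    The activity head's gradient is used as-is (we want to get better
    at recognizing activities), while the subject "censor" head's
    gradient is reversed and scaled by lam (we want to get *worse* at
    identifying who is moving).
    """
    return [a - lam * s for a, s in zip(activity_grad, subject_grad)]

# Toy example: if both heads pull the features in the same direction...
g_act = [0.5, -0.2]
g_subj = [0.5, -0.2]

# ...the reversed subject gradient cancels that pull entirely.
print(feature_grad(g_act, g_subj, lam=1.0))  # prints [0.0, 0.0]
```

The key design point is that the censor itself trains normally; only the gradient crossing into the shared extractor is flipped, which is what turns training into a hide-and-seek game.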

The New Solution: The "Universal Dance Instructor"

The authors of this paper came up with a clever new way to train the AI. Instead of just trying to hide who the person is, they changed the rules of the game to focus on the activity itself.

Here is how their new system works, using a Dance Class analogy:

1. The Setup (The Feature Extractor)

Imagine a Dance Instructor (the AI) who watches people dance. Their job is to figure out what dance is being performed (e.g., "The Waltz"), not who is dancing.

2. The New Game (The Adversarial Task)

In the old games, the AI had to guess: "Is this dancer Alice or Bob?"
In this new game, the teacher shows the censor two dancers performing the same dance and asks a single question:

"Are these two performances by the same person?"

  • Scenario A: Two different people doing the same dance (e.g., Alice and Bob both doing the Waltz).
  • Scenario B: The same person doing the same dance twice (e.g., Alice doing the Waltz on two different days).

The censor tries to answer correctly, while the dance instructor (the AI's feature extractor) tries to describe both performances so similarly that the censor can no longer tell Scenario A from Scenario B.
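The pair-building step of this game can be sketched in a few lines of Python: sample two sensor windows that share an activity, and label the pair by whether the windows come from the same person. The record layout, names, and data here are made up for illustration; they are not the paper's actual API or datasets.

```python
import random

def make_pairs(samples, n_pairs=4, seed=0):
    """Build adversarial training pairs that always share an activity.

    `samples` is a list of (subject, activity, window) records
    (an illustrative structure, not the paper's format). Each pair is
    labelled 1 if both windows come from the same subject, else 0;
    the censor must predict this label, while the feature extractor
    learns to make it unpredictable.
    """
    rng = random.Random(seed)
    pairs = []
    while len(pairs) < n_pairs:
        a, b = rng.sample(samples, 2)
        if a[1] == b[1]:  # keep only same-activity pairs
            same_person = int(a[0] == b[0])
            pairs.append(((a[2], b[2]), same_person))
    return pairs

data = [("alice", "walk", "w1"), ("bob", "walk", "w2"),
        ("alice", "walk", "w3"), ("bob", "run", "w4"),
        ("alice", "run", "w5")]
for (x1, x2), y in make_pairs(data):
    print(x1, x2, "same person" if y else "different people")
```

Because every pair already shares an activity, the only thing left for the censor to detect is identity, which is exactly what the feature extractor is being trained to erase.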

3. The "Magic" Insight

By forcing the AI to look at pairs of people doing the same activity, the AI is forced to ignore the "Alice-ness" or "Bob-ness" of the movement. It has to find the common thread that makes a Waltz a Waltz, regardless of who is dancing it.

It's like teaching a child to recognize a "Dog" by showing them a Golden Retriever and a Chihuahua side-by-side. The child learns that despite the size and fur differences, they are both "Dogs." The AI learns that despite the speed and arm-swing differences, they are both "Walking."

How They Tested It

The researchers tested this on three different "dance floors" (datasets) containing data from real people wearing sensors. They used a strict test called Leave-One-Subject-Out (LOSO).

  • The Test: They trained the AI on all but one person (say, 9 out of 10), then tested it on the held-out person it had never seen, repeating the process so that every person takes a turn as the stranger.
  • The Result: The new method was much better at recognizing the 10th person's movements than any previous method. It didn't just guess; it actually understood the essence of the movement.
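The Leave-One-Subject-Out protocol itself is simple enough to sketch: every subject takes one turn as the unseen test user while the model trains on everyone else. The data structure and subject counts below are illustrative stand-ins, not the real datasets.

```python
def loso_splits(records):
    """Leave-One-Subject-Out cross-validation splits.

    `records` maps subject -> list of sensor windows (an illustrative
    layout, not the datasets' real format). Each fold holds one
    subject out entirely for testing and trains on all the others,
    so test performance measures generalization to a new person.
    """
    subjects = sorted(records)
    for held_out in subjects:
        train = [w for s in subjects if s != held_out for w in records[s]]
        test = records[held_out]
        yield held_out, train, test

# Toy setup: 10 subjects, 3 sensor windows each.
records = {f"p{i}": [f"p{i}_w{j}" for j in range(3)] for i in range(1, 11)}
for held_out, train, test in loso_splits(records):
    print(held_out, len(train), len(test))
# Each fold trains on 9 subjects (27 windows) and tests on 1 (3 windows).
```

The strictness comes from the fact that no window from the test subject ever appears in training, so the model cannot simply memorize that person's style.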

Why This Matters

  1. Better Accuracy: The robot works better for new people immediately, without needing to be retrained.
  2. Privacy Friendly: Because the system learns to ignore who you are and focus on what you are doing, it becomes better at protecting your privacy as a side effect. It forgets your specific identity while remembering your actions.
  3. Scalable: You don't need to collect data from millions of people to make it work. It learns the "universal rules" of movement.

The Bottom Line

The authors built a smarter AI teacher. Instead of memorizing how one specific person moves, it learned to see the "soul" of the movement itself. By playing a game where it compares pairs of people doing the same thing, it learned to ignore the differences between people and focus on the similarities.

In short: They taught the computer to stop looking at the dancer and start looking at the dance.
