Learning Demographic-Conditioned Mobility Trajectories with Aggregate Supervision

Imagine you are trying to teach a robot how to walk through a city like a human. You want the robot to know that a college student might rush to a library and a coffee shop, while a retired grandparent might take a slow stroll to the park and the grocery store.

The problem? You have a huge pile of GPS data showing where people went, but you don't know who they are. The data is "anonymous." It's like having a thousand videos of people walking, but everyone is wearing a mask. You can't tell the student from the grandparent.

This is the problem that ATLAS, the method introduced in the paper, sets out to solve.

The Big Idea: "Guessing the Recipe from the Cake"

Usually, to teach a robot to act like a specific group, you need labeled data: "Here is a video of a student walking. Here is a video of a grandparent walking." But because of privacy laws, that labeled data rarely exists.

ATLAS is a clever workaround. It uses a technique called "Weak Supervision."

Think of it like this:
You have a mystery cake (the anonymous GPS data). You don't know who baked it. But you do have two other clues:

The Census: You know that in Neighborhood A, 80% of the people are students and 20% are retirees. In Neighborhood B, it's the opposite.
The Aggregate Stats: You know that in Neighborhood A, the total number of visits to "Libraries" is very high, and visits to "Parks" are low.

ATLAS says: "If Neighborhood A has mostly students, and the total traffic shows lots of library visits, then the 'Student' recipe must involve a lot of library visits!"

It reverse-engineers the specific behavior of different groups by looking at how the "ingredients" (demographics) mix in different "bowls" (neighborhoods) to create the final "taste" (aggregate traffic).
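The arithmetic behind the analogy can be sketched in a few lines. The numbers below are invented for illustration and do not come from the paper. Solving a small system of equations is only the intuition: ATLAS itself fine-tunes a generative model rather than solving equations directly.

```python
# Toy illustration with made-up numbers (not from the paper).
# Each neighborhood's observed library share is a population-weighted blend
# of the unknown per-group shares:
#   share_A = 0.8 * student + 0.2 * retiree
#   share_B = 0.2 * student + 0.8 * retiree

def solve_two_groups(mix_a, mix_b, share_a, share_b):
    """Recover per-group shares from two neighborhoods via Cramer's rule."""
    (a1, a2), (b1, b2) = mix_a, mix_b
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        # Both neighborhoods have the same mix, so the blend cannot be undone.
        raise ValueError("identical mixes: the groups cannot be separated")
    student = (share_a * b2 - a2 * share_b) / det
    retiree = (a1 * share_b - share_a * b1) / det
    return student, retiree


# Hypothetical observations: 50% of visits in A and 20% in B are to libraries.
student, retiree = solve_two_groups((0.8, 0.2), (0.2, 0.8), 0.50, 0.20)
print(f"student library share: {student:.2f}")
print(f"retiree library share: {retiree:.2f}")
```

With these invented inputs the recovered shares are about 0.60 for students and 0.10 for retirees, which blend back to the observed 0.50 and 0.20.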

How ATLAS Works (The Two-Step Dance)

The method works in two phases, like training an actor:

Phase 1: The Generalist Actor
First, the AI learns to generate any human movement using the anonymous GPS data. It learns the basics: people go from home to work, they stop for coffee, they don't fly over buildings. It becomes a good "general human" simulator, but it doesn't know the difference between a student and a retiree yet.

Phase 2: The Specialized Director
Now, the AI gets the "Census Clues." The researchers tell the AI: "Okay, in this specific neighborhood, 70% of the people are young. So, when you simulate people for this neighborhood, make sure 70% of your generated paths look like a young person."

The AI then checks its work: "Did I generate enough library visits to match the real-world data for this neighborhood?" If not, it tweaks its internal rules for "Young People" until the math adds up. It does this for every neighborhood, slowly figuring out exactly what a "Young Person" looks like, a "Middle-Aged Person" looks like, etc., without ever seeing a single labeled trajectory.

The Secret Sauce: Diversity and Detail

The paper discovered two things that make this magic work:

The "Mix" Matters: You need neighborhoods that are different from each other. If every neighborhood in the city had exactly the same mix of people (50% students, 50% retirees), the AI would get confused. It's like trying to figure out the taste of salt if you only ever eat a dish that is 50% salt and 50% sugar. You need some dishes that are mostly salt and some that are mostly sugar to figure out what salt tastes like.
The "Details" Matter: The AI needs detailed data. If you only tell it "People went to a 'Store'," it's hard to guess who went. But if you tell it "People went to Target vs. Whole Foods," the AI can easily guess: "Oh, Target is probably the students, and Whole Foods is the retirees." The more specific the data, the better the AI learns.
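A small sketch can show why coarse data hides the signal. The visit shares and place names below are hypothetical placeholders, not values from the paper. Two groups that prefer different stores look clearly different at the store level, but the difference disappears once both stores are merged into one "store" category.

```python
# Hypothetical visit shares per group (invented for illustration).
student_visits = {"store_x": 0.6, "store_y": 0.1, "park": 0.3}
retiree_visits = {"store_x": 0.1, "store_y": 0.6, "park": 0.3}
category_of = {"store_x": "store", "store_y": "store", "park": "park"}


def coarsen(visits, category_of):
    """Merge place-level shares into category-level shares."""
    merged = {}
    for place, share in visits.items():
        category = category_of[place]
        merged[category] = merged.get(category, 0.0) + share
    return merged


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


fine_gap = total_variation(student_visits, retiree_visits)
coarse_gap = total_variation(
    coarsen(student_visits, category_of),
    coarsen(retiree_visits, category_of),
)
print(f"difference at place level:    {fine_gap:.2f}")
print(f"difference at category level: {coarse_gap:.2f}")
```

In this toy case the place-level difference is 0.50 while the category-level difference is 0.00, so no amount of category-level data could tell the two groups apart.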

Why Should You Care?

Why do we need a robot that knows the difference between a student and a retiree?

Public Health: If a virus is spreading, knowing that "students" hang out in crowded bars while "retirees" stay in parks helps doctors target their warnings and resources correctly.
City Planning: If you want to build a new bus line, you need to know if it should serve the morning rush of workers or the afternoon trips of seniors.
Privacy: This is the best part. We can get these insights without invading anyone's privacy. We don't need to know your name or your age. We just need to know the general mix of the neighborhood and the total traffic.

The Result

The researchers tested ATLAS on real data from Virginia and California. They found that:

Without this method, the AI was terrible at guessing specific group behaviors (it was just a "blur" of everyone).
With ATLAS, the AI got much better: the mismatch between its generated movement patterns and the real ones shrank by 12% to 69%.
In many cases it closed most of the gap to a "perfect" AI that was trained on data where everyone's identity was known (which is usually impossible to get).

In short: ATLAS is like a detective who can figure out the habits of different groups of people just by looking at the neighborhood demographics and the total traffic, without ever needing to know who any individual is.

1. Problem Statement

Context: Human mobility trajectories are critical for public health modeling, transportation planning, and understanding social segregation. However, existing generative models for mobility trajectories fail to capture demographic heterogeneity (e.g., differences in movement patterns between age groups, genders, or socioeconomic statuses).

The Gap: While demographic differences significantly impact mobility (e.g., students vs. retirees), most large-scale trajectory datasets (such as GeoLife or Veraset) lack individual-level demographic labels due to privacy constraints. Collecting ground-truth demographics requires expensive surveys, making "strongly supervised" training (pairing specific trajectories with specific demographics) infeasible for large-scale models.

Objective: The authors propose a method to learn demographic-conditioned trajectory generators using only:

Individual trajectories without demographic labels.
Region-level aggregated mobility features (e.g., total POI visits per region).
Region-level demographic compositions (e.g., census data showing the % of age/gender groups in a region).
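As a rough sketch of how these three inputs might be organized in code, assuming a simple in-memory layout: the class and field names below are hypothetical and are not taken from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Trajectory:
    """One individual's trajectory, with no demographic label attached."""
    home_region: str                 # hypothetical individual-level feature
    visits: list[tuple[int, str]]    # (timestamp, poi_id) pairs


@dataclass
class RegionData:
    """Region-level supervision: aggregate features plus census composition."""
    region_id: str
    poi_visit_counts: dict[str, int]      # aggregated mobility feature
    demographic_shares: dict[str, float]  # p(d | g), expected to sum to 1

    def check(self, tol: float = 1e-6) -> None:
        """Raise if the demographic shares do not form a distribution."""
        total = sum(self.demographic_shares.values())
        if abs(total - 1.0) > tol:
            raise ValueError(f"shares for {self.region_id} sum to {total}, not 1")


# Hypothetical example values.
region = RegionData(
    region_id="tract_001",
    poi_visit_counts={"library": 120, "park": 45},
    demographic_shares={"age_18_29_f": 0.6, "age_65_plus_m": 0.4},
)
region.check()
traj = Trajectory(home_region="tract_001", visits=[(0, "library"), (3600, "park")])
```

Note that `Trajectory` carries no demographic field at all; the only link between movement and demographics is the region-level record.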

2. Methodology: ATLAS

The authors introduce ATLAS (TrAjecTory Learning from AggregateS), a weakly supervised, model-agnostic framework that operates in two phases.

Phase 1: Unlabeled Trajectory Learning (Baseline)

A generative model (in this work, a Latent Diffusion Model using a BART autoencoder and Diffusion Transformer) is trained on individual trajectories.
Conditioning: The model is conditioned on individual features available in the data (e.g., home and work locations) but not on demographics.
Goal: To learn a strong spatiotemporal backbone of general human mobility patterns ($P_\theta(\cdot | z)$).

Phase 2: Aggregate Supervision (Fine-tuning)

The model is extended to accept demographic conditioning ($P_\theta(\cdot | d, z)$).
Training Signal: Instead of individual labels, the model is fine-tuned to match region-level aggregate statistics.
- For a given region $g$, the model samples synthetic trajectories based on the region's known demographic composition $p(d|g)$.
- It computes the aggregate features (e.g., POI visit counts) of these synthetic trajectories.
- Loss Function: The model minimizes the distance (e.g., Jensen-Shannon Divergence or Total Variation) between the synthetic regional aggregates and the observed real regional aggregates.
Key Innovation: This allows the model to "disentangle" demographic behaviors by observing how different demographic mixes in different regions result in different aggregate mobility patterns.
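The Phase 2 steps above can be sketched as a single loss evaluation for one region. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_generator` is a hypothetical stand-in for $P_\theta(\cdot | d, z)$, the group names and numbers are invented, and the sketch only evaluates the Jensen-Shannon divergence. Actual fine-tuning would additionally need a way to pass gradients through the sampling step, which is not shown here.

```python
import math
import random
from collections import Counter


def sample_group(composition, rng):
    """Draw a demographic group d ~ p(d | g)."""
    groups = list(composition)
    weights = [composition[g] for g in groups]
    return rng.choices(groups, weights=weights, k=1)[0]


def toy_generator(group, rng):
    """Hypothetical stand-in for P_theta(. | d, z): returns visited POIs."""
    prefs = {
        "young": {"library": 0.7, "park": 0.3},
        "older": {"library": 0.2, "park": 0.8},
    }[group]
    pois = list(prefs)
    return rng.choices(pois, weights=[prefs[p] for p in pois], k=5)


def normalize(counts, support):
    """Turn visit counts into a distribution over a fixed POI ordering."""
    total = sum(counts.get(k, 0) for k in support)
    return [counts.get(k, 0) / total for k in support]


def jsd(p, q):
    """Jensen-Shannon divergence (natural log) between two distributions."""
    def kl(a, b):
        return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def region_loss(composition, observed_counts, generator, n_samples, rng):
    """Distance between synthetic and observed aggregates for one region."""
    synthetic = Counter()
    for _ in range(n_samples):
        group = sample_group(composition, rng)   # d ~ p(d | g)
        synthetic.update(generator(group, rng))  # aggregate POI visit counts
    support = sorted(set(observed_counts) | set(synthetic))
    return jsd(normalize(synthetic, support), normalize(observed_counts, support))


rng = random.Random(0)
observed = {"library": 550, "park": 450}  # invented regional aggregate
loss = region_loss({"young": 0.7, "older": 0.3}, observed, toy_generator, 2000, rng)
wrong = region_loss({"young": 0.1, "older": 0.9}, observed, toy_generator, 2000, rng)
print(f"JSD with the region's composition: {loss:.4f}")
print(f"JSD with a mismatched composition: {wrong:.4f}")
```

In this toy setup the observed aggregate was chosen to match a 70/30 mix, so the loss is close to zero for that composition and noticeably larger for the mismatched one. That contrast is the signal the fine-tuning step relies on.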

3. Theoretical Foundations

The paper provides a rigorous theoretical analysis of when and why ATLAS works, identifying two critical conditions:

Demographic Diversity Across Regions (Condition 1):
- The demographic composition matrix $P$ (rows = regions, columns = demographic groups) must have full column rank.
- Implication: If regions have identical demographic mixes, it is mathematically impossible to disentangle the specific mobility patterns of each group from the aggregates. Diverse regions allow for the unique recovery of group-level feature means.
- Stability: The bound on the error in recovering group distributions scales with $1/\sigma_{\min}(P)$, where $\sigma_{\min}(P)$ is the smallest singular value of $P$, so poor diversity (an ill-conditioned $P$) amplifies noise and degrades performance.
Feature Informativeness (Condition 2):
- The chosen aggregate feature map $\phi$ (e.g., POI counts) must be informative enough that group-level behavior is identifiable from its aggregates.
- Implication: The feature must capture enough behavioral detail to distinguish between demographic groups. If demographic differences are subtle and the feature is too coarse (e.g., only "city center" vs. "suburb"), the model cannot recover the specific distributions.
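Condition 1 can be illustrated numerically. The sketch below assumes a simplified linear setting in which each region's aggregate feature mean is a composition-weighted average of group-level means, $M = PA$, and recovers $A$ by least squares. All matrices are invented for illustration, and this is a toy version of the identifiability argument rather than the paper's proof or training procedure.

```python
import numpy as np


def recover_group_means(P, M):
    """Least-squares estimate of A in M ~ P @ A (rows of A = group means)."""
    A_hat, *_ = np.linalg.lstsq(P, M, rcond=None)
    return A_hat


# Hypothetical group-level feature means (rows: groups, columns: features).
A_true = np.array([[0.7, 0.3],
                   [0.2, 0.8]])

# Three regions, two groups. Values invented for illustration.
partitions = {
    "diverse": np.array([[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]]),
    "similar": np.array([[0.52, 0.48], [0.50, 0.50], [0.48, 0.52]]),
}

# Identical mixes in every region: P loses full column rank.
identical = np.array([[0.5, 0.5]] * 3)
identical_rank = np.linalg.matrix_rank(identical)
print("rank with identical mixes:", identical_rank)

rng = np.random.default_rng(0)
noise = 0.01 * rng.standard_normal((3, 2))  # same noise in both cases

results = {}
for name, P in partitions.items():
    sigma_min = np.linalg.svd(P, compute_uv=False).min()
    A_hat = recover_group_means(P, P @ A_true + noise)
    max_err = np.abs(A_hat - A_true).max()
    results[name] = (sigma_min, max_err)
    print(f"{name}: sigma_min={sigma_min:.3f}, max error={max_err:.4f}")
```

The two partitions receive identical noise, yet the "similar" partition (smallest singular value 0.04 versus 0.8) recovers the group means with a much larger error, and the identical-mix matrix has rank 1, so no unique recovery exists at all.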

4. Key Contributions

Novel Framework (ATLAS): According to the authors, the first method to learn demographic-conditioned trajectory generation using only aggregate supervision, bypassing the need for individual-level demographic labels.
Model Agnosticism: The approach is compatible with any generative architecture (Diffusion, LLMs, VAEs, GANs) that supports conditional sampling.
Theoretical Guarantees: Formal proofs establishing that demographic recovery is possible if regions are diverse and features are informative, along with finite-sample error bounds.
Empirical Validation: Comprehensive experiments on real-world data (Embee dataset) demonstrating that ATLAS significantly outperforms baselines and approaches the performance of strongly supervised models.

5. Experimental Results

The authors evaluated ATLAS on mobility data from Virginia and California, using 8 demographic groups (4 age bins $\times$ 2 genders).

Performance vs. Baselines:
- ATLAS reduced the Jensen-Shannon Divergence (JSD) between synthetic and real trajectories by 12% to 69% compared to a baseline model trained without demographic conditioning.
- It closed most of the gap between the baseline and the "Strong" (fully supervised) model, often recovering more than 80% of it.
Impact of Demographic Diversity (RQ1):
- Well-conditioned partitions (regions with distinct demographic mixes) yielded the best results, matching strong supervision closely.
- Ill-conditioned partitions (regions with similar mixes) caused performance to degrade, validating the theoretical requirement for demographic diversity.
Impact of Feature Choice (RQ2):
- POI-level features (specific locations visited) significantly outperformed coarse Category-level features (e.g., "Restaurant" vs. "Park"). Fine-grained features provided the necessary signal to distinguish demographic behaviors.
Downstream Utility (RQ3):
- Synthetic trajectories generated by ATLAS were used to train a Next-POI Prediction model.
- Models trained on ATLAS data significantly outperformed those trained on baseline data, achieving accuracy and geographic error rates close to models trained on real, labeled data. This suggests that the synthetic data captures meaningful, transferable demographic patterns.
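For readers who want to see how figures such as "12% to 69% reduction" and "80%+ of the gap" might be computed, here is one plausible reading. The summary does not state the exact formulas, so both definitions are assumptions, and the JSD values are invented to show the arithmetic. They are not results from the paper.

```python
def relative_reduction(baseline, method):
    """Fraction by which a lower-is-better metric drops versus the baseline."""
    return (baseline - method) / baseline


def gap_recovered(baseline, method, strong):
    """Share of the baseline-to-strong gap that the method closes."""
    if baseline == strong:
        raise ValueError("no gap between baseline and strong model")
    return (baseline - method) / (baseline - strong)


# Invented JSD values (lower is better), for illustration only.
baseline_jsd, atlas_jsd, strong_jsd = 0.50, 0.30, 0.25

reduction = relative_reduction(baseline_jsd, atlas_jsd)
recovered = gap_recovered(baseline_jsd, atlas_jsd, strong_jsd)
print(f"relative JSD reduction: {reduction:.0%}")
print(f"gap recovered:          {recovered:.0%}")
```

With these made-up inputs the reduction is 40% and the recovered gap is 80%. The two numbers answer different questions, which is why the results report both.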

6. Significance and Impact

Privacy-Preserving AI: ATLAS offers a solution to the "data silo" problem where privacy regulations prevent linking individual trajectories to demographics. It enables the creation of equitable, demographic-aware models using only public census data and anonymized aggregate mobility stats.
Equitable Policy Making: By accurately modeling how different demographic groups move, policymakers can better design transportation infrastructure, allocate healthcare resources, and understand social segregation without compromising individual privacy.
Generalizability: The theoretical framework suggests this approach could be extended to other domains where individual labels are missing but aggregate statistics and group compositions are available (e.g., consumer behavior, voting patterns).

In summary, ATLAS bridges the gap between privacy constraints and the need for demographic granularity in mobility modeling, showing that aggregate data can suffice to learn high-fidelity, demographic-conditioned generative models when regions are demographically diverse and the aggregate features are informative.