Learning Demographic-Conditioned Mobility Trajectories with Aggregate Supervision

The paper introduces ATLAS, a weakly supervised framework that generates demographic-conditioned human mobility trajectories by leveraging unlabeled individual data alongside region-level aggregate mobility and census demographics, thereby significantly improving demographic realism without requiring labeled trajectory datasets.

Jessie Z. Li, Zhiqing Hong, Toru Shirakawa, Serina Chang

Published 2026-03-04

Imagine you are trying to teach a robot how to walk through a city like a human. You want the robot to know that a college student might rush to a library and a coffee shop, while a retired grandparent might take a slow stroll to the park and the grocery store.

The problem? You have a huge pile of GPS data showing where people went, but you don't know who they are. The data is "anonymous." It's like having a thousand videos of people walking, but everyone is wearing a mask. You can't tell the student from the grandparent.

This is the problem that ATLAS, the framework introduced in this paper, sets out to solve.

The Big Idea: "Guessing the Recipe from the Cake"

Usually, to teach a robot to act like a specific group, you need labeled data: "Here is a video of a student walking. Here is a video of a grandparent walking." But largely because of privacy concerns, that kind of labeled data rarely exists.

ATLAS is a clever workaround. It uses a technique called "Weak Supervision."

Think of it like this:
You have a mystery cake (the anonymous GPS data). You don't know who baked it. But you do have two other clues:

  1. The Census: You know that in Neighborhood A, 80% of the people are students and 20% are retirees. In Neighborhood B, it's the opposite.
  2. The Aggregate Stats: You know that in Neighborhood A, the total number of visits to "Libraries" is very high, and visits to "Parks" are low.

ATLAS says: "If Neighborhood A has mostly students, and the total traffic shows lots of library visits, then the 'Student' recipe must involve a lot of library visits!"

It reverse-engineers the specific behavior of different groups by looking at how the "ingredients" (demographics) mix in different "bowls" (neighborhoods) to create the final "taste" (aggregate traffic).
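
To make the reverse-engineering concrete, here is a minimal Python sketch of the two-neighborhood example. It is not the ATLAS method itself, which trains a trajectory generator; it only illustrates the arithmetic behind the idea. The library-visit shares (0.50 and 0.20) and the function name solve_group_rates are made up for illustration.

```python
# Census mix per neighborhood: (share of students, share of retirees).
# A and B follow the 80/20 example in the text.
census_mix = {"A": (0.8, 0.2), "B": (0.2, 0.8)}

# Hypothetical aggregates: fraction of all visits that go to libraries.
library_share = {"A": 0.50, "B": 0.20}


def solve_group_rates(mix, aggregate):
    """Solve  mix[n] . rates = aggregate[n]  for two groups by Cramer's rule."""
    (a, b), (c, d) = mix["A"], mix["B"]
    y1, y2 = aggregate["A"], aggregate["B"]
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("Identical neighborhood mixes: rates cannot be separated.")
    student = (y1 * d - b * y2) / det
    retiree = (a * y2 - c * y1) / det
    return student, retiree


student_rate, retiree_rate = solve_group_rates(census_mix, library_share)
print(f"students: {student_rate:.2f}, retirees: {retiree_rate:.2f}")
# prints "students: 0.60, retirees: 0.10"
```

Under these assumed numbers, the only "student recipe" and "retiree recipe" that explain both neighborhoods at once are a 60% and a 10% library share. The real problem has many more groups, places, and neighborhoods, so it cannot be solved by hand like this.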

How ATLAS Works (The Two-Step Dance)

The method works in two phases, like training an actor:

Phase 1: The Generalist Actor
First, the AI learns to generate any human movement using the anonymous GPS data. It learns the basics: people go from home to work, they stop for coffee, they don't fly over buildings. It becomes a good "general human" simulator, but it doesn't know the difference between a student and a retiree yet.

Phase 2: The Specialized Director
Now, the AI gets the "Census Clues." The researchers tell the AI: "Okay, in this specific neighborhood, 70% of the people are young. So, when you simulate people for this neighborhood, make sure 70% of your generated paths look like a young person's."

The AI then checks its work: "Did I generate enough library visits to match the real-world data for this neighborhood?" If not, it tweaks its internal rules for "Young People" until the math adds up. It does this for every neighborhood, gradually working out how a "Young Person" moves, how a "Middle-Aged Person" moves, and so on, without ever seeing a single labeled trajectory.
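
The check-and-tweak loop can be sketched as a toy in Python. This is only an illustration under strong simplifying assumptions: each group's "internal rules" are reduced to a single number (its library-visit rate), whereas ATLAS adjusts a full trajectory generator. The neighborhood mixes, the observed shares, and the names predict and fit are all hypothetical.

```python
import math

# Hypothetical census age mix per neighborhood: (young, middle-aged, older).
census_mix = [
    (0.7, 0.2, 0.1),
    (0.2, 0.6, 0.2),
    (0.1, 0.2, 0.7),
    (0.4, 0.4, 0.2),
]
# Hypothetical observed share of visits going to libraries, per neighborhood.
# Built from assumed true group rates of 0.6, 0.3 and 0.1.
observed = [0.49, 0.32, 0.19, 0.38]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict(logits, mix):
    """Aggregate library share: group rates mixed by the census proportions."""
    return sum(share * sigmoid(z) for share, z in zip(mix, logits))


def fit(steps=5000, lr=2.0):
    """Gradient descent on the squared gap between predicted and observed aggregates."""
    logits = [0.0, 0.0, 0.0]  # every group starts at the same 50% rate
    for _ in range(steps):
        grads = [0.0, 0.0, 0.0]
        for mix, y in zip(census_mix, observed):
            err = predict(logits, mix) - y  # "did the math add up here?"
            for g, (share, z) in enumerate(zip(mix, logits)):
                s = sigmoid(z)
                grads[g] += 2.0 * err * share * s * (1.0 - s)
        logits = [z - lr * dz for z, dz in zip(logits, grads)]
    return [sigmoid(z) for z in logits]


rates = fit()
print([round(r, 3) for r in rates])
```

The fitted rates should land close to the 0.6, 0.3 and 0.1 used to build the toy data, even though the loop only ever sees neighborhood totals and census mixes, never a labeled individual.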

The Secret Sauce: Diversity and Detail

The paper identifies two conditions that make this work:

  1. The "Mix" Matters: You need neighborhoods that are different from each other. If every neighborhood in the city had exactly the same mix of people (50% students, 50% retirees), the AI would get confused. It's like trying to figure out the taste of salt if you only ever eat a dish that is 50% salt and 50% sugar. You need some dishes that are mostly salt and some that are mostly sugar to figure out what salt tastes like.
  2. The "Details" Matter: The AI needs detailed data. If you only tell it "People went to a 'Store'," it's hard to guess who went. But if you tell it "People went to Target vs. Whole Foods," the AI can easily guess: "Oh, Target is probably the students, and Whole Foods is the retirees." The more specific the data, the better the AI learns.
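
The first point, that the mix has to differ across neighborhoods, can be checked with a small Python sketch. It uses a two-group, two-neighborhood toy rather than anything from the paper. The rates, the 0.01 measurement error, and the function names are illustrative assumptions.

```python
def recover_rates(mix_a, mix_b, agg_a, agg_b):
    """Recover two per-group rates from two neighborhoods, or None if impossible."""
    det = mix_a[0] * mix_b[1] - mix_a[1] * mix_b[0]
    if abs(det) < 1e-12:
        return None  # identical mixes: infinitely many rates explain the data
    g0 = (agg_a * mix_b[1] - mix_a[1] * agg_b) / det
    g1 = (mix_a[0] * agg_b - mix_b[0] * agg_a) / det
    return g0, g1


def noise_sensitivity(mix_a, mix_b, true_rates=(0.6, 0.1), noise=0.01):
    """Shift in the recovered first-group rate when one aggregate is off by `noise`.

    Only call this with mixes that differ, otherwise recover_rates returns None.
    """
    agg_a = mix_a[0] * true_rates[0] + mix_a[1] * true_rates[1]
    agg_b = mix_b[0] * true_rates[0] + mix_b[1] * true_rates[1]
    recovered = recover_rates(mix_a, mix_b, agg_a + noise, agg_b)
    return abs(recovered[0] - true_rates[0])


# Every neighborhood is 50/50: the group rates cannot be separated at all.
same = recover_rates((0.5, 0.5), (0.5, 0.5), 0.35, 0.35)

# Very different mixes versus nearly identical mixes, same small error.
diverse = noise_sensitivity((0.8, 0.2), (0.2, 0.8))
similar = noise_sensitivity((0.52, 0.48), (0.48, 0.52))

print(same, round(diverse, 3), round(similar, 3))
```

With identical mixes there is no answer at all. With nearly identical mixes an answer exists, but a small error in the aggregate data causes a much larger error in the recovered group behavior than it does when the neighborhoods are clearly different.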

Why Should You Care?

Why do we need a robot that knows the difference between a student and a retiree?

  • Public Health: If a virus is spreading, knowing that "students" hang out in crowded bars while "retirees" stay in parks helps doctors target their warnings and resources correctly.
  • City Planning: If you want to build a new bus line, you need to know if it should serve the morning rush of workers or the afternoon trips of seniors.
  • Privacy: This is the best part. We can get these insights without invading anyone's privacy. We don't need to know your name or your age. We just need to know the general mix of the neighborhood and the total traffic.

The Result

The researchers tested ATLAS on real data from Virginia and California. They found that:

  • Without this method, the AI was terrible at guessing specific group behaviors (it was just a "blur" of everyone).
  • With ATLAS, the AI got much better, improving demographic realism by 12% to 69%.
  • It got so good that it almost matched a "perfect" AI that was trained on data where everyone's identity was known (which is usually impossible to get).

In short: ATLAS is like a detective who can figure out the habits of different groups of people just by looking at neighborhood demographics and total traffic, without ever needing to know who any individual is.
