Operationalizing Longitudinal Causal Discovery Under Real-World Workflow Constraints

This paper proposes a framework for longitudinal causal discovery that enhances structural interpretability and reduces ambiguity in large-scale real-world data by formally encoding institutional workflow constraints into the causal graph search space, demonstrated through a successful application to a nationwide Japanese health screening cohort.

Tadahisa Okuda, Shohei Shimizu, Thong Pham, Tatsuyoshi Ikenoue, Shingo Fukuma

Published 2026-03-02

Imagine you are trying to figure out the recipe for a perfect cake, but you don't have the recipe book. Instead, you only have a logbook of every time someone in a bakery tried to bake a cake over the last four years. You want to know: Did adding extra sugar actually make the cake fluffier, or did the baker just happen to use a better oven that day?

This is the challenge of Causal Discovery. For years, scientists have built powerful computers to solve this puzzle. But when they tried to use these computers on real-world data (like hospital records or national health surveys), the results were often messy, confusing, or wrong.

Why? Because real life isn't a clean, abstract math problem. It's a messy workflow.

The Problem: The "Calendar" vs. The "Kitchen"

In a perfect math world, time is just a number: 1, 2, 3. But in the real world, time is a schedule.

Imagine a bakery where:

  1. You weigh the ingredients (Time A).
  2. You mix the batter (Time B).
  3. You put it in the oven (Time C).

If you just look at the data without knowing the rules, the computer might get confused. It might think the oven heat caused the mixing, or that the mixing happened after the cake was baked, simply because the data was recorded in a weird order.

In the real world, hospitals and governments have strict workflows. A doctor measures your blood pressure, then decides if you need a lifestyle guide, then measures your blood pressure again next year. If a computer doesn't know this "script," it tries to guess every possible order of events, leading to a huge mess of wrong answers.

The Solution: The "Workflow GPS"

The authors of this paper realized that instead of building a smarter computer, they needed to give the existing computers a GPS map of the real-world workflow.

They didn't invent a new algorithm; they just taught the old one the rules of the game. Think of it like this:

  • Old Way: "Here is a pile of data. Guess the cause-and-effect. (Good luck!)"
  • New Way: "Here is the data. But remember: The doctor always checks the blood pressure before giving the advice. The advice never happens before the check-up. Also, the lifestyle habits are recorded as a summary of the whole year, so we can't tell which came first within that year. Now, guess the cause-and-effect."

By adding these "Workflow Rules," the computer stops guessing impossible scenarios and focuses only on the ones that make sense in the real world.
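The workflow rules above can be sketched as edge constraints handed to a causal discovery algorithm before it starts searching. This is a minimal illustration, not the paper's implementation; the variable names and stage assignments are assumptions for the example.

```python
# Hedged sketch: encode a clinical workflow as edge constraints before
# running any causal discovery algorithm. Variable names are illustrative.
import numpy as np

# Each variable gets a "workflow stage": lower stages happen first.
# (BP = blood pressure; stages follow check-up -> guidance -> next check-up.)
stages = {
    "BP_year1": 0,        # measured first
    "guidance_year1": 1,  # decided after the measurement
    "BP_year2": 2,        # next year's measurement
}
names = list(stages)
n = len(names)

# forbidden[i, j] == True means an edge names[i] -> names[j] is ruled out.
# A later workflow stage can never cause an earlier one.
forbidden = np.zeros((n, n), dtype=bool)
for i, a in enumerate(names):
    for j, b in enumerate(names):
        if stages[a] > stages[b] or i == j:
            forbidden[i, j] = True

# Any search algorithm now only considers the allowed edges:
allowed = [(a, b) for i, a in enumerate(names)
           for j, b in enumerate(names) if not forbidden[i, j]]
print(allowed)
```

With three variables there are six possible directed edges; the workflow constraints cut that to three, which is exactly how the search space shrinks from "every possible order of events" to only the plausible ones.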

The Experiment: Japan's National Health Checkups

To test this, the authors used a massive dataset from Japan: about 107,000 people followed through annual health checkups over four years.

They wanted to see if the government's "Health Guidance" program (telling people to exercise and eat better) actually improved their health.

What they found:

  1. It works, but fades: The guidance helped people lose weight (lower their BMI) and lower their blood pressure, but the biggest effect appeared immediately, in the first year. The effects got smaller and fuzzier in later years.
  2. Uncertainty is key: Instead of just giving a single number (e.g., "It lowers blood pressure by 5 points"), they used a technique called Bootstrap (imagine running the experiment 1,000 times with slightly different groups of people) to say, "We are 95% sure the effect is between 3 and 7 points." This is crucial for doctors and policymakers who need to know how reliable the advice is.
  3. The "Motif" (The Pattern): Even though the data was huge and complex, a simple, repeating pattern emerged. The guidance consistently influenced weight and blood pressure in a specific way, regardless of the year.
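The bootstrap idea in point 2 can be sketched in a few lines: resample the cohort with replacement, re-estimate the effect each time, and read the 95% interval off the resulting distribution. The numbers below are synthetic, made up for illustration; they are not the paper's data or results.

```python
# Hedged sketch of the bootstrap described above, on synthetic data.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic cohort: 'guided' people see an average -5 point BP change.
n = 2000
guided = rng.integers(0, 2, size=n)
bp_change = -5.0 * guided + rng.normal(0.0, 10.0, size=n)

def effect_estimate(g, y):
    # Difference in mean BP change: guided minus non-guided.
    return y[g == 1].mean() - y[g == 0].mean()

boot = []
for _ in range(1000):
    idx = rng.integers(0, n, size=n)  # resample people with replacement
    boot.append(effect_estimate(guided[idx], bp_change[idx]))

low, high = np.percentile(boot, [2.5, 97.5])
print(f"estimated effect: {effect_estimate(guided, bp_change):.1f}")
print(f"95% interval: [{low:.1f}, {high:.1f}]")
```

The width of that interval is the "how reliable is this?" answer that a single point estimate cannot give.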

The "Simulator" (The Crystal Ball)

The coolest part of this paper is what they built with the results. They turned the complex math into a simple "What-If" Simulator for doctors.

  • Forward Prediction: A doctor can ask, "If I tell this patient to stop smoking now, what will their blood pressure look like in two years?" The simulator gives an answer with a confidence range.
  • Goal Seeking: A doctor can ask, "I want this patient's blood pressure to be 120 in two years. What do they need to change today to make that happen?"
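Both directions follow from one fitted model. The toy version below assumes a simple linear relationship, bp_future = a * bp_now + b * smoking + c, with made-up coefficients; the paper's simulator is built from the full discovered graph, but the forward/inverse idea is the same.

```python
# Hedged sketch of the two "what-if" queries on a toy linear model.
# Coefficients are invented for illustration, not taken from the paper.
a, b, c = 0.8, 6.0, 20.0  # assumed: persistence, smoking penalty, baseline

def predict_bp(bp_now, smoking):
    """Forward prediction: future blood pressure under a chosen habit."""
    return a * bp_now + b * smoking + c

def required_bp_now(target_bp, smoking):
    """Goal seeking: invert the model to find today's BP needed for a target."""
    return (target_bp - b * smoking - c) / a

# Forward: a smoker at BP 140 today.
print(predict_bp(140, smoking=1))        # -> 138.0

# Goal seeking: to reach 120 in the future after quitting smoking.
print(required_bp_now(120, smoking=0))   # -> 125.0
```

In the real system, the confidence ranges come from running these queries over the bootstrapped set of models rather than a single one.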

Why This Matters

This paper is a bridge. For a long time, "Causal AI" was like a Ferrari that could only drive on a perfect race track (clean, theoretical data). This paper puts tires on the Ferrari so it can drive on the bumpy, muddy roads of real life.

It teaches us that to understand cause and effect in the real world, you don't just need better math; you need to understand the story of how the data was collected. By respecting the workflow, we can finally trust the computer's advice to make real-world decisions that improve our health.
