Time-to-Event Modeling with Pseudo-Observations in Federated Settings

This paper proposes a one-shot, privacy-preserving federated framework for time-to-event analysis. Using pseudo-observations and a covariate-wise debiasing procedure, it models both proportional and non-proportional hazards accurately, without iterative communication or pooling of individual-level data.

Hyojung Jang, Malcolm Risk, Yaojie Wang, Norrina Bai Allen, Xu Shi, Lili Zhao

Published Wed, 11 Ma

Imagine a group of doctors from different hospitals across a city who want to answer a big question: "What factors make children more likely to become obese, and how does that risk change as they grow up?"

To get a clear answer, they need to look at data from thousands of kids. But there's a problem: Privacy laws (like HIPAA) say they cannot send the private medical records of individual children to a central computer. They can't mix their data together like pouring different cups of water into one big bucket.

This paper introduces a clever new way to solve this puzzle without ever sharing the private "cups of water."

The Old Way vs. The New Way

The Old Way (The "Shared Blueprint" Problem):
Previously, if hospitals wanted to work together, they often had to share a list of exactly when specific events happened (e.g., "Patient A got sick on Tuesday, Patient B on Friday"). Even without names attached, those exact dates could sometimes reveal sensitive details about individual patients. Also, older methods often assumed that each risk factor's effect stays the same forever (the "proportional hazards" assumption, like saying "being overweight is always twice as dangerous"), which isn't always true in real life.

The New Way (The "Ghostly Summaries" Approach):
The authors created a method called Federated Survival Analysis with Site-Level Heterogeneity Adjustment. That's a mouthful, so let's break it down with an analogy.

1. The "Ghostly Summaries" (Pseudo-Observations)

Instead of sending patient records, each hospital calculates a "summary score" for every single patient based on a shared, anonymous map of the overall situation.

  • The Analogy: Imagine every hospital has a local map of their neighborhood. They all agree to use a giant, city-wide map (the Federated Kaplan-Meier estimator) that shows the general traffic patterns.
  • Using this city map, each hospital calculates a "ghostly score" for their local patients. This score tells them, "Based on the city's traffic, this specific patient is likely to encounter a traffic jam at 2 PM."
  • They send these scores (not the patient's name or exact time) to the central team. The central team can now see the patterns without ever seeing the actual cars or drivers.
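In statistical terms, the "ghostly score" for patient i is a jackknife pseudo-observation: n times the Kaplan-Meier survival estimate at time t, minus (n − 1) times the same estimate recomputed with patient i left out. Here is a minimal sketch of that idea, not the paper's implementation (function names are our own, and ties are handled naively):

```python
import numpy as np

def km_survival(times, events, t):
    """Kaplan-Meier estimate of S(t) from right-censored data.
    events[i] = 1 if the event was observed, 0 if censored.
    Note: ties between event and censoring times are handled naively."""
    order = np.argsort(times)
    times, events = times[order], events[order]
    at_risk, s = len(times), 1.0
    for time, event in zip(times, events):
        if time > t:
            break
        if event == 1:
            s *= 1.0 - 1.0 / at_risk   # one event among `at_risk` subjects
        at_risk -= 1
    return s

def pseudo_observations(times, events, t):
    """Jackknife pseudo-observation for each subject at time t:
    n * S_hat(t) - (n - 1) * S_hat_without_i(t)."""
    n = len(times)
    s_full = km_survival(times, events, t)
    keep = np.ones(n, dtype=bool)
    pseudo = np.empty(n)
    for i in range(n):
        keep[i] = False   # leave subject i out, re-estimate, restore
        pseudo[i] = n * s_full - (n - 1) * km_survival(times[keep], events[keep], t)
        keep[i] = True
    return pseudo
```

A quick sanity check: with no censoring at all, each pseudo-observation collapses to the simple indicator "did this patient survive past t?" (0 or 1); censoring is where the Kaplan-Meier machinery earns its keep.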

2. The "One-Shot" Conversation

Usually, computers in these networks have to talk back and forth many times to get the answer right (like a game of "Hot and Cold"). This new method is a "One-Shot" approach.

  • The Analogy: Instead of a long phone call, every hospital sends their "ghostly scores" and a few summary numbers in one single email. The central computer puts them together instantly to get the final answer. It's fast, efficient, and keeps the data secure.
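The "one email" can be surprisingly small. For a linear model fit to pseudo-observations, each site only needs to transmit a p×p matrix and a length-p vector of summary statistics, and the center solves a single system of equations. A toy sketch of that aggregation idea (illustrative only; the site data below are simulated, and this is ordinary least squares, not the paper's exact estimator):

```python
import numpy as np

rng = np.random.default_rng(0)

def site_summary(X, y):
    """What a site actually transmits: a p x p matrix and a length-p
    vector of sums -- no individual rows ever leave the site."""
    return X.T @ X, X.T @ y

# Simulated covariates/outcomes for four hypothetical sites.
beta_true = np.array([1.0, -2.0])
site_data = []
for _ in range(4):
    X = rng.normal(size=(50, 2))
    y = X @ beta_true + rng.normal(scale=0.1, size=50)
    site_data.append((X, y))

# One-shot aggregation at the center: add the summaries, solve once.
summaries = [site_summary(X, y) for X, y in site_data]
A = sum(a for a, _ in summaries)
b = sum(v for _, v in summaries)
beta_fed = np.linalg.solve(A, b)
```

Because the sums of these summaries equal the summaries of the pooled data, the federated answer matches the (forbidden) pooled analysis exactly for this class of model; no back-and-forth rounds are needed.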

3. The "Flexible Lens" (No Rigid Rules)

Old methods forced everyone to agree that risks never change over time. This new method is flexible.

  • The Analogy: Imagine looking at a tree through a rigid, square window. You only see a square slice of the tree. This new method uses a flexible, zoomable lens. It can see that a risk factor (like age) might be very dangerous when a child is 5, but less dangerous when they are 10. It captures the story of how risk changes over time, not just a single static number.
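Concretely, the flexibility comes from evaluating pseudo-observations on a grid of time points and letting each covariate's coefficient differ at every grid time. A simulated sketch of that idea (the data-generating model is our own, and we use the uncensored case, where the pseudo-observation is just the indicator 1{T > t}):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated, uncensored example: the covariate x shortens the first
# phase of the waiting time but not the second, so its effect on the
# survival probability differs across time points (non-proportional).
n = 5000
x = rng.normal(size=n)
T = rng.exponential(scale=np.exp(-1.0 * x)) + rng.exponential(scale=2.0, size=n)

# With no censoring, the pseudo-observation at t is just 1{T > t}, so a
# separate linear fit at each grid time yields a time-varying effect.
grid = [0.5, 1.0, 2.0, 4.0]
X = np.column_stack([np.ones(n), x])
betas = []
for t in grid:
    y = (T > t).astype(float)
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    betas.append(coef[1])   # effect of x on surviving past t
```

A rigid "square window" model would force all four entries of `betas` to tell the same story; here each time point gets its own estimate, tracing out how the risk factor's influence changes as children grow.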

4. The "Smart Noise Filter" (Heterogeneity Adjustment)

Sometimes, one hospital might have a weird result just because of bad luck or a small sample size (noise), while another hospital might have a real unique difference because their patients are different (signal).

  • The Analogy: Imagine a choir. Most singers are singing the same note (the global truth). One singer is slightly off-key because they are nervous (noise). Another singer is intentionally singing a different harmony because it's a jazz song (real local difference).
  • The authors built a "Smart Noise Filter." It listens to the choir. If a singer is just slightly off-key due to nervousness, the filter gently nudges them back to the main note. But if a singer is intentionally singing a jazz harmony, the filter says, "Ah, that's a real difference! Let's keep it."
  • This ensures the final result isn't ruined by random errors, but it also doesn't ignore genuine local differences.
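A generic way to build such a filter is empirical-Bayes-style shrinkage: each site's estimate is pulled toward the precision-weighted global mean, and the pull is strong when the site's standard error is large (likely noise) but weak when the site is precise (likely a real local difference). This is a simplified stand-in for the paper's heterogeneity adjustment, with our own function names and made-up numbers:

```python
import numpy as np

def shrink_to_global(est, se, tau2):
    """Pull site estimates toward the precision-weighted global mean.
    est: per-site estimates; se: their standard errors;
    tau2: assumed between-site variance (a 'real difference' budget)."""
    w = 1.0 / (se**2 + tau2)
    global_mean = np.sum(w * est) / np.sum(w)
    lam = se**2 / (se**2 + tau2)     # how much to trust the global mean
    return lam * global_mean + (1.0 - lam) * est

est = np.array([1.00, 1.05, 0.95, 2.00])   # site 4 is far from the rest

# If site 4 is very precise, its difference is treated as real and kept.
precise = shrink_to_global(est, se=np.array([0.05, 0.05, 0.05, 0.02]), tau2=0.1)

# If site 4 is very noisy, it is pulled most of the way back to the choir.
noisy = shrink_to_global(est, se=np.array([0.05, 0.05, 0.05, 0.50]), tau2=0.1)
```

The same outlying estimate is preserved or smoothed depending only on how trustworthy it is, which is exactly the jazz-harmony-versus-nervous-singer distinction above.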

The Real-World Test

The team tested this on data from 45,000 children across four hospitals in Chicago (the CAPriCORN network).

  • The Result: Their new method produced answers almost identical to what you would get if all 45,000 records were magically combined in one place (which privacy laws forbid).
  • The Discovery: They found that while being overweight is a big risk factor, its impact changes over time. Also, the "Smart Noise Filter" successfully identified that one hospital had a unique pattern for a specific health condition, while smoothing out random errors in the others.

Why This Matters

This paper gives researchers a privacy-preserving superpower. It allows hospitals to collaborate on life-saving research without breaking privacy laws. It's like allowing a group of people to solve a giant jigsaw puzzle together without ever showing each other their individual pieces—instead, they just share the shapes of the edges, and the picture appears perfectly clear.