Here is an explanation of the paper, translated into everyday language with some creative analogies.
The Big Picture: The "Ghost in the Machine" Problem
Imagine you are a city planner trying to see if a new, faster subway line actually reduces traffic jams. You open the subway in 2010. But, you know that many people who are stuck in traffic right now got into their cars back in 2005. They aren't going to take the subway; they are already on the road.
If you just count all the traffic jams in 2015, your data will be "diluted." It will look like the subway didn't help much, because you are mixing up:
- The "New" Commuters: People who started driving in 2012 and could have taken the subway.
- The "Old" Commuters: People who started driving in 2005 and are stuck in traffic no matter what.
In the world of cancer screening (like mammograms), this is the exact problem. When a screening program starts, it can only save the lives of women who haven't been diagnosed yet. It cannot save women who were diagnosed years ago and are already fighting the disease.
The Paper's Goal: The authors wanted a better way to measure if screening actually saves lives without getting confused by the "Old Commuters" (people diagnosed before the program started).
The Old Way: The "Selective Snapshot" (Earlier Approaches)
Previously, researchers tried to solve this by taking a "snapshot" of a specific group. They would say, "Let's only look at women in County A who were invited to screening, and compare them to women in County B who were never invited."
The Flaw: This is like trying to judge a restaurant's quality by only eating the appetizers and ignoring the main courses. By ignoring huge chunks of data (like women in County A who weren't invited yet, or women in County B who were invited later), they threw away valuable information. This made their results "fuzzy" (wide confidence intervals), meaning they couldn't be very sure if the screening was working.
The New Way: "Risk Time Splitting" (The Authors' Solution)
The authors (Weedon-Fekjær, Lynge, and Keiding) developed a clever mathematical trick called Risk Time Splitting.
Think of it like a Time-Traveling Detective.
Step 1: The "Time Machine" (Historical Data)
First, the researchers look at history. They ask: "In the days before screening existed, how long did it usually take for a woman diagnosed with breast cancer to pass away?"
- Analogy: Imagine a clock. If a woman is diagnosed, the clock starts ticking. Historically, the clock might run for 5, 10, or 15 years before the end.
Step 2: The "Split"
Now, they look at the current screening program. They take every death that happens after the screening started and ask: "Was this woman diagnosed before or after the screening program began?"
- The "Pre-Screening" Diagnoses: These are the "Old Commuters." Even though they died after the screening started, they were diagnosed before it existed. Screening couldn't have saved them.
- The "Post-Screening" Diagnoses: These are the "New Commuters." They were diagnosed after the screening started. Screening could have saved them.
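The "split" in Step 2 boils down to one comparison per death: was the diagnosis before or after the programme started? Here is a minimal Python sketch of that bookkeeping; the start date and the example dates are hypothetical, not taken from the paper.

```python
from datetime import date

SCREENING_START = date(1996, 1, 1)  # hypothetical programme start date

def classify_death(diagnosis_date: date) -> str:
    """Label a death by whether the diagnosis predates the programme.

    Deaths from pre-screening diagnoses ("old commuters") could not
    have been prevented by the programme; only deaths from
    post-screening diagnoses ("new commuters") can tell us whether
    screening works.
    """
    if diagnosis_date < SCREENING_START:
        return "pre-screening"
    return "post-screening"

# Hypothetical diagnosis dates for three women who died after 1996:
deaths = [date(1993, 5, 2), date(1998, 11, 17), date(2001, 3, 9)]
labels = [classify_death(d) for d in deaths]
# labels == ["pre-screening", "post-screening", "post-screening"]
```

In the actual analysis the same idea is applied to person-time rather than just to individual deaths, but the classification rule is this simple comparison of dates.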
Step 3: The "Mathematical Filter" (The Offset)
Here is the magic part. Instead of throwing away the "Old Commuters," the researchers use their "Time Machine" data to calculate exactly how many of the deaths should have been "Old Commuters" even if screening had no effect.
They use a Mathematical Filter (called an "offset" in statistics) to adjust the numbers.
- Analogy: Imagine you are weighing a basket of fruit. You know that 20% of the weight comes from the basket itself, not the fruit. Instead of throwing away the basket, you just subtract the basket's weight from the total. Now you have the true weight of the fruit.
In this paper, they account for the "expected" number of deaths among women diagnosed before screening began (strictly, the adjustment works by comparison rather than literal subtraction: observed deaths are measured against the deaths history would predict). This leaves them with a clean estimate of the deaths screening could actually have influenced.
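The core of the "filter" can be sketched as a rate ratio: observed deaths divided by the deaths expected from historical (pre-screening) rates. The snippet below is a back-of-envelope pure-Python version, not the paper's full regression; the counts are hypothetical, and the confidence interval uses the standard Poisson approximation that the variance of log(observed) is about 1/observed.

```python
import math

def rate_ratio(observed: int, expected: float) -> tuple[float, float, float]:
    """Estimate the mortality rate ratio O/E with an approximate 95% CI.

    `expected` plays the role of the statistical offset: the number of
    deaths historical rates would predict if screening had no effect.
    A ratio below 1 suggests a screening benefit.
    """
    rr = observed / expected
    se = 1.0 / math.sqrt(observed)  # approx. SE of log(O) for a Poisson count
    lower = rr * math.exp(-1.96 * se)
    upper = rr * math.exp(1.96 * se)
    return rr, lower, upper

# Hypothetical numbers: 80 deaths observed where history predicts 100.
rr, lower, upper = rate_ratio(80, 100.0)
# rr == 0.8, i.e. roughly a 20% mortality reduction in this toy example
```

The "offset" in a regression model does exactly this job, but on the log scale and across many areas and time periods at once.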
The Three Methods Explained
The paper compares three ways to do this math:
- Method I (The Simple Estimate): A rough guess. It's like looking at the basket and eyeballing the fruit weight. It works, but it's a bit wobbly.
- Method II (The Regression with Offsets): This is the Recommended Method. It uses the "Time Machine" data to build that mathematical filter (the offset) and then fits a standard regression model, the kind any statistical package can handle, to get a precise answer. Because it uses all the available data, the result is much sharper and more reliable.
- Method III (Maximum Likelihood): The "Super-Computer" version. It tries to solve the problem using the most complex, perfect mathematical theory possible.
- The Catch: It's so complex that it's hard to program and computationally fragile. The authors found that Method II gives almost exactly the same result as Method III while being far easier to use.
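To see why an offset regression can be simple, consider the special case of one common screening effect across several counties. If each county's observed deaths follow a Poisson distribution with mean theta times its expected deaths, the maximum-likelihood estimate of theta is just total observed over total expected. This is a toy illustration with made-up counts, not the paper's analysis, which also models effects by age, area, and time.

```python
def pooled_rate_ratio(counts: list[int], expected: list[float]) -> float:
    """MLE of a common rate ratio theta where O_i ~ Poisson(theta * E_i).

    With log(E_i) entering the model as an offset and a single
    intercept, the regression estimate reduces to sum(O) / sum(E),
    pooling every area and period instead of discarding any of them.
    """
    return sum(counts) / sum(expected)

# Hypothetical counties: observed deaths vs. deaths expected from
# historical (pre-screening) rates.
obs = [30, 25, 25]
exp_deaths = [40.0, 30.0, 30.0]
theta = pooled_rate_ratio(obs, exp_deaths)  # 80 / 100 == 0.8
```

Adding covariates (age, county, calendar period) turns this one-line estimate into a full offset regression, which is why Method II stays easy to run in standard software.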
Why Does This Matter? (The Results)
When the authors tested this new method on real data from Norway and Denmark:
- Old Methods: The results were "fuzzy." The confidence intervals (the margin of error) were wide. It was hard to say, "Yes, screening definitely works," because the data was too noisy.
- New Method: The results were "crystal clear." The margin of error shrank by 46% to 63%.
The Analogy:
Imagine you are trying to hear a whisper in a noisy room.
- The Old Method was like putting a hand over one ear and guessing. You could hear something, but you weren't sure.
- The New Method is like putting on high-tech noise-canceling headphones that filter out the specific background noise of "pre-screening deaths." Suddenly, the whisper (the true benefit of screening) is loud and clear.
The Takeaway
This paper isn't just about math; it's about fairness and accuracy.
By using "Risk Time Splitting," we stop throwing away data and stop getting confused by people who were diagnosed too early to benefit from screening. This allows doctors and policymakers to see the true value of screening programs.
In short: We found a way to separate the "old cases" from the "new cases" so we can measure, with far less fog, whether screening is saving lives. The authors recommend Method II (the Regression with Offsets) because it strikes the best balance between high accuracy and being easy for other scientists to use.