This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.
Imagine you are trying to predict how many people will get sick and end up in the hospital next month. Usually, doctors and scientists look at the number of people currently walking into the emergency room to make these guesses. It's like trying to predict a storm by looking at the rain already hitting the ground.
But what if you could look at the clouds before the rain starts? That's the idea behind wastewater surveillance.
This paper is about a team of scientists who built a "weather forecast" for COVID-19 hospital visits. They wanted to see if looking at the sewage system (where the virus washes down from homes) could help them predict hospital visits better than just looking at the hospital numbers alone.
Here is the story of their experiment, explained simply:
1. The Two "Weather Stations"
The scientists built a computer model that acts like a super-smart detective. They gave it two types of clues:
- Clue A (The Hospital Data): The number of people actually getting admitted to the hospital. This is reliable, but it's slow. It's like seeing the rain after it has already soaked your shoes.
- Clue B (The Sewage Data): The amount of virus particles found in the wastewater of a city. This is a "leading indicator." People shed the virus in their poop before they even feel sick or go to the doctor. This is like seeing the dark clouds gathering before the first drop falls.
The team wanted to know: Does adding the "cloud" data (sewage) make the "rain" prediction (hospital visits) more accurate?
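The "leading indicator" idea above can be shown with a tiny sketch. This is not the paper's actual model; it uses made-up data where the sewage signal is assumed to run 10 days ahead of hospital admissions, and then checks whether a simple alignment test can recover that lead time.

```python
import numpy as np

# Synthetic example only: one smooth epidemic wave, with the wastewater
# signal assumed to lead hospital admissions by 10 days.
rng = np.random.default_rng(0)
days = np.arange(200)
epidemic_curve = np.exp(-((days - 100) / 25.0) ** 2)

lead = 10  # assumed lead time, in days (an illustrative choice)
wastewater = np.roll(epidemic_curve, -lead) + 0.02 * rng.normal(size=200)
hospital = epidemic_curve + 0.02 * rng.normal(size=200)

def best_lag(leading, lagging, max_lag=30):
    """Find the shift (in days) that best lines the leading series up
    with the lagging one, by maximizing Pearson correlation."""
    scores = []
    for lag in range(max_lag + 1):
        a = leading[: len(leading) - lag] if lag else leading
        b = lagging[lag:]
        scores.append(np.corrcoef(a, b)[0, 1])
    return int(np.argmax(scores))

print(best_lag(wastewater, hospital))  # recovers roughly the 10-day lead
```

In other words, if the clouds really do arrive before the rain, a forecaster can measure by how many days, and shift the sewage signal forward by that amount.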
2. The Experiment: A Real-Time Test
From February to April 2024, the team ran their model in "live mode." Every week, they sent their predictions to the U.S. COVID-19 Forecast Hub, a giant competition where dozens of different teams try to predict the future of the virus.
They submitted two versions of their prediction:
- The "Sewage-Savvy" Version: Used both hospital data and sewage data.
- The "Hospital-Only" Version: Used only hospital data.
3. The Results: A Surprising Tie
You might expect that having more clues (sewage + hospitals) would always make the prediction better. But the results were a bit like a coin toss.
- Overall, it was a tie. When they looked at the average performance across the whole country, the model with sewage data performed almost exactly the same as the model without it. In fact, the "Hospital-Only" model was slightly better in the real-time competition, ranking 2nd out of 10 teams, while the "Sewage-Savvy" model ranked 4th.
- But, it wasn't a tie everywhere. This is where it gets interesting.
- The "Superhero" Moments: In some places (like California in their examples), the sewage data was a superhero. It saw the virus dropping off before the hospital numbers did, allowing the model to correctly predict a calm period.
- The "Confused" Moments: In other places (like Ohio and Illinois), the sewage data told a misleading story. Heavy rain flushed extra water through the sewer pipes, diluting the virus concentration and making it look as if the virus was disappearing. The model was tricked by this "rainy day" signal and predicted a drop in hospital visits that never happened.
4. Why Did the Sewage Data Sometimes Fail?
The scientists realized that sewage isn't a perfect crystal ball. It's messy.
- The "Rain Dilution" Problem: If it rains a lot, the sewage gets watery, and the virus looks less concentrated, even if the same number of people are sick. The model didn't always know the difference between "fewer sick people" and "just a lot of rain."
- The "Echo Chamber" Problem: Sometimes, the sewage sensors in a city were all saying the same thing because they were close to each other. The model got too confident in this single voice, ignoring the fact that it might be wrong. It's like asking five friends who live in the same house what the weather is outside; they will all say the same thing, but they might all be wrong if they haven't looked out the window.
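The rain dilution problem has a well-known arithmetic fix: multiply concentration by water flow to get the total daily viral "load." The paper's authors describe the dilution problem; the exact numbers below are illustrative assumptions, not their data.

```python
# Sketch of flow normalization with made-up numbers: a storm doubles the
# water flowing through the pipes, so the measured virus concentration
# halves even though the same number of people are shedding virus.

dry_concentration = 500.0    # gene copies per liter, dry day
rainy_concentration = 250.0  # copies per liter, rainy day (looks like a drop!)

dry_flow = 40.0              # million liters per day into the plant
rainy_flow = 80.0            # storm water doubles the flow

# concentration x flow = total load; the apparent "drop" disappears
dry_load = dry_concentration * dry_flow
rainy_load = rainy_concentration * rainy_flow

print(dry_load, rainy_load)  # identical: the decline was just dilution
```

A model that tracks load instead of raw concentration is much harder to fool with a rainstorm, though it needs reliable flow measurements from each treatment plant.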
5. The Big Takeaway
The main lesson from this paper is that more data doesn't always mean better predictions.
Think of it like cooking. If you are making a soup, adding a pinch of salt (sewage data) might make it perfect. But if you add a whole bucket of salt because you think "more is better," you ruin the soup.
- When it worked: The sewage data helped the model see the future clearly, especially when the virus was changing direction quickly.
- When it failed: The sewage data introduced "noise" (like rain or lab errors) that confused the model, making it less accurate than if it had just stuck to the hospital numbers.
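The "more data isn't always better" lesson can be made concrete with a small statistical sketch. This is not the paper's model: it uses synthetic numbers where the hospital signal is assumed to be precise and the sewage signal noisy, and compares naive averaging against weighting each source by how trustworthy it is.

```python
import numpy as np

rng = np.random.default_rng(1)
truth = 100.0  # the "true" hospitalization level (made up for illustration)
n = 10_000

hospital_obs = truth + rng.normal(0.0, 2.0, n)   # reliable but lagged source
sewage_obs = truth + rng.normal(0.0, 10.0, n)    # leading but noisy source

# Naive fusion: trust both sources equally.
equal_weight = (hospital_obs + sewage_obs) / 2.0

# Smarter fusion: weight each source by the inverse of its noise variance.
w_h, w_s = 1 / 2.0**2, 1 / 10.0**2
smart_weight = (w_h * hospital_obs + w_s * sewage_obs) / (w_h + w_s)

def mse(x):
    return float(np.mean((x - truth) ** 2))

# Equal weighting lets the noisy source drag accuracy DOWN below
# hospital-only; variance-aware weighting nudges it slightly up.
print(mse(hospital_obs), mse(equal_weight), mse(smart_weight))
```

This mirrors the paper's finding: bolting a noisy signal onto a clean one can make forecasts worse, while a model that knows how much to trust each source can still squeeze out a small gain.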
Conclusion
The scientists concluded that wastewater is a powerful tool, but it's not a magic wand. It needs to be used carefully. In the future, they hope to build "smarter" models that can tell the difference between a real drop in virus levels and a fake drop caused by rain or other factors.
For now, the best forecasters are those who know when to listen to the sewage pipes and when to just listen to the hospital waiting room.