Asymptotics of cut distributions and robust modular inference using Posterior Bootstrap

This paper establishes asymptotic properties of Bayesian cut distributions, including a Bernstein-von Mises theorem and Laplace approximation, and proposes a Posterior Bootstrap algorithm to achieve nominal frequentist coverage in the presence of model misspecification.

Emilia Pompe, Pierre E. Jacob, Mikołaj J. Kasprzak

Published Thu, 12 Ma

Imagine you are trying to solve a massive, complex puzzle, like figuring out why a specific city has high rates of a certain disease. You have two different pieces of information:

  1. The Weather Data: How much pollution is in the air.
  2. The Health Data: How many people are getting sick.

In a perfect world, you would combine these two pieces of information into one giant, super-smart computer model that learns everything at once. This is the "Standard Bayesian" approach. It's like hiring a single genius detective who looks at the weather and the health records simultaneously to find the answer.

The Problem: The "Bad Neighbor" Effect
But what if your "Weather Data" is actually a bit flawed? Maybe the sensors are broken, or the model for pollution is wrong. In the standard approach, this bad information doesn't stay in the weather section. It "leaks" into the health section. The genius detective gets confused by the bad weather data and starts giving you wrong answers about the disease, even though the health data itself was perfect.

This is called Model Misspecification. One broken part ruins the whole machine.

The Solution: The "Cut" (Modular Inference)
To fix this, the authors propose a strategy called Modular Inference or "Cutting Feedback."

Imagine you have two separate detectives:

  • Detective A looks only at the weather data to figure out the pollution levels.
  • Detective B looks only at the health data to figure out the disease rates.

In a standard model, Detective B would ask Detective A, "Hey, what did you find?" and then adjust their own theory based on that. But if Detective A is wrong, Detective B gets misled.

In this new "Cut" approach, we put a soundproof wall between them.

  • Detective A does their job and gives their answer.
  • Detective B takes Detective A's answer and uses it as a fixed fact.
  • Crucially: Detective B is forbidden from sending any information back to Detective A. If Detective B realizes the pollution levels don't make sense with the health data, they can't go back and tell Detective A to change their mind. They just have to work with the "best guess" they were given.

This prevents the bad weather data from corrupting the health analysis. It's like saying, "We trust your weather report for now, but if it's wrong, we won't let it ruin our medical diagnosis."
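The two-detective scheme above can be sketched numerically. Here is a minimal toy sketch (not the paper's exact setup — the model, data, and all variable names are illustrative): module 1 estimates a pollution level `phi` from sensor data only, and module 2 then estimates a health effect `theta` treating each `phi` draw as a fixed fact, so the health data never flows back into `phi`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: module 1 observes pollution with noise; module 2 observes
# health outcomes whose mean depends on the true pollution level.
true_phi, true_theta = 2.0, 1.5
z = true_phi + rng.normal(0.0, 1.0, size=200)               # pollution sensors
y = true_theta * true_phi + rng.normal(0.0, 1.0, size=200)  # health records

# Stage 1 (Detective A): posterior for phi uses ONLY the pollution data z.
# Flat prior + Gaussian likelihood gives a Gaussian posterior.
phi_mean, phi_sd = z.mean(), 1.0 / np.sqrt(len(z))
phi_draws = rng.normal(phi_mean, phi_sd, size=5000)

# Stage 2 (Detective B): for each phi draw, sample theta from its
# conditional posterior given ONLY the health data y and that fixed phi.
# Model: y_i ~ N(theta * phi, 1), flat prior on theta.
theta_draws = np.empty_like(phi_draws)
for i, phi in enumerate(phi_draws):
    post_var = 1.0 / (len(y) * phi**2)
    post_mean = phi * y.sum() * post_var   # weighted least-squares solution
    theta_draws[i] = rng.normal(post_mean, np.sqrt(post_var))

# The pairs (phi, theta) are draws from the cut distribution:
# y never influenced phi_draws -- that's the soundproof wall.
print(phi_draws.mean(), theta_draws.mean())
```

The key line is that `phi_draws` is generated before `y` is ever touched; in a standard joint Bayesian analysis, `phi` and `theta` would instead be updated together using both datasets.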

The Paper's Big Contributions

The authors didn't just say "this is a good idea"; they did the heavy math to prove why it works and how to do it efficiently.

  1. The "Asymptotic" Proof (The Long-Term Promise):
    They proved mathematically that as you get more and more data, this "Cut" method behaves very predictably. It's like proving that if you keep flipping a coin enough times, the proportion of heads will eventually settle down to a known number. They showed that even with the "soundproof wall," the final answer is statistically reliable and won't drift off into nonsense.

  2. The "Laplace Approximation" (The Shortcut):
    Calculating the exact answer with the "Cut" method is computationally expensive, like trying to solve a Rubik's cube blindfolded. The authors developed a clever shortcut (a "Laplace approximation") that gives you a very close answer much faster. It's like using a GPS to get a "good enough" route instead of calculating every possible turn manually. They also proved bounds on how much error this shortcut introduces.

  3. The "Posterior Bootstrap" (The New Tool):
    This is their most exciting new tool. Imagine you want to know the range of possible answers, not just the single best guess. Usually, this requires running a super-slow computer simulation.
    The authors created a method called Posterior Bootstrap for Modular Inference (PBMI).

    • How it works: Instead of a slow simulation, it uses a "weighted lottery." It randomly picks different weights for your data points, solves the puzzle a thousand times quickly, and sees what the results look like.
    • Why it's great: It's fast, it handles the "soundproof wall" perfectly, and—most importantly—it gives you honest confidence intervals. If you say "I'm 95% sure the answer is between X and Y," this method is calibrated so that, with enough data, you really are right about 95% of the time in the real world.
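The "weighted lottery" idea behind the bootstrap can be sketched in a few lines. This is a rough illustration of the general weighted-bootstrap principle, not the paper's exact PBMI algorithm: give each data point a random weight, re-solve the estimation problem, repeat, and read off the spread of the solutions. The toy problem (estimating a Gaussian mean) and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(5.0, 2.0, size=300)   # toy data with an unknown mean

def weighted_estimate(y, w):
    # For a Gaussian mean, the weighted maximum-likelihood
    # estimate is simply the weighted average of the data.
    return np.sum(w * y) / np.sum(w)

# "Weighted lottery": each replicate re-weights the data points with
# random exponential weights and re-solves the optimization problem.
draws = np.array([
    weighted_estimate(y, rng.exponential(1.0, size=len(y)))
    for _ in range(2000)
])

# The spread of the re-solved optima serves as the uncertainty interval.
lo, hi = np.percentile(draws, [2.5, 97.5])
print(f"95% interval for the mean: ({lo:.2f}, {hi:.2f})")
```

Each replicate is a fast optimization rather than a step of a slow Markov chain, and the replicates are independent, so they can be run in parallel — which is a big part of why this style of algorithm is attractive in practice.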

Real-World Examples
The paper tests this on things like:

  • Causal Inference: Figuring out if a job training program actually helps people earn more money, without letting bad data about who got selected for the program skew the results.
  • Epidemiology: Studying the link between HPV and cervical cancer using data from different countries, where one country's data might be messy or biased.

The Takeaway
This paper is a toolkit for statisticians who are tired of their models breaking when one part of the data is imperfect. It gives them:

  1. A way to isolate bad data so it doesn't poison the whole analysis.
  2. A mathematical guarantee that this isolation works in the long run.
  3. Fast, practical algorithms (like the Bootstrap) to actually do the work without waiting days for a computer to finish.

In short: It's about building robust, modular systems that can handle messy, real-world data without falling apart.