Hierarchical Inference and Closure Learning via Adaptive Surrogates for ODEs and PDEs

This paper proposes a hierarchical Bayesian framework that integrates adaptive surrogate models, such as Fourier Neural Operators and parametric PINNs, with ensemble MALA sampling to simultaneously infer individual system parameters and learn shared unknown dynamics via ML-based closure models for ODEs and PDEs.

Pengyu Zhang, Arnaud Vadeboncoeur, Alex Glyn-Davies, Mark Girolami

Published 2026-03-05

Imagine you are a detective trying to solve a mystery, but you don't have the full picture. You have a bunch of similar crime scenes (physical systems), and you know the general rules of how the world works (the laws of physics), but you are missing two crucial pieces of the puzzle:

  1. The Specifics: Every crime scene has unique details (like the weight of a specific object or the initial speed of a car).
  2. The Missing Rule: There is a hidden, complicated law of nature that you don't know yet (like exactly how friction works in a specific type of engine).

This paper presents a clever new way for scientists and engineers to solve these mysteries simultaneously. They call it Hierarchical Inference and Closure Learning.

Here is a simple breakdown of how it works, using some creative analogies:

1. The "Classroom" Analogy (Hierarchical Inference)

Imagine a classroom of 20 students (the 20 different physical systems).

  • The Old Way: If you wanted to know how smart each student is, you would test them one by one, completely ignoring the others. If a student has bad test data (noise), you might get a wrong answer.
  • The New Way (Hierarchical): You realize these students are all in the same class. They share a common "population" intelligence. If Student A has messy data, you can look at Student B and Student C to help guess what Student A's true intelligence is.
  • The Result: By looking at the whole group together, you get a much more accurate estimate of each individual's specific traits, even if their personal data is a bit fuzzy.
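The "classroom" effect above is classic Bayesian shrinkage: each student's estimate gets pulled toward the class average, with the pull strength set by how noisy their own data is. Here is a minimal numpy sketch of that idea, using a normal-normal model with made-up numbers (20 systems, 5 noisy observations each) that are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 20 "students" (systems), each with a true parameter drawn
# from a shared population, observed through a few noisy measurements.
n_systems, n_obs = 20, 5
pop_mean, pop_std, noise_std = 2.0, 0.5, 1.0
true_theta = rng.normal(pop_mean, pop_std, n_systems)
data = true_theta[:, None] + rng.normal(0.0, noise_std, (n_systems, n_obs))

# "Old way": estimate each system from its own data alone.
independent = data.mean(axis=1)

# "New way": normal-normal shrinkage toward the population mean.
# Posterior precision = prior precision + data precision.
prior_prec = 1.0 / pop_std**2
data_prec = n_obs / noise_std**2
shrunk = (prior_prec * pop_mean + data_prec * independent) / (prior_prec + data_prec)

# The hierarchical estimates typically land closer to the truth on average,
# because noisy individual estimates are pulled toward the class mean.
err_ind = float(np.mean((independent - true_theta) ** 2))
err_shr = float(np.mean((shrunk - true_theta) ** 2))
print(f"independent MSE: {err_ind:.3f}  hierarchical MSE: {err_shr:.3f}")
```

By construction, every shrunk estimate sits between the raw per-student estimate and the class mean; the real paper does this jointly with sampling rather than in one closed-form step.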

2. The "Ghost in the Machine" (Closure Learning)

Now, imagine you know the equation for how a car moves, but you don't know exactly how the engine vibrates when it gets hot. That missing vibration rule is the "Closure."

  • Instead of trying to write a complex math formula for this vibration (which is hard), the researchers use a Neural Network (a type of AI).
  • Think of the Neural Network as a shape-shifting clay model. It starts as a lump of clay. As the researchers feed it data from the 20 students, the clay slowly molds itself into the exact shape of the missing vibration rule.
  • The AI learns this rule by trial and error, trying to make the math match the real-world observations.
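To make the "clay model" concrete, here is a toy sketch of closure learning: the known physics is dx/dt = -k·x + g(x), the closure g is unknown, and a tiny one-hidden-layer network is fit to the residuals r = dx/dt + k·x by plain gradient descent. The ground-truth closure (0.5·sin x), the network size, and the training settings are all illustrative assumptions, far simpler than the paper's actual closure models:

```python
import numpy as np

rng = np.random.default_rng(1)

# Known physics: dx/dt = -k*x + g(x); k is known, the closure g is not.
k = 1.0
g_true = lambda x: 0.5 * np.sin(x)   # hypothetical "missing rule", for demo only

# Derivative observations reveal the closure as a residual: r = dx/dt + k*x = g(x).
x = rng.uniform(-3.0, 3.0, (200, 1))
r = g_true(x)

# Tiny one-hidden-layer tanh network: the "shape-shifting clay".
W1 = rng.normal(0.0, 1.0, (1, 16)); b1 = np.zeros(16)
W2 = rng.normal(0.0, 0.1, (16, 1)); b2 = np.zeros(1)

lr = 0.05
for _ in range(3000):
    h = np.tanh(x @ W1 + b1)            # forward pass
    pred = h @ W2 + b2
    err = pred - r                       # squared-error residual
    gW2 = h.T @ err / len(x); gb2 = err.mean(axis=0)
    dh = (err @ W2.T) * (1.0 - h**2)     # backprop through tanh
    gW1 = x.T @ dh / len(x); gb1 = dh.mean(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2

mse = float(np.mean((np.tanh(x @ W1 + b1) @ W2 + b2 - r) ** 2))
print(f"closure fit MSE: {mse:.4f}")
```

After training, the network has "molded itself" into a usable approximation of the hidden rule; in the paper this happens jointly across all systems, inside the full inference loop.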

3. The "Speeding Ticket" Problem (The Computational Bottleneck)

Here is the big problem: To figure out the missing rule and the specific details, the computer has to run the physics simulation millions of times.

  • The Analogy: Imagine you are trying to find the perfect recipe for a cake. Every time you change an ingredient, you have to bake the cake, wait 45 minutes for it to cool, taste it, and then write down the result. If you need to do this 10,000 times, you will never finish.
  • The Solution (Surrogates): The researchers build a Fast-Forward Simulator (a Surrogate Model).
    • Instead of baking the real cake (running the slow, expensive physics simulation), they use a Magic Crystal Ball (the Surrogate).
    • The Crystal Ball predicts what the cake will taste like in 0.01 seconds.
    • Crucially, they train this Crystal Ball while they are solving the mystery. It learns to mimic the real physics so well that it becomes a reliable stand-in, saving enormous amounts of computing time.
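The surrogate idea can be shown in a few lines: run a deliberately slow solver at a handful of training points, then fit a cheap model that answers new queries almost instantly. The slow solver here is a toy Euler integration of dx/dt = -θ·x, and the "crystal ball" is a polynomial fit; the paper's surrogates (Fourier Neural Operators, parametric PINNs) are far more expressive, so treat this as a sketch of the concept only:

```python
import numpy as np

def expensive_simulation(theta, n_steps=20000):
    """Stand-in for a slow physics solver: Euler-integrate dx/dt = -theta*x from x(0)=1."""
    x, dt = 1.0, 1.0 / n_steps
    for _ in range(n_steps):
        x += dt * (-theta * x)
    return x  # close to exp(-theta)

# "Bake a few cakes": evaluate the slow solver at a handful of parameter values.
train_theta = np.linspace(0.1, 2.0, 15)
train_out = np.array([expensive_simulation(t) for t in train_theta])

# Fit the cheap surrogate (the "crystal ball") to those solver outputs.
coeffs = np.polyfit(train_theta, train_out, deg=6)
surrogate = lambda theta: np.polyval(coeffs, theta)

# A new query is now a polynomial evaluation, not a 20,000-step integration.
query = 1.3
approx, exact = float(surrogate(query)), expensive_simulation(query)
print(f"surrogate: {approx:.5f}  solver: {exact:.5f}")
```

The speedup comes from amortization: the solver is paid for once on the training points, and every subsequent query inside the inference loop is essentially free.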

4. The "Two-Level Dance" (Bilevel Optimization)

The whole process is a synchronized dance between two goals:

  1. The Detective's Goal: "I need to find the specific details of these 20 students." (This uses a method called MALA, the Metropolis-adjusted Langevin algorithm: a smart random walk that follows gradient information toward likely parameter values, with an accept/reject correction to keep the exploration honest).
  2. The Teacher's Goal: "I need to teach the Crystal Ball to be a better predictor."

They do this in a loop:

  • The Detective takes a step to guess the details.
  • The Teacher uses that guess to train the Crystal Ball to be more accurate.
  • The Detective uses the new Crystal Ball to take a better step.
  • They keep dancing back and forth until they have solved the mystery and built a perfect Crystal Ball.
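The "Detective" half of this dance can be sketched as a single MALA update: drift along the gradient of the log-posterior, add noise, then apply the Metropolis correction. In the paper, `log_post` would call the continually retrained surrogate and the "Teacher" would update it between sampling rounds; here the target is just a hand-picked 1D Gaussian, so the whole setup is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy posterior for one system's parameter. In the paper, log_post would query
# the surrogate model, which is retrained between rounds (the "Teacher" step).
mu, sigma = 1.5, 0.5
log_post = lambda t: -0.5 * ((t - mu) / sigma) ** 2
grad_log_post = lambda t: -(t - mu) / sigma**2

def mala_step(theta, eps):
    """One Metropolis-adjusted Langevin step: drift up the gradient, then accept/reject."""
    drift = theta + 0.5 * eps**2 * grad_log_post(theta)
    prop = drift + eps * rng.normal()
    # Proposal densities are asymmetric, so the correction needs both directions.
    back = prop + 0.5 * eps**2 * grad_log_post(prop)
    log_q_fwd = -((prop - drift) ** 2) / (2 * eps**2)
    log_q_back = -((theta - back) ** 2) / (2 * eps**2)
    log_alpha = log_post(prop) - log_post(theta) + log_q_back - log_q_fwd
    return prop if np.log(rng.uniform()) < log_alpha else theta

theta, eps, samples = 0.0, 0.4, []
for i in range(5000):
    theta = mala_step(theta, eps)
    if i >= 1000:                 # discard burn-in
        samples.append(theta)
    # (In the full method, the surrogate would be retrained here, and the
    #  next MALA step would use the improved surrogate.)

print(f"posterior mean estimate: {np.mean(samples):.2f} (true {mu})")
```

The comments mark where the surrogate-retraining half of the loop would slot in; alternating the two steps is what the paper frames as the bilevel "dance".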

Why is this a big deal?

  • It handles the unknown: It doesn't just guess numbers; it learns entire missing laws of physics.
  • It handles the messy: It works even when the data is noisy or incomplete, by using the "classroom" effect to help each other out.
  • It's fast: By using the "Magic Crystal Ball" (Surrogate), it solves problems that would usually take weeks in just hours.

In summary: This paper gives scientists a super-tool to figure out the hidden rules of nature and the specific details of complex systems, even when they don't have perfect data, by letting a group of systems teach each other and using a fast AI assistant to speed up the math.