Multilevel Picard approximations and deep neural networks with ReLU, leaky ReLU, and softplus activation overcome the curse of dimensionality when approximating semilinear parabolic partial differential equations in L^p-sense

This paper proves that multilevel Picard approximations and deep neural networks with ReLU, leaky ReLU, and softplus activation functions can approximate solutions to semilinear parabolic PDEs in the L^p-sense without suffering from the curse of dimensionality, as their computational cost and parameter count grow at most polynomially with respect to the dimension and the inverse of the desired accuracy.

Ariel Neufeld, Tuan Anh Nguyen

Published 2026-03-24

Imagine you are trying to predict the weather, but instead of just looking at temperature and wind in your city, you have to track the weather for every single atom in the universe simultaneously.

In the world of mathematics and physics, this is the nightmare known as the "Curse of Dimensionality."

When scientists try to solve complex equations (called Partial Differential Equations or PDEs) that describe things like stock market crashes, quantum particles, or fluid dynamics, the number of variables (dimensions) can explode. Traditional computers hit a wall: every time you add one more variable, the time and money needed to solve the equation don't just go up a little; they go up exponentially. It's like trying to find a specific grain of sand on a beach, but every time you add a new beach, the number of grains doubles, then quadruples, until the universe runs out of sand.

This paper, written by Ariel Neufeld and Tuan Anh Nguyen, presents a breakthrough: they prove that two modern tools can break this curse.

Here is the simple breakdown of their discovery:

1. The Two Heroes: MLPs and Deep Neural Networks

The authors focus on two specific methods that have been getting a lot of hype in the tech world:

  • Multilevel Picard (MLP) Approximations: Think of this as a "smart guess-and-check" team. Instead of one person trying to solve the whole puzzle alone, they break the problem into layers. They make a rough guess, then a slightly better guess, then an even better one, using a massive team of random simulations (like rolling dice millions of times) to average out the errors.
  • Deep Neural Networks (DNNs): These are the "brains" behind modern AI (like the chatbots you use). They are mathematical structures inspired by the human brain, made of layers of nodes that learn patterns. The authors specifically tested three types of "activation functions" (the switches that turn neurons on or off): ReLU, Leaky ReLU, and Softplus.
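To make the "layered guess-and-check" idea concrete, here is a minimal toy sketch of a multilevel Picard iteration — not the paper's actual scheme, which targets PDEs, but the same skeleton applied to a simple ODE u'(t) = f(u(t)). Each level refines the previous guess, the time integral is estimated by random sampling, and most of the samples are spent on the cheap, coarse levels:

```python
import random

def mlp(f, u0, t, n, M):
    """Level-n multilevel Picard estimate of u(t) for u' = f(u), u(0) = u0.

    Uses the telescoping sum u_n = u_0 + sum of level corrections, estimating
    the integral at level l with M**(n - l) Monte Carlo samples, so cruder
    (cheaper) levels get the most samples.
    """
    if n == 0:
        return u0  # level-0 guess: the initial value itself
    est = u0
    for l in range(n):
        m = M ** (n - l)
        acc = 0.0
        for _ in range(m):
            s = random.random() * t        # random time in [0, t]
            acc += f(mlp(f, u0, s, l, M))  # current level's guess ...
            if l > 0:
                acc -= f(mlp(f, u0, s, l - 1, M))  # ... minus the previous one
        est += t * acc / m
    return est
```

For example, with f(u) = u and u0 = 1 the true solution is u(t) = e^t, and a few levels already land near e ≈ 2.718 at t = 1. The point of the telescoping structure is that the expensive fine levels only need a handful of samples, which is what keeps the total cost polynomial.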

2. The Big Discovery: "Polynomial" vs. "Exponential"

The paper proves a mathematical fact that changes everything:

  • The Old Way (Curse): With classical grid-based methods, the cost grows exponentially in the number of dimensions. A 100-dimensional problem with just two grid points per axis already needs 2^{100} evaluations (a 31-digit number). This is impossible.
  • The New Way (The Breakthrough): With MLP approximations and these specific Neural Networks, the cost grows at most polynomially in the dimension and in the inverse of the accuracy. Halving the error multiplies the cost by a fixed factor (like 2^3 or 2^4), not an astronomical one.

The Analogy:
Imagine you are trying to paint a mural.

  • The Curse: If the mural gets 10% bigger, you need 10,000% more paint and time. You run out of paint before you finish.
  • The Solution: With the new method, if the mural gets 10% bigger, you only need a little bit more paint. The size of the mural doesn't matter anymore; the method scales efficiently.

3. Why "Lp-Sense" Matters

The paper mentions solving these equations in the "L^p-sense."

  • Simple Translation: Most previous proofs only guaranteed the methods worked for the mean-squared error (L^2). The authors prove the result for a whole family of L^p error measures, which penalize rare, large errors increasingly heavily as p grows.
  • The Metaphor: Imagine a bridge. Previous proofs said, "This bridge is safe on average." This paper says, "It stays safe even under stress tests that weight the rare, heavy loads far more strongly — and we can prove it mathematically." This makes the guarantee much more robust and reliable for real-world engineering.
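What raising p actually does can be seen in a few lines. Here is an empirical L^p error norm over a sample of pointwise errors (a toy illustration, not the paper's norm, which integrates over the solution's domain): as p grows, one large outlier comes to dominate the score.

```python
def lp_error(errors, p):
    """Empirical L^p norm: (average of |error|^p) ** (1/p)."""
    n = len(errors)
    return (sum(abs(e) ** p for e in errors) / n) ** (1.0 / p)

# 99 tiny errors and one big one:
errors = [0.1] * 99 + [10.0]
print(lp_error(errors, 2))  # modest: the outlier is averaged away
print(lp_error(errors, 8))  # much larger: the outlier dominates
```

A method with small L^p error for large p therefore cannot be hiding a few catastrophic pointwise failures behind a good average — which is why guarantees beyond L^2 matter.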

4. The "Activation" Secret

The authors specifically looked at ReLU, Leaky ReLU, and Softplus.

  • ReLU is like a one-way valve: positive signals pass through unchanged, negative signals are blocked entirely.
  • Leaky ReLU is a dimmer switch that never fully turns off (it lets a tiny bit of light through).
  • Softplus is a smooth curve that mimics a switch but without the sharp edge.

The paper proves that all three of these "switches" suffice for solving these high-dimensional problems. This is great news because engineers can choose the switch that best fits their specific hardware without worrying about the math breaking.
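All three activations are one-liners. As a plain-Python sketch:

```python
import math

def relu(x):
    """Hard gate: zero for negative inputs, identity for positive ones."""
    return max(0.0, x)

def leaky_relu(x, slope=0.01):
    """Like ReLU, but lets a small fraction of negative inputs through."""
    return x if x > 0 else slope * x

def softplus(x):
    """Smooth approximation of ReLU: log(1 + e^x), with no kink at zero."""
    return math.log1p(math.exp(x))
```

All three behave like the identity for large positive inputs; they differ only in how they treat negative inputs and how sharp the transition is — which is why the paper can cover them with one family of arguments.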

5. The Real-World Impact

Why should you care?

  • Finance: Banks can better price complex options (like betting on a basket of 100 different stocks) without needing a supercomputer the size of a city.
  • Physics: Scientists can simulate how particles interact in complex systems with thousands of variables, which was previously impossible.
  • Engineering: Designing safer structures or optimizing energy grids becomes computationally feasible.

The Bottom Line

This paper is the mathematical receipt that proves the "magic" of AI and advanced Monte Carlo simulations isn't just luck. It proves that for a huge class of difficult equations, Deep Learning and Multilevel Picard methods are the keys to unlocking high-dimensional problems.

They have shown that the "Curse of Dimensionality" is not a law of nature, but a limitation of old methods. With these new tools, we can finally solve problems that were previously considered impossible.
