Stochastic gradient descent based variational inference for infinite-dimensional inverse problems

This paper proposes, and theoretically validates, two stochastic gradient descent-based variational inference methods for infinite-dimensional inverse problems. The methods use constant-rate iterations with injected randomization to sample efficiently from posterior distributions, and their effectiveness is demonstrated through preconditioning and numerical experiments on linear and non-linear problems.

Jiaming Sui, Junxiong Jia, Jinglai Li

Published 2026-03-05

Imagine you are a detective trying to solve a mystery, but you can't see the culprit directly. You only have a few blurry photos (the data) and a hunch about what the culprit might look like (the prior knowledge). Your goal is to reconstruct the culprit's face as accurately as possible. In the world of math and science, this is called an inverse problem.

This paper presents a new, much faster way for computers to solve these mysteries, especially when the "culprit" is something incredibly complex, like the temperature distribution inside a star or the flow of water through a sponge (these are infinite-dimensional problems).

Here is the breakdown of their solution, using simple analogies:

1. The Old Way: The Slow, Careful Hiker (MCMC)

Traditionally, to solve these mysteries, scientists used a method called Markov Chain Monte Carlo (MCMC).

  • The Analogy: Imagine you are trying to find the highest peak in a massive, foggy mountain range. The old method is like a hiker who takes one tiny, careful step at a time. They check every direction, make sure they aren't falling off a cliff, and slowly wander around until they have mapped the whole mountain.
  • The Problem: This is incredibly accurate, but it takes forever. If the mountain is huge (infinite dimensions), the hiker might spend their whole life just walking in circles. It's too slow for modern, large-scale problems.

2. The New Idea: The Drunken Skier (Stochastic Gradient Descent)

The authors propose a new method based on Stochastic Gradient Descent (SGD).

  • The Analogy: Instead of a careful hiker, imagine a skier who is slightly drunk. They know the general direction of the peak (the gradient), but they also have a bit of a wobble (random noise). They slide down the hill quickly, zig-zagging a bit.
  • The Magic: In machine learning, this "drunken" sliding is usually used just to find the best spot (the peak). But the authors realized something brilliant: If you keep the skier moving at a constant speed with just the right amount of wobble, they don't just find the peak; they end up exploring the entire mountain range in a way that matches the posterior distribution, the probability map of where the peak actually is.
  • The Result: Instead of taking 100 years to map the mountain, the skier does it in a few hours.
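The "constant speed plus wobble" idea can be sketched in a few lines. Below is a toy 1-D version (my own illustration, not the paper's algorithm): a Langevin-style update with a constant learning rate and injected Gaussian noise, applied to a simple Gaussian target. The target mean `mu`, variance `sigma2`, and step size `eta` are all made-up example values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D "posterior": N(mu, sigma2), a stand-in for the paper's
# infinite-dimensional setting.
mu, sigma2 = 2.0, 0.5

def grad_log_post(u):
    # Gradient of the log-density of N(mu, sigma2).
    return -(u - mu) / sigma2

# Constant-rate SGD with injected noise (Langevin-style update):
#   u_{k+1} = u_k + eta * grad + sqrt(2 * eta) * xi_k
eta = 0.01        # constant learning rate (the skier's "speed")
u = 0.0
samples = []
for k in range(50_000):
    noise = rng.standard_normal()          # the "wobble"
    u = u + eta * grad_log_post(u) + np.sqrt(2 * eta) * noise
    samples.append(u)

samples = np.array(samples[5_000:])        # discard burn-in
print(samples.mean(), samples.var())       # ≈ mu and ≈ sigma2 (up to a small step-size bias)
```

The point of the sketch: the iterates never settle at the peak; with the noise scaled as `sqrt(2 * eta)`, their long-run histogram approximates the target distribution itself.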

3. The Two Versions: The Basic Skier vs. The Preconditioned Skier

The paper introduces two versions of this "drunken skier" technique:

  • cSGD-iVI (The Basic Skier): This is the standard version. The skier has a fixed amount of wobble. It's fast and works well for simple mountains.
  • pcSGD-iVI (The Preconditioned Skier): This is the upgraded version. Imagine the skier is now wearing special skis (a "preconditioner") that are tuned to the shape of the mountain.
    • If the mountain has a steep cliff on one side and a gentle slope on the other, normal skis make you spin out. These special skis adjust your balance automatically.
    • The Result: The preconditioned skier doesn't just go fast; they go smart. They explore the mountain much more efficiently and give a much more accurate map of the "culprit's" location.
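To see what the "special skis" buy you, here is a toy 2-D version of the same sampler with a preconditioner (again my own sketch, not the paper's operator): one direction is a "steep cliff" (tiny variance) and the other a "gentle slope" (large variance). Choosing the preconditioner `M` as the target covariance rescales both directions so a single step size works for both; the matrix `Sigma` and step size `eta` are illustrative values.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy 2-D posterior with very different curvature per direction:
# "steep cliff" (variance 0.01) vs "gentle slope" (variance 1.0).
Sigma = np.diag([0.01, 1.0])
Sigma_inv = np.linalg.inv(Sigma)

def grad_log_post(u):
    return -Sigma_inv @ u

# Hypothetical preconditioner: the target covariance itself, which
# rebalances the step in each direction ("skis tuned to the mountain").
M = Sigma
sqrtM = np.sqrt(M)   # elementwise sqrt is fine here because M is diagonal

eta = 0.05
u = np.zeros(2)
samples = []
for k in range(20_000):
    noise = rng.standard_normal(2)
    u = u + eta * M @ grad_log_post(u) + np.sqrt(2 * eta) * sqrtM @ noise
    samples.append(u.copy())

S = np.array(samples[2_000:])
print(np.cov(S.T))   # ≈ Sigma in both directions, despite the 100x scale gap
```

Without `M`, a step size small enough for the cliff direction would barely move along the gentle slope; with it, both directions mix at the same rate.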

4. Why "Infinite-Dimensional" Matters

Most computer problems are like a grid of 100 pixels. You can count them. But real-world physics (like fluid flow or heat) happens in a continuous space—you can zoom in forever, and there's always more detail. This is "infinite-dimensional."

  • The Challenge: You can't count the pixels if there are infinitely many of them.
  • The Solution: The authors proved mathematically that their "drunken skier" method works even when the mountain is infinitely detailed. They showed that the skier's path converges to the true answer, provided you tune the "wobble" (the noise) and the "speed" (the learning rate) correctly.

5. The "Secret Sauce": Tuning the Wobble

The most important part of the paper is how they figure out exactly how much the skier should wobble.

  • They treat the problem like a game of "Hot or Cold." They want the skier's path to match the true probability of where the culprit is.
  • They use a mathematical tool called KL Divergence (think of it as a "distance meter" between two maps) to find the perfect speed and wobble amount.
  • The Outcome: They derived a formula that tells the computer exactly how to set these knobs so the "drunken" path becomes a perfect statistical map of the truth.
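The trade-off behind the tuning can be seen concretely in the toy 1-D Gaussian setting. For the constant-step Langevin-style update on a N(mu, sigma2) target, the chain's long-run law is Gaussian with a slightly inflated variance, v(eta) = sigma2 / (1 - eta / (2 * sigma2)) (a standard discretization bias for this linear case). The closed-form KL divergence between 1-D Gaussians then acts as the "distance meter": shrinking the speed shrinks the error. This is my own illustration of the principle, not the paper's derivation.

```python
import numpy as np

def kl_gauss(m1, v1, m2, v2):
    # Closed-form KL( N(m1, v1) || N(m2, v2) ) for 1-D Gaussians.
    return 0.5 * (np.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)

sigma2 = 0.5
etas = [0.1, 0.05, 0.01, 0.001]
kls = []
for eta in etas:
    # Stationary variance of the constant-step chain on this toy target.
    v = sigma2 / (1 - eta / (2 * sigma2))
    kls.append(kl_gauss(0.0, v, 0.0, sigma2))
    print(eta, kls[-1])
# The KL "distance" to the true map shrinks as eta shrinks: tuning the
# speed and wobble trades statistical bias against exploration speed.
```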

Summary of Results

The authors tested this on two real-world scenarios:

  1. A simple smooth equation: Like finding a hidden shape in a smooth fog.
  2. Darcy Flow: Like figuring out how water moves through a complex underground rock layer (very messy and non-linear).
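To make the Darcy test case concrete, here is a minimal 1-D forward model (a toy sketch of my own; the paper works with the full non-linear problem): given a permeability field k(x), the "rock layer" to be inferred, solve -(k(x) u'(x))' = f(x) on [0, 1] with zero boundary conditions by finite differences. The grid size, example field, and source are all illustrative.

```python
import numpy as np

# Minimal 1-D Darcy forward solver: -(k u')' = f, u(0) = u(1) = 0.
n = 100
h = 1.0 / n
x = np.linspace(0, 1, n + 1)
k = 1.0 + 0.5 * np.sin(2 * np.pi * x)   # example permeability field
f = np.ones(n - 1)                      # constant source term

# Assemble the tridiagonal stiffness matrix using k at cell midpoints.
k_mid = 0.5 * (k[:-1] + k[1:])
A = np.zeros((n - 1, n - 1))
for i in range(n - 1):
    A[i, i] = (k_mid[i] + k_mid[i + 1]) / h**2
    if i > 0:
        A[i, i - 1] = -k_mid[i] / h**2
    if i < n - 2:
        A[i, i + 1] = -k_mid[i + 1] / h**2

u = np.linalg.solve(A, f)   # pressure at the interior grid points
```

The inverse problem runs this map backwards: from noisy observations of u, reconstruct k, which is exactly the kind of non-linear, function-valued "culprit" the samplers above are built for.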

The Verdict:

  • The Basic Skier (cSGD-iVI) was much faster than the old "Careful Hiker" (MCMC) but sometimes missed the edges of the mountain.
  • The Preconditioned Skier (pcSGD-iVI) was the winner. It was fast and accurate, and its map of the mountain was almost identical to the slow, careful map made by the old method, but it was produced in a fraction of the time.

In a nutshell: This paper teaches computers how to solve incredibly complex, infinite puzzles by turning a slow, careful search into a fast, slightly chaotic slide that, when tuned correctly, settles on the right answer.