Functional Bias and Tangent-Space Geometry in Variational Inference

This paper develops a geometric framework showing that the leading-order bias of a posterior functional in variational inference is determined by the component of that functional orthogonal to the variational tangent space. In particular, it explains why structured mean-field approximations systematically distort cross-block dependencies: the interaction directions are simply missing from the tangent space.

Sean Plummer

Published Wed, 11 Ma

Here is an explanation of the paper "Functional Bias and Tangent-Space Geometry in Variational Inference" using simple language, analogies, and metaphors.

The Big Picture: The "Good Enough" Map

Imagine you are trying to navigate a complex, mountainous terrain (the True Reality or the Posterior Distribution). You need to know specific details: How high is the peak? How steep is the slope? Is there a hidden valley?

However, the terrain is too complex to map perfectly. It's too much data to process. So, you decide to use a simplified map (the Variational Approximation). This map is drawn on a flat piece of paper or a simple grid. It's easy to read and fast to use, but it can't capture every twist and turn of the real mountains.

The Problem: Because your map is simplified, it will be wrong about certain things. The paper asks: Which things will the map get right, and which things will it get wrong, and why?

The Core Idea: The "Shape" of Your Map

The author, Sean Plummer, uses a geometric idea called Tangent Space to explain this.

Think of your simplified map as a specific shape.

  • If your map is a flat sheet of paper, it can only represent flat things perfectly.
  • If your map is a grid of separate squares (this is called "Mean-Field"), it can only represent things that happen independently in each square. It cannot represent things where one square affects another.

The paper introduces a rule: The "Bias" (the error) depends on whether the thing you are measuring fits inside the shape of your map.

The Two Types of Errors

  1. The "Fits Perfectly" Group (Second-Order Bias):
    If the thing you want to measure is something your map shape can naturally describe, the error is tiny: it shows up only at second order, so it shrinks quickly as the approximation improves. It's like measuring the width of a square on a square grid; the map gets it almost exactly right.

    • Example: If you want to know the average height of the mountains in just the North block, and your map treats the North block independently, you get a great answer.
  2. The "Doesn't Fit" Group (First-Order Bias):
    If the thing you want to measure involves connections between different parts of the map, your simplified shape fails: the error shows up at first order, so it is large, systematic, and slow to vanish.

    • Example: If you want to know how the weather in the North block affects the weather in the South block, a map that treats them as separate squares will tell you they have no relationship at all. It will say, "They are independent," even if they are actually storming together. This is a huge, predictable mistake.
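The two error types above can be seen in a minimal numeric sketch. Suppose (purely for illustration) that the "terrain" is a bivariate Gaussian with two blocks correlated at 0.8. The fully factorized Gaussian that minimizes KL(q || p) has a known closed form: it matches each block's mean exactly, takes its variances from the diagonal of the precision matrix, and forces the cross-block covariance to zero.

```python
import numpy as np

# Illustrative 2-D Gaussian "terrain": two blocks (North, South), correlation 0.8.
mu = np.array([1.0, -2.0])
rho = 0.8
Sigma = np.array([[1.0, rho],
                  [rho, 1.0]])

# Closed-form mean-field (fully factorized Gaussian) fit minimizing KL(q || p):
# means are exact; variances come from the diagonal of the precision matrix.
Lam = np.linalg.inv(Sigma)       # precision matrix
q_mean = mu.copy()               # block means: recovered exactly
q_var = 1.0 / np.diag(Lam)       # per-block variances: shrunk (1 - rho^2 = 0.36)
q_cov = 0.0                      # cross-block covariance: forced to zero

print("true means      :", mu)
print("mean-field means:", q_mean)      # within-block functional: no bias
print("true variance   :", Sigma[0, 0])
print("mean-field var  :", q_var[0])    # underestimated
print("true covariance :", Sigma[0, 1])
print("mean-field cov  :", q_cov)       # cross-block functional: large bias
```

The single-block functional (the mean) is exact, while the cross-block functional (the covariance) is replaced by zero no matter how strong the true dependence is, exactly the "doesn't fit" failure mode.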

The "Mean-Field" Analogy: The Silo Effect

The paper focuses heavily on a popular method called Structured Mean-Field.

Imagine a company with different departments (Marketing, Engineering, Sales).

  • The Real World: These departments talk to each other constantly. Marketing changes affect Engineering, which affects Sales.
  • The Mean-Field Map: This method forces the company to act like a set of Silos. It assumes Marketing doesn't know what Engineering is doing, and Engineering doesn't know about Sales.

The Result:

  • If you ask, "How much money does Marketing make?" (a single block), the Silo map gives a good answer.
  • If you ask, "How does a change in Marketing affect Sales?" (a connection between blocks), the Silo map gives a terrible answer. It assumes the connection is zero, even if it's huge.

The paper proves mathematically that this Silo approach will systematically underestimate the connections between different parts of the system.

The "Tangent Space" Metaphor: The Dance Floor

Imagine the "True Reality" is a complex dance floor where everyone is moving in a giant, intricate pattern.

  • The Variational Family is a group of dancers who are only allowed to move in specific, simple ways (e.g., only moving forward/backward or only left/right, but never diagonally together).
  • The Tangent Space is the set of all the moves this group can do.
  • The Bias is what happens when the real dance requires a move the group can't do (like a diagonal spin).

The paper says:

  • If the real dance move is within the group's allowed moves (the Tangent Space), the group can mimic it almost perfectly; the leftover error is only second-order.
  • If the real dance move is outside their allowed moves (in the Orthogonal Complement), the group cannot capture it, and the error is first-order: large, systematic, and slow to vanish.
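The dance-floor picture can be made concrete with a small Monte Carlo sketch (the distributions and basis here are my own illustrative choices, not the paper's construction). At a mean-field base point q(x, y) = q1(x) q2(y), the "allowed moves" are additively separable functions u(x) + v(y); projecting a functional onto a few such directions in L2(q) and measuring the residual shows whether it lies in the tangent space or in the orthogonal complement.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
x = rng.standard_normal(n)   # samples from q1: standard normal block
y = rng.standard_normal(n)   # samples from q2: standard normal block

def orthogonal_residual(f_vals):
    """Project f onto span{1, x, x^2, y, y^2} (separable moves) and
    return the mean-square size of the leftover, orthogonal part."""
    basis = np.column_stack([np.ones(n), x, x**2, y, y**2])
    coef, *_ = np.linalg.lstsq(basis, f_vals, rcond=None)
    return np.mean((f_vals - basis @ coef) ** 2)

sep = orthogonal_residual(x + y**2)   # separable move: residual near 0
mix = orthogonal_residual(x * y)      # diagonal spin: residual near 1

print("f = x + y^2 (allowed move)  :", sep)
print("f = x * y   (diagonal spin) :", mix)
```

The separable functional is reproduced almost exactly, while the interaction functional x·y is essentially entirely orthogonal to the separable directions: no combination of allowed moves can imitate it.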

What This Means for Real Life

Why should you care?

  1. Don't trust the "Connections": If you use these simplified AI models to predict how different variables interact (like "How does interest rate affect stock prices?"), be very careful. The model is likely to tell you there is no relationship, even if there is one.
  2. Trust the "Averages": If you just want to know the average value of a single variable (like "What is the average temperature?"), these models are usually very accurate.
  3. Better Maps exist: The paper suggests that if you want to capture those tricky "connections," you need to change the shape of your map (the Variational Family) so it includes those connections in its "Tangent Space."
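Point 3 can also be checked numerically. In the same illustrative setup as above (my own toy basis, not the paper's), adding a coupling direction to the map puts the interaction functional inside the span of allowed moves, and its orthogonal component, the source of the first-order bias, disappears.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
x = rng.standard_normal(n)   # block 1 samples
y = rng.standard_normal(n)   # block 2 samples
f = x * y                    # the cross-block functional

def orthogonal_residual(columns):
    """Mean-square size of f's component orthogonal to span(columns)."""
    B = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(B, f, rcond=None)
    return np.mean((f - B @ coef) ** 2)

silo_basis = [np.ones(n), x, x**2, y, y**2]   # separable (mean-field) moves only
rich_basis = silo_basis + [x * y]             # map enlarged with a coupling term

silo_res = orthogonal_residual(silo_basis)    # near 1: f is invisible to the silo map
rich_res = orthogonal_residual(rich_basis)    # near 0: f now fits inside the map

print("orthogonal part, silo map   :", silo_res)
print("orthogonal part, richer map :", rich_res)
```

One extra direction in the family is enough, for this particular functional, to move it from the orthogonal complement into the tangent space.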

Summary in One Sentence

This paper explains that simplified AI models are great at predicting individual parts of a system, but they systematically fail to predict how those parts influence each other, because the "shape" of the model simply doesn't have room to hold those connections.