Are Bayesian networks typically faithful?

This paper establishes that faithful Bayesian networks constitute a dense, open, and typical set across various parametric and nonparametric classes, thereby providing a rigorous theoretical foundation for the consistency of constraint-based causal discovery algorithms like PC and FCI.

Philip Boeken, Patrick Forré, Joris M. Mooij

Published Fri, 13 Ma

Imagine you are a detective trying to solve a mystery. You have a map of a city (the graph) showing how different neighborhoods (variables) are connected by roads (causal relationships). You also have a dataset of traffic patterns (the data) showing how cars move between these neighborhoods.

Your goal is to look at the traffic data and figure out what the map actually looks like. This is the job of Causal Discovery.

However, there's a tricky problem. Sometimes, the traffic patterns might look like two neighborhoods are unconnected even though the map says there is a road between them. This happens if:

  1. Cancellation: Two roads lead to the same place, but one brings cars in and the other takes them out at the exact same rate, making it look like no traffic is moving.
  2. Determinism: A traffic light is stuck on red, so no cars move regardless of the road layout.
  3. Coincidence: The numbers just happen to cancel out perfectly by chance.

In the world of statistics, we call a map and a dataset that perfectly match each other Faithful. If they don't match (because of the weird cancellations or coincidences above), they are Unfaithful.

Most causal discovery algorithms (like the famous PC or FCI algorithms) rely on a big assumption: "We assume the data is Faithful." They assume that if the map says two places are connected, the data will show a connection, and if the map says they are disconnected, the data will show no connection.

But is this a safe assumption? Is it likely that we will accidentally pick a "cursed" dataset where everything cancels out perfectly? Or is it safe to assume that "normal" data will behave well?

This paper answers these questions with a resounding: yes, it is safe.

Here is the breakdown of their findings using simple analogies:

1. The "Typicality" Question

The authors ask: If we randomly pick a map and a set of traffic rules, how likely is it that we get a "Faithful" pair?

They prove that Faithful pairs are the rule, not the exception.

  • The Analogy: Imagine a giant bag of marbles. Most marbles are blue (Faithful). A tiny, tiny speck of dust represents the red marbles (Unfaithful). If you reach in and grab a marble at random, you are almost guaranteed to get a blue one.
  • The Math: They show that the "Unfaithful" cases are "nowhere dense." In topology (the math of shapes and spaces), this means they never fill up any region of the space: wherever you stand, there are Faithful models arbitrarily close by. The Unfaithful ones are like isolated specks of dust in a clean room.

2. The Different "Rooms" (Model Classes)

The authors didn't just look at one type of data. They checked several different "rooms" where data lives:

  • The "Wild West" (Nonparametric Models): This is the most general room, where data can be anything (continuous, discrete, mixed). They proved that even here, if you look at the data using a strict ruler (Total Variation metric), the Faithful cases are everywhere. You can't get lost in a sea of Unfaithful data.
  • The "Structured Room" (Exponential Families): This covers common statistical models like Linear Gaussian (straight lines) or Discrete networks (like flipping coins). They proved that if you pick parameters randomly (like rolling dice to set the rules), you will almost certainly get a Faithful result. The "bad" parameters are so rare they have zero probability of being picked.
  • The "Smooth Room" (Bounded Densities): This covers data that doesn't have sudden, jagged jumps. They showed that even here, Faithful models are the dominant, "typical" choice.
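The "rolling dice" claim for the linear-Gaussian case can be illustrated numerically (an illustrative sketch with made-up parameter ranges, not the paper's proof): in the three-node network X → Y → Z with a direct edge X → Z, unfaithfulness requires the exact cancellation c + a·b = 0. That is a lower-dimensional surface in parameter space, and random draws essentially never land on it.

```python
import random

random.seed(2)

# Measure-zero sketch for a linear-Gaussian network on
# X -> Y (a), Y -> Z (b), X -> Z (c): the model is unfaithful only on
# the surface c + a*b = 0 in parameter space. Drawing (a, b, c) at
# random, we essentially never hit that surface exactly.
trials = 1_000_000
exact_cancellations = 0
for _ in range(trials):
    a = random.uniform(-2, 2)
    b = random.uniform(-2, 2)
    c = random.uniform(-2, 2)
    if c + a * b == 0.0:          # exact cancellation => unfaithful
        exact_cancellations += 1

print(f"unfaithful draws: {exact_cancellations} / {trials}")
```

This is the "bag of marbles" picture in miniature: the red marbles exist (set c = -a·b by hand and you have one), but a blind grab never finds them.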

3. The "Hidden Variables" Twist

In real life, we often can't see everything. Maybe there's a hidden variable (like "Weather") affecting both "Ice Cream Sales" and "Sunburns," but we don't have a sensor for the weather.

  • The authors extended their proof to show that even with these hidden variables, the Faithful assumption still holds true for the variables we can see. The "bad" cases are still just specks of dust.

4. Why This Matters for AI and Science

Why should you care? Because this paper gives confidence to the algorithms scientists use every day.

  • The Guarantee: It tells us that algorithms like PC and FCI aren't just guessing. For almost every possible model, they are mathematically guaranteed to converge to the correct structure (given enough data and reliable independence tests).
  • The "Typical" Domain: If you run a causal discovery algorithm on real-world data, the paper assures you that you are operating in a "safe zone." The chance that the true data-generating model is one of those weird, perfectly cancelling-out anomalies is effectively zero.

The Big Picture Metaphor

Think of the space of all possible causal models as a vast, endless ocean.

  • Faithful models are the clear, calm water where you can see the bottom clearly.
  • Unfaithful models are tiny, invisible whirlpools that only exist in specific, rare coordinates.

The paper proves that if you drop a boat (your algorithm) anywhere in this ocean, it will almost certainly be floating on clear water. You don't need to worry about falling into a whirlpool unless you are specifically trying to find one.

In short: the assumption that "data reflects the true structure" is not just a convenient guess; it provably holds in almost every situation we encounter. We can trust our causal discovery tools.