Are Bayesian networks typically faithful?

This paper establishes that faithful Bayesian networks constitute a dense, open, and typical set across various parametric and nonparametric classes, thereby providing a rigorous theoretical foundation for the consistency of constraint-based causal discovery algorithms like PC and FCI.

Philip Boeken, Patrick Forré, Joris M. Mooij

Published Fri, 13 Ma

Imagine you are a detective trying to solve a mystery. You have a map of a city (the graph) showing how different neighborhoods (variables) are connected by roads (causal relationships). You also have a dataset of traffic patterns (the data) showing how cars move between these neighborhoods.

Your goal is to look at the traffic data and figure out what the map actually looks like. This is the job of Causal Discovery.

However, there's a tricky problem. Sometimes, the traffic patterns might look like two neighborhoods are unconnected even though the map says there is a road between them. This happens if:

  1. Cancellation: Two roads lead to the same place, but one brings cars in and the other takes them out at the exact same rate, making it look like no traffic is moving.
  2. Determinism: A traffic light is stuck on red, so no cars move regardless of the road layout.
  3. Coincidence: The numbers just happen to cancel out perfectly by chance.

In the world of statistics, we call a map and a dataset that perfectly match each other Faithful. If they don't match (because of the weird cancellations or coincidences above), they are Unfaithful.

Most causal discovery algorithms (like the famous PC or FCI algorithms) rely on a big assumption: "We assume the data is Faithful." They assume that if the map says two places are connected, the data will show a connection, and if the map says they are disconnected, the data will show no connection.

But is this a safe assumption? Is it likely that we will accidentally pick a "cursed" dataset where everything cancels out perfectly? Or is it safe to assume that "normal" data will behave well?

This paper answers these questions with a resounding: yes, it is safe.

Here is the breakdown of their findings using simple analogies:

1. The "Typicality" Question

The authors ask: If we randomly pick a map and a set of traffic rules, how likely is it that we get a "Faithful" pair?

They prove that Faithful pairs are the rule, not the exception.

  • The Analogy: Imagine a giant bag of marbles. Most marbles are blue (Faithful). A tiny, tiny speck of dust represents the red marbles (Unfaithful). If you reach in and grab a marble at random, you are almost guaranteed to get a blue one.
  • The Math: They show that the "Unfaithful" cases are "nowhere dense." In topology (the math of shapes and spaces), this means they never fill up any region of the space: wherever you stand, there are Faithful models arbitrarily close by. The Unfaithful ones are like isolated specks of dust in a clean room.

2. The Different "Rooms" (Model Classes)

The authors didn't just look at one type of data. They checked several different "rooms" where data lives:

  • The "Wild West" (Nonparametric Models): This is the most general room, where data can be anything (continuous, discrete, mixed). They proved that even here, if you look at the data using a strict ruler (Total Variation metric), the Faithful cases are everywhere. You can't get lost in a sea of Unfaithful data.
  • The "Structured Room" (Exponential Families): This covers common statistical models like Linear Gaussian (straight lines) or Discrete networks (like flipping coins). They proved that if you pick parameters randomly (like rolling dice to set the rules), you will almost certainly get a Faithful result. The "bad" parameters are so rare they have zero probability of being picked.
  • The "Smooth Room" (Bounded Densities): This covers data that doesn't have sudden, jagged jumps. They showed that even here, Faithful models are the dominant, "typical" choice.
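The "rolling dice" claim for the linear-Gaussian case can be illustrated numerically (an illustrative sketch with made-up parameter ranges, not the paper's proof): in the three-node network X → Y → Z with a direct edge X → Z, unfaithfulness requires the exact cancellation c + a·b = 0. That is a lower-dimensional surface in parameter space, and random draws essentially never land on it.

```python
import random

random.seed(2)

# Measure-zero sketch for a linear-Gaussian network on
# X -> Y (a), Y -> Z (b), X -> Z (c): the model is unfaithful only on
# the surface c + a*b = 0 in parameter space. Drawing (a, b, c) at
# random, we essentially never hit that surface exactly.
trials = 1_000_000
exact_cancellations = 0
for _ in range(trials):
    a = random.uniform(-2, 2)
    b = random.uniform(-2, 2)
    c = random.uniform(-2, 2)
    if c + a * b == 0.0:          # exact cancellation => unfaithful
        exact_cancellations += 1

print(f"unfaithful draws: {exact_cancellations} / {trials}")
```

This is the "bag of marbles" picture in miniature: the red marbles exist (set c = -a·b by hand and you have one), but a blind grab never finds them.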

3. The "Hidden Variables" Twist

In real life, we often can't see everything. Maybe there's a hidden variable (like "Weather") affecting both "Ice Cream Sales" and "Sunburns," but we don't have a sensor for the weather.

  • The authors extended their proof to show that even with these hidden variables, the Faithful assumption still holds true for the variables we can see. The "bad" cases are still just specks of dust.

4. Why This Matters for AI and Science

Why should you care? Because this paper gives confidence to the algorithms scientists use every day.

  • The Guarantee: It tells us that algorithms like PC and FCI aren't just guessing. For almost every possible model, they are mathematically guaranteed to converge to the correct structure (given enough data and reliable independence tests).
  • The "Typical" Domain: If you run a causal discovery algorithm on real-world data, the paper assures you that you are operating in a "safe zone." The chance that the true data-generating model is one of those weird, perfectly cancelling-out anomalies is effectively zero.

The Big Picture Metaphor

Think of the space of all possible causal models as a vast, endless ocean.

  • Faithful models are the clear, calm water where you can see the bottom clearly.
  • Unfaithful models are tiny, invisible whirlpools that only exist in specific, rare coordinates.

The paper proves that if you drop a boat (your algorithm) anywhere in this ocean, it will almost certainly be floating on clear water. You don't need to worry about falling into a whirlpool unless you are specifically trying to find one.

In short: the assumption that "data reflects the true structure" is not just a convenient guess; it provably holds in almost every situation we encounter. We can trust our causal discovery tools.