Expert-Aided Causal Discovery of Ancestral Graphs

This paper introduces Ancestral GFlowNet (AGFN), a diversity-seeking reinforcement learning algorithm that enables distributional inference over ancestral graphs. AGFN iteratively refines its policy through Bayesian aggregation of both ex-ante and uncertain ex-post expert feedback, converging to the true causal structure even when expert responses are noisy or conflicting.

Tiago da Silva, Bruna Bazaluk, Eliezer de Souza da Silva, António Góis, Salem Lahlou, Dominik Heider, Samuel Kaski, Diego Mesquita, Adèle Helena Ribeiro

Published 2026-03-09

The Big Picture: Solving a Mystery with a Flawed Map and a Noisy Guide

Imagine you are a detective trying to solve a complex crime. You have a pile of evidence (data), but the evidence is messy. Some clues are missing, and some events happened because of a hidden third party you can't see (like a secret mastermind).

In the world of data science, this is called Causal Discovery. Scientists want to draw a map showing what caused what. But because of missing clues (hidden variables), the maps they draw are often wrong or incomplete.

This paper introduces a new tool called AGFN (Ancestral GFlowNet). Think of AGFN as a super-smart, diversity-seeking detective who doesn't just guess one answer, but explores many possible maps at once. Even better, this detective has a special ability: it can ask a human expert (or a smart AI) for help, but it knows the expert might be tired, confused, or slightly wrong. AGFN knows how to listen to that "noisy" advice and still find the truth.


1. The Problem: The "Hidden Mastermind"

Usually, when scientists try to figure out cause-and-effect, they assume they can see everything. But in real life, there are often hidden confounders.

  • The Analogy: Imagine you see that people who carry umbrellas get wet. You might think: "Carrying an umbrella causes getting wet!"
  • The Reality: There is a hidden variable: Rain. Rain causes people to carry umbrellas and causes them to get wet.
  • The Challenge: If you don't know about the rain, your map is wrong. In data science, hidden variables like this force scientists to use "Ancestral Graphs" (AGs): maps with special edge types (like a two-headed arrow) that flag where a hidden cause may be lurking. The space of possible AGs is vastly larger than that of standard causal maps. It's like trying to find a specific needle in a haystack that is 100 times bigger than usual.

2. The Solution: The "Diversity-Seeking Detective" (AGFN)

Old methods try to find just one best map. If they make a small mistake early on, the whole map is ruined.

AGFN is different. Instead of picking one map, it acts like a tour guide leading a group of explorers.

  • It sends out thousands of "explorers" (simulated scenarios) to try different map structures.
  • It doesn't just look for the "best" map immediately; it looks for a variety of maps that fit the data well.
  • The Magic: It learns to generate maps with probability proportional to how well they fit the data, while never wasting time on impossible maps (like a map where the effect happens before the cause).
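The "diverse explorers" idea above can be sketched in a few lines of Python. This is a toy illustration with made-up candidate maps and scores, not the paper's actual GFlowNet training loop; the key point is sampling maps in proportion to how well they fit, instead of keeping only the single top scorer.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical candidate "maps" and made-up data-fit scores (rewards).
candidates = {
    "A -> B": 8.0,   # fits the data well
    "B -> A": 6.0,   # also plausible
    "A <-> B": 5.0,  # hidden confounder between A and B
    "no edge": 1.0,  # fits the data poorly
}

def sample_graph(rewards):
    """Sample one graph with probability proportional to its reward."""
    total = sum(rewards.values())
    r = random.uniform(0, total)
    running = 0.0
    for graph, score in rewards.items():
        running += score
        if r <= running:
            return graph
    return graph  # guard against floating-point edge cases

# Many draws yield a *diverse*, fit-weighted set of plausible maps,
# rather than a single (possibly wrong) winner.
counts = {g: 0 for g in candidates}
for _ in range(10_000):
    counts[sample_graph(candidates)] += 1
```

Sampling this way means the well-fitting map appears most often, but plausible runners-up are never discarded outright, which is what keeps an early mistake from ruining everything.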

3. The Twist: Asking for Help (The "Noisy Expert")

Sometimes, the data isn't enough. You need to ask an expert: "Did A cause B, or was it just a coincidence?"

But experts aren't perfect.

  • The Problem: If you ask a human, they might be unsure. If you ask a Large Language Model (like an AI chatbot), it might give different answers to the same question depending on how you phrase it.
  • The Old Way: Previous tools assumed experts were gods who never made mistakes. If an expert said "A causes B," the tool blindly believed it.
  • The AGFN Way: AGFN treats the expert as fallible but helpful. It assumes the expert is "better than random" (they know more than a coin flip) but might be wrong sometimes.
    • The Analogy: Imagine you are playing a game of "20 Questions." If your friend says, "Is it an animal?" and they are 80% sure, AGFN doesn't say "Yes, definitely!" It says, "Okay, there's an 80% chance it's an animal, so let's keep that in mind but stay open to other possibilities."
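The "80% sure" reasoning in the analogy is just Bayes' rule applied to a fallible witness. Here is a minimal sketch; the reliability value and the two-hypothesis setup are illustrative assumptions, not the paper's exact feedback model.

```python
# Toy sketch of treating an expert as "better than random, but fallible".
# The 0.8 reliability is an illustrative assumption.

def update_belief(prior, expert_says_yes, reliability=0.8):
    """Bayes-rule update of P(A causes B) after one noisy expert answer.

    reliability > 0.5 means the expert beats a coin flip but can still err.
    """
    if expert_says_yes:
        like_true, like_false = reliability, 1 - reliability
    else:
        like_true, like_false = 1 - reliability, reliability
    numerator = like_true * prior
    evidence = numerator + like_false * (1 - prior)
    return numerator / evidence

# Starting 50/50, an 80%-reliable "yes" moves the belief to 0.8, not 1.0.
belief = update_belief(0.5, expert_says_yes=True)
# A second independent "yes" strengthens it further (to about 0.94)...
belief = update_belief(belief, expert_says_yes=True)
# ...while a conflicting "no" pulls it back down without erasing it.
belief = update_belief(belief, expert_says_yes=False)
```

Notice the behavior this produces: agreeing answers compound, and conflicting answers partially cancel instead of breaking the system, which is exactly what you want when aggregating uncertain feedback.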

4. How It Works: The "Smart Question" Strategy

Asking an expert is expensive (it takes time or money, especially if using advanced AI). You don't want to ask about things you already know.

AGFN uses a strategy called Active Learning.

  • The Analogy: Imagine you are playing a guessing game. Instead of asking random questions, AGFN calculates: "Which question, if answered, will teach me the most and reduce my confusion the most?"
  • It picks the specific pair of variables where it is most confused and asks the expert about that relationship.
  • Once the expert answers, AGFN updates its internal "belief map," becoming more confident and narrowing down the possibilities.
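The "ask where you're most confused" step can be sketched with a simple uncertainty measure: score each variable pair by the entropy of its edge belief and query the highest. The beliefs below are made up for illustration, and the paper's actual acquisition criterion is more sophisticated than raw entropy.

```python
import math

def entropy(p):
    """Binary entropy (in bits) of a yes/no belief p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

# Current belief that each edge exists, e.g. estimated from sampled maps.
# These numbers are hypothetical.
edge_beliefs = {
    ("A", "B"): 0.95,  # nearly certain: not worth an expensive question
    ("B", "C"): 0.50,  # maximally confused: the most informative query
    ("A", "C"): 0.10,  # fairly confident the edge is absent
}

# Pick the pair where an answer would reduce confusion the most.
query = max(edge_beliefs, key=lambda pair: entropy(edge_beliefs[pair]))
# query == ("B", "C")
```

Each expert answer then feeds back into the belief update, so the next question is chosen against a sharper map.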

5. The Results: Winning the Game

The authors tested this system on:

  1. Fake Data: Where they knew the answer perfectly.
  2. Real Data: Like gene networks (how genes affect each other) and protein interactions.

They compared AGFN against the "champions" of the current field (other top algorithms).

  • The Result: AGFN found the correct map much faster and more accurately than the others.
  • The "Few Shots" Magic: Even with very few questions (sometimes fewer than 4 answers from an expert), AGFN could fix its mistakes and recover the true structure, outperforming baseline methods that never ask for help at all.

Summary in a Nutshell

  • The Goal: Draw a map of cause-and-effect when some clues are hidden.
  • The Tool: A smart AI (AGFN) that explores many possible maps at once.
  • The Secret Sauce: It knows how to listen to human or AI experts even when they are unsure or contradictory.
  • The Strategy: It asks the right questions to get the most value for the least effort.

In short: AGFN is like a detective who knows how to read a messy crime scene, knows how to ask a witness for help without getting confused by their nervousness, and uses that information to solve the case faster than anyone else.