Expert-Aided Causal Discovery of Ancestral Graphs

This paper introduces Ancestral GFlowNet (AGFN), a diversity-seeking reinforcement learning algorithm that enables distributional inference over ancestral graphs. AGFN iteratively refines its policy through Bayesian aggregation of both ex-ante and uncertain ex-post expert feedback, converging to the true causal structure even when expert responses are noisy or conflicting.

Tiago da Silva, Bruna Bazaluk, Eliezer de Souza da Silva, António Góis, Salem Lahlou, Dominik Heider, Samuel Kaski, Diego Mesquita, Adèle Helena Ribeiro

Published 2026-03-09

The Big Picture: Solving a Mystery with a Flawed Map and a Noisy Guide

Imagine you are a detective trying to solve a complex crime. You have a pile of evidence (data), but the evidence is messy. Some clues are missing, and some events happened because of a hidden third party you can't see (like a secret mastermind).

In the world of data science, this is called Causal Discovery. Scientists want to draw a map showing what caused what. But because of missing clues (hidden variables), the maps they draw are often wrong or incomplete.

This paper introduces a new tool called AGFN (Ancestral GFlowNet). Think of AGFN as a super-smart, diversity-seeking detective who doesn't just guess one answer, but explores many possible maps at once. Even better, this detective has a special ability: it can ask a human expert (or a smart AI) for help, but it knows the expert might be tired, confused, or slightly wrong. AGFN knows how to listen to that "noisy" advice and still find the truth.


1. The Problem: The "Hidden Mastermind"

Usually, when scientists try to figure out cause-and-effect, they assume they can see everything. But in real life, there are often hidden confounders.

  • The Analogy: Imagine you see that people who carry umbrellas get wet. You might think: "Carrying an umbrella causes getting wet!"
  • The Reality: There is a hidden variable: Rain. Rain causes people to carry umbrellas and causes them to get wet.
  • The Challenge: If you don't know about the rain, your map is wrong. In data science, hidden variables like this force scientists to use "Ancestral Graphs" (AGs): maps with special edge types (like a two-headed arrow) that flag where a hidden cause may be lurking. The space of possible AGs is vastly larger than that of standard causal maps. It's like trying to find a specific needle in a haystack that is 100 times bigger than usual.

2. The Solution: The "Diversity-Seeking Detective" (AGFN)

Old methods try to find just one best map. If they make a small mistake early on, the whole map is ruined.

AGFN is different. Instead of picking one map, it acts like a tour guide leading a group of explorers.

  • It sends out thousands of "explorers" (simulated scenarios) to try different map structures.
  • It doesn't just look for the "best" map immediately; it looks for a variety of maps that fit the data well.
  • The Magic: It learns to generate maps with probability proportional to how well they fit the data, while never wasting time on impossible maps (like a map where the effect happens before the cause).
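The "diverse explorers" idea above can be sketched in a few lines of Python. This is a toy illustration with made-up candidate maps and scores, not the paper's actual GFlowNet training loop; the key point is sampling maps in proportion to how well they fit, instead of keeping only the single top scorer.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical candidate "maps" and made-up data-fit scores (rewards).
candidates = {
    "A -> B": 8.0,   # fits the data well
    "B -> A": 6.0,   # also plausible
    "A <-> B": 5.0,  # hidden confounder between A and B
    "no edge": 1.0,  # fits the data poorly
}

def sample_graph(rewards):
    """Sample one graph with probability proportional to its reward."""
    total = sum(rewards.values())
    r = random.uniform(0, total)
    running = 0.0
    for graph, score in rewards.items():
        running += score
        if r <= running:
            return graph
    return graph  # guard against floating-point edge cases

# Many draws yield a *diverse*, fit-weighted set of plausible maps,
# rather than a single (possibly wrong) winner.
counts = {g: 0 for g in candidates}
for _ in range(10_000):
    counts[sample_graph(candidates)] += 1
```

Sampling this way means the well-fitting map appears most often, but plausible runners-up are never discarded outright, which is what keeps an early mistake from ruining everything.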

3. The Twist: Asking for Help (The "Noisy Expert")

Sometimes, the data isn't enough. You need to ask an expert: "Did A cause B, or was it just a coincidence?"

But experts aren't perfect.

  • The Problem: If you ask a human, they might be unsure. If you ask a Large Language Model (like an AI chatbot), it might give different answers to the same question depending on how you phrase it.
  • The Old Way: Previous tools assumed experts were gods who never made mistakes. If an expert said "A causes B," the tool blindly believed it.
  • The AGFN Way: AGFN treats the expert as fallible but helpful. It assumes the expert is "better than random" (they know more than a coin flip) but might be wrong sometimes.
    • The Analogy: Imagine you are playing a game of "20 Questions." If your friend says, "Is it an animal?" and they are 80% sure, AGFN doesn't say "Yes, definitely!" It says, "Okay, there's an 80% chance it's an animal, so let's keep that in mind but stay open to other possibilities."
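The "80% sure" reasoning in the analogy is just Bayes' rule applied to a fallible witness. Here is a minimal sketch; the reliability value and the two-hypothesis setup are illustrative assumptions, not the paper's exact feedback model.

```python
# Toy sketch of treating an expert as "better than random, but fallible".
# The 0.8 reliability is an illustrative assumption.

def update_belief(prior, expert_says_yes, reliability=0.8):
    """Bayes-rule update of P(A causes B) after one noisy expert answer.

    reliability > 0.5 means the expert beats a coin flip but can still err.
    """
    if expert_says_yes:
        like_true, like_false = reliability, 1 - reliability
    else:
        like_true, like_false = 1 - reliability, reliability
    numerator = like_true * prior
    evidence = numerator + like_false * (1 - prior)
    return numerator / evidence

# Starting 50/50, an 80%-reliable "yes" moves the belief to 0.8, not 1.0.
belief = update_belief(0.5, expert_says_yes=True)
# A second independent "yes" strengthens it further (to about 0.94)...
belief = update_belief(belief, expert_says_yes=True)
# ...while a conflicting "no" pulls it back down without erasing it.
belief = update_belief(belief, expert_says_yes=False)
```

Notice the behavior this produces: agreeing answers compound, and conflicting answers partially cancel instead of breaking the system, which is exactly what you want when aggregating uncertain feedback.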

4. How It Works: The "Smart Question" Strategy

Asking an expert is expensive (it takes time or money, especially if using advanced AI). You don't want to ask about things you already know.

AGFN uses a strategy called Active Learning.

  • The Analogy: Imagine you are playing a guessing game. Instead of asking random questions, AGFN calculates: "Which question, if answered, will teach me the most and reduce my confusion the most?"
  • It picks the specific pair of variables where it is most confused and asks the expert about that relationship.
  • Once the expert answers, AGFN updates its internal "belief map," becoming more confident and narrowing down the possibilities.
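The "ask where you're most confused" step can be sketched with a simple uncertainty measure: score each variable pair by the entropy of its edge belief and query the highest. The beliefs below are made up for illustration, and the paper's actual acquisition criterion is more sophisticated than raw entropy.

```python
import math

def entropy(p):
    """Binary entropy (in bits) of a yes/no belief p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

# Current belief that each edge exists, e.g. estimated from sampled maps.
# These numbers are hypothetical.
edge_beliefs = {
    ("A", "B"): 0.95,  # nearly certain: not worth an expensive question
    ("B", "C"): 0.50,  # maximally confused: the most informative query
    ("A", "C"): 0.10,  # fairly confident the edge is absent
}

# Pick the pair where an answer would reduce confusion the most.
query = max(edge_beliefs, key=lambda pair: entropy(edge_beliefs[pair]))
# query == ("B", "C")
```

Each expert answer then feeds back into the belief update, so the next question is chosen against a sharper map.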

5. The Results: Winning the Game

The authors tested this system on:

  1. Fake Data: Where they knew the answer perfectly.
  2. Real Data: Like gene networks (how genes affect each other) and protein interactions.

They compared AGFN against the "champions" of the current field (other top algorithms).

  • The Result: AGFN found the correct map much faster and more accurately than the others.
  • The "Few Shots" Magic: Even with very few questions (sometimes fewer than 4 answers from an expert), AGFN could fix its mistakes and recover the true structure, outperforming baseline methods that never ask for help at all.

Summary in a Nutshell

  • The Goal: Draw a map of cause-and-effect when some clues are hidden.
  • The Tool: A smart AI (AGFN) that explores many possible maps at once.
  • The Secret Sauce: It knows how to listen to human or AI experts even when they are unsure or contradictory.
  • The Strategy: It asks the right questions to get the most value for the least effort.

In short: AGFN is like a detective who knows how to read a messy crime scene, knows how to ask a witness for help without getting confused by their nervousness, and uses that information to solve the case faster than anyone else.