Scalable Contrastive Causal Discovery under Unknown Soft Interventions

Imagine you are a detective trying to figure out the family tree of a large, chaotic group of people. You want to know who influences whom (e.g., "Does the father's mood affect the son's homework, or does the son's homework affect the father's mood?").

In the world of data science, this is called Causal Discovery.

The Problem: The Foggy Room

Usually, you only have one type of evidence: Observational Data. This is like watching the family from a distance through a foggy window. You see them interacting, but you can't tell who started the conversation. Maybe the dad yelled, and the son cried. Or maybe the son cried, and the dad yelled. From just watching, you can't be sure who caused what. You end up with a list of possibilities, but no single truth.

To solve this, scientists usually try Interventions. This is like walking into the room and saying, "Dad, stop talking for a minute!" If the son stops crying, you know Dad was the cause.

But here's the catch in the real world:

Soft Interventions: You can't always force someone to stop talking. Maybe you just give Dad a cup of coffee, which makes him talk differently (softer, faster), but he doesn't stop. This is a "soft intervention."
Unknown Targets: You don't know who you are influencing. You just know the whole room's dynamic changed slightly.
One Snapshot: Often, you only get to see the "Before" (Observational) and the "After" (Interventional) once. You don't get to run the experiment a hundred times.

The Solution: SCONE (The Detective's New Toolkit)

The paper introduces SCONE (Scalable contrastive Causal discOvery under unknowN soft intervEntions). Think of SCONE as a super-smart detective who uses a special trick called Contrastive Learning.

Here is how it works, using a simple analogy:

1. The "Spot the Difference" Game

Imagine you have two photos of a messy room:

Photo A (Observational): The room is messy.
Photo B (Interventional): The room is messy, but the lighting is slightly different because someone turned on a lamp (the "soft intervention").

SCONE doesn't just look at Photo A or Photo B alone. It looks at both at the same time and asks: "What changed? What stayed the same?"

The "Invariant" (The Same): If a chair is in the corner in both photos, it's probably just a chair. It wasn't moved by the lamp. In data terms, this is a relationship that is stable across both regimes.
The "Contrast" (The Change): If a vase moved only in Photo B, the lamp (the intervention) must have affected the vase. In data terms, this is a change in how variables interact.

2. The "Local Clues" vs. The "Global Picture"

SCONE is also Scalable. Imagine trying to solve a mystery with 1,000 people. It's too hard to look at everyone at once.

The Strategy: SCONE breaks the big group into small, manageable teams (subsets). It solves the family tree for Team A, then Team B, then Team C.
The Magic Glue: Usually, if you solve small puzzles, you might miss the big picture. SCONE uses a special "Axial Attention" mechanism (think of it as a super-connector) to stitch all the small local solutions together into one giant, consistent global map. It ensures that if Team A says "Alice is Bob's boss," and Team B says "Bob is Charlie's boss," the whole map makes sense together.

3. The "Contrastive Orientation Rules" (The Logic)

This is the brain of SCONE. It uses the "Spot the Difference" clues to decide the direction of the arrows (who causes whom).

Rule 1: The One-Sided Shift. Imagine you see that "Coffee" changes the way "Dad" talks, but "Dad" doesn't change the way "Coffee" is poured. Since the change only happened on Dad's side, the arrow must point from Coffee to Dad.
Rule 2: The V-Shape. Imagine three people: Alice, Bob, and Charlie. If Alice and Charlie are both calm, but Bob gets crazy when the lamp turns on, SCONE realizes Bob is the "collider" (the meeting point). It figures out that Alice and Charlie are both influencing Bob, not the other way around.

Why is this a Big Deal?

It handles the "Unknowns": Previous methods needed to know exactly who was being intervened on. SCONE works even if you have no idea who the target is, as long as you see the change.
It's Scalable: It can handle large graphs (100+ variables), where older methods break down and return dense, mostly wrong graphs.
It's Robust: It works even if the "rules" of the world change slightly (e.g., the data comes from a different distribution), which is common in real life (like biology or economics).

The Bottom Line

SCONE is like a detective who, instead of needing a perfect crime scene, can look at two slightly different photos of a messy room, spot the subtle differences, and reconstruct the entire story of who did what to whom, even if they don't know exactly who started the trouble. It turns a confusing fog of data into a clear, directed map of cause and effect.

Here is a detailed technical summary of the paper "Scalable Contrastive Causal Discovery under Unknown Soft Interventions" (SCONE).

1. Problem Statement

Causal discovery aims to recover Directed Acyclic Graphs (DAGs) representing causal relationships. However, observational data alone only identifies the Markov Equivalence Class (MEC), leaving many edge orientations ambiguous. While interventions can resolve these ambiguities, real-world scenarios often present three major challenges:

Unknown Targets: The specific variables targeted by interventions are not known.
Soft Interventions: Interventions often alter the mechanism (conditional distribution) of a variable without severing its connections to its parents (unlike "hard" interventions), so the graph structure remains invariant.
Scalability & Generalization: Existing methods for soft interventions (e.g., $\Psi$-FCI) rely on global oracle access to conditional independencies and invariances, making them computationally intractable for large graphs and unable to generalize to out-of-distribution (OOD) structures.

The paper addresses the problem of learning causal structures from two regimes (observational and interventional) where the intervention targets are unknown and soft, using only subset-level information and limited cross-regime invariance queries.
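To make the Markov Equivalence Class ambiguity concrete, here is a minimal sketch (not taken from the paper) that enumerates every orientation of the chain skeleton X - Y - Z and groups them by their v-structures. It relies on the standard result that two DAGs with the same skeleton and the same v-structures are Markov equivalent. The helper names `v_structures` and `orientations` are illustrative assumptions.

```python
from itertools import product

def v_structures(edges):
    """Return colliders a -> c <- b whose endpoints a and b are not adjacent.

    `edges` is a collection of (parent, child) pairs.
    """
    adjacent = {frozenset(e) for e in edges}
    found = set()
    for a, c1 in edges:
        for b, c2 in edges:
            if c1 == c2 and a < b and frozenset((a, b)) not in adjacent:
                found.add((a, c1, b))
    return found

def orientations(skeleton):
    """Yield every way of directing the undirected edges in `skeleton`."""
    for flips in product([False, True], repeat=len(skeleton)):
        yield tuple((v, u) if flip else (u, v)
                    for (u, v), flip in zip(skeleton, flips))

# A chain skeleton has no cycles, so every orientation is a valid DAG
# and no acyclicity check is needed in this toy case.
skeleton = [("X", "Y"), ("Y", "Z")]

# Group DAGs by their v-structures: same skeleton + same v-structures
# means the DAGs are indistinguishable from observational data alone.
classes = {}
for dag in orientations(skeleton):
    classes.setdefault(frozenset(v_structures(dag)), []).append(dag)

for colliders, dags in classes.items():
    print(sorted(colliders), dags)
```

The three orientations without a collider at Y land in one class, so observational data cannot tell them apart; only the collider X -> Y <- Z is identified on its own. Resolving the first class is what interventional information is needed for.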

2. Methodology: SCONE

The authors propose SCONE (Scalable contrastive Causal discOvery under unknowN soft intervEntions), a deep learning framework that combines classical causal discovery with contrastive learning and axial attention.

A. Theoretical Framework: Restricted $\Psi$-Equivalence

The authors formalize a new theoretical setting called Restricted $\Psi$-Equivalence.

Information Constraints: Unlike $\Psi$-FCI, SCONE does not have access to global conditional independencies. It only accesses:
1. Local Partially Directed Acyclic Graphs (PDAGs) estimated on small, admissible subsets of variables.
2. A finite set of tested cross-regime invariance queries (checking if $P(X_v \mid X_Z)$ is the same in both regimes).
Estimand: The goal is to recover the Test-Induced Restricted $\Psi$ Essential Graph ($G_{test}$), which contains all edges and orientations that are compelled (identical) across all DAGs consistent with the limited subset-level and invariance data.
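A cross-regime invariance query can be illustrated with a toy two-variable example. The sketch below is not the paper's test: it assumes a linear-Gaussian model Z -> V, simulates a soft intervention that changes the slope of V's mechanism while keeping the edge, and compares fitted summaries of a conditional across regimes with a fixed tolerance. The functions `sample_regime`, `fit_conditional`, and `is_invariant`, the slopes, and the tolerance are all illustrative assumptions.

```python
import random
from statistics import fmean

def variance(xs):
    """Population variance of a list of floats."""
    m = fmean(xs)
    return fmean([(x - m) ** 2 for x in xs])

def sample_regime(n, slope, rng):
    """Draw n samples from Z -> V with V = slope * Z + unit Gaussian noise."""
    zs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    vs = [slope * z + rng.gauss(0.0, 1.0) for z in zs]
    return zs, vs

def fit_conditional(target, cond=None):
    """Summarise P(target | cond) as (intercept, slope, residual variance).

    Assumption: a linear-Gaussian fit is an adequate summary for this toy.
    """
    if cond is None:
        return (fmean(target), 0.0, variance(target))
    mt, mc = fmean(target), fmean(cond)
    cov = fmean([(t - mt) * (c - mc) for t, c in zip(target, cond)])
    slope = cov / variance(cond)
    intercept = mt - slope * mc
    resid = [t - intercept - slope * c for t, c in zip(target, cond)]
    return (intercept, slope, variance(resid))

def is_invariant(obs_fit, int_fit, tol=0.1):
    """Crude stand-in for a statistical test: all summaries agree within tol."""
    return all(abs(a - b) <= tol for a, b in zip(obs_fit, int_fit))

rng = random.Random(0)
z_obs, v_obs = sample_regime(20000, 1.5, rng)  # observational regime
z_int, v_int = sample_regime(20000, 0.5, rng)  # soft intervention on V

# Query P(Z): the cause's marginal is untouched by the intervention.
z_marginal_invariant = is_invariant(fit_conditional(z_obs), fit_conditional(z_int))
# Query P(V | Z): the intervened mechanism itself shifts.
v_given_z_invariant = is_invariant(fit_conditional(v_obs, z_obs),
                                   fit_conditional(v_int, z_int))
# Query P(Z | V): the anti-causal conditional also shifts.
z_given_v_invariant = is_invariant(fit_conditional(z_obs, v_obs),
                                   fit_conditional(z_int, v_int))

print(z_marginal_invariant, v_given_z_invariant, z_given_v_invariant)
```

In this toy, only the cause's side stays invariant, which is the kind of asymmetry the contrastive orientation rules exploit. A real implementation would replace the fixed tolerance with a proper two-sample or conditional test.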

B. Contrastive Orientation Rules

SCONE introduces three theoretically proven contrastive orientation rules that leverage the difference between regimes to orient edges that remain ambiguous within a single regime:

Single-Sided Invariance (SSI): If an edge $i - j$ is undirected in both regimes, but node $j$ shows a distributional shift (change) while $i$ remains invariant given a conditioning set $Z$, the edge is oriented $i \to j$.
Contrastive V-Structure (CVT): If an unshielded triple $i - j - k$ is undirected, but $j$ changes while $i$ and $k$ remain invariant, the structure is compelled to be a collider $i \to j \leftarrow k$.
Contrastive Discriminating Path (DPT): Extends the same logic to discriminating paths, using invariance of intermediate nodes to resolve the orientation of the target edge.
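The first two rules can be written down as plain decision functions. This is a literal transcription of the wording above, not the paper's formal conditions: it assumes the outcome of the invariance queries has already been collapsed into a hypothetical `shifted` dictionary (node -> True if its tested conditional changed across regimes), which hides the choice of conditioning set $Z$.

```python
def orient_ssi(i, j, shifted):
    """Single-sided invariance on an undirected edge i - j.

    Returns a (parent, child) pair, or None when the rule does not fire.
    """
    if shifted[j] and not shifted[i]:
        return (i, j)
    if shifted[i] and not shifted[j]:
        return (j, i)
    return None  # both shifted or both invariant: no one-sided evidence

def orient_cvt(i, j, k, adjacent, shifted):
    """Contrastive v-structure on an undirected triple i - j - k.

    `adjacent` is a set of frozenset node pairs; the triple must be
    unshielded, i.e. i and k are not adjacent.
    """
    unshielded = frozenset((i, k)) not in adjacent
    if unshielded and shifted[j] and not shifted[i] and not shifted[k]:
        return [(i, j), (k, j)]  # collider i -> j <- k
    return None

# Toy usage with hypothetical query outcomes.
shifted = {"A": False, "B": True, "C": False}
adjacent = {frozenset(("A", "B")), frozenset(("B", "C"))}

print(orient_ssi("A", "B", shifted))
print(orient_cvt("A", "B", "C", adjacent, shifted))
```

In SCONE these rules are not applied as hard logic; per the architecture description, they inform learned bias heads that nudge edge logits.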

C. Model Architecture

SCONE is a neural architecture designed for scalability:

Two-Stream Processing:
- Marginal Stream: Processes edge tokens from sampled subsets using Axial Attention (along subset and edge axes).
- Global Stream: Maintains dense representations for all node pairs using a global precision matrix.
Reparameterization: Edge embeddings are decomposed into invariant (shared structure) and contrast (regime-specific shift) channels to sharpen the signal for orientation rules.
Bias Heads: Three learned modules (SSI, CVT, DPT) act as "bias heads" that inject signed biases into the edge logits based on the contrastive rules derived from the theoretical proofs.
Aggregation: A message-passing mechanism iteratively updates local subset representations with global context, ensuring consistency across the entire graph.
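The summary does not give the formula for the invariant/contrast reparameterization. One simple way such a split could look, shown purely as an assumption, is a mean/difference decomposition of the per-regime edge embeddings. The functions `to_channels` and `from_channels` are hypothetical names.

```python
def to_channels(h_obs, h_int):
    """Split per-regime embeddings into invariant and contrast channels.

    Assumption: invariant = regime mean, contrast = regime difference.
    The paper's actual parameterization may differ.
    """
    invariant = [(a + b) / 2 for a, b in zip(h_obs, h_int)]
    contrast = [b - a for a, b in zip(h_obs, h_int)]
    return invariant, contrast

def from_channels(invariant, contrast):
    """Invert the split, showing that no information is lost."""
    h_obs = [m - c / 2 for m, c in zip(invariant, contrast)]
    h_int = [m + c / 2 for m, c in zip(invariant, contrast)]
    return h_obs, h_int

h_obs = [1.0, 2.0, -0.5]   # edge embedding in the observational regime
h_int = [1.0, 3.0, 0.5]    # same edge in the interventional regime

invariant, contrast = to_channels(h_obs, h_int)
print(invariant, contrast)
```

A zero entry in the contrast channel marks a feature that did not move between regimes, which is the signal the orientation rules look for.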

3. Key Contributions

Scalable Architecture: The first framework to perform causal discovery under unknown soft interventions using a scalable, differentiable architecture that generalizes to OOD graphs.
Theoretical Guarantees:
- Formalized Restricted $\Psi$-Equivalence and the Test-Induced Essential Graph.
- Proved that the proposed contrastive rules are sound (they only orient edges compelled by the data) and complete within the restricted information setting.
- Demonstrated a strict separation: Non-contrastive subset aggregators cannot recover edges that SCONE can, proving the necessity of cross-regime contrast.
Asymptotic Consistency: Proved that as sample size and the number of sampled subsets increase, SCONE asymptotically recovers the true restricted essential graph.
Empirical Performance: Achieved state-of-the-art results in structural recovery (SHD) and edge orientation (F1) on synthetic benchmarks, significantly outperforming baselines like AVICI, SEA, and NOTEARS, especially in OOD and large-scale settings.

4. Experimental Results

The authors evaluated SCONE on synthetic datasets with varying graph sizes (20, 50, 100 nodes) and edge densities.

In-Distribution Performance: On 20-node graphs with polynomial mechanisms, SCONE achieved the lowest Structural Hamming Distance (SHD: 14.6) and highest F1 (0.655), outperforming NOTEARS and DCDI.
Out-of-Distribution (OOD) Generalization: When trained on Linear/NN mechanisms and tested on unseen Polynomial/Sigmoid mechanisms, SCONE maintained robust performance, whereas baselines like SEA and AVICI degraded significantly or failed (F1 $\approx$ 0).
Scalability: On 100-node graphs, SCONE was the only method to produce meaningful results (SHD: 126.7, F1: 0.237). Baselines like DCD-FG and SEA failed to scale, producing dense, incorrect graphs with SHD > 1000.
Ablation Studies:
- Removing Bias Heads (contrastive rules) increased SHD and decreased F1, confirming the rules drive orientation accuracy.
- Removing Contrastive Features (reparameterization) significantly degraded performance, proving that modeling the shift/invariance explicitly is necessary.
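For readers unfamiliar with the two metrics reported above, here is a minimal sketch of how SHD and a directed-edge F1 are commonly computed. The conventions are assumptions: a reversed edge counts as one SHD error (some papers count two), and F1 is taken over exact directed-edge matches. The paper's evaluation code may differ.

```python
def shd(true_edges, pred_edges):
    """Structural Hamming Distance between two directed graphs.

    One error per node pair whose status differs: a missing, extra,
    or reversed edge. Assumes no self-loops, as in a DAG.
    """
    true_edges, pred_edges = set(true_edges), set(pred_edges)
    pairs = {frozenset(e) for e in true_edges | pred_edges}
    errors = 0
    for pair in pairs:
        u, v = tuple(pair)
        true_state = ((u, v) in true_edges, (v, u) in true_edges)
        pred_state = ((u, v) in pred_edges, (v, u) in pred_edges)
        errors += true_state != pred_state
    return errors

def edge_f1(true_edges, pred_edges):
    """F1 score over directed edges (direction must match exactly)."""
    true_edges, pred_edges = set(true_edges), set(pred_edges)
    tp = len(true_edges & pred_edges)
    if tp == 0:
        return 0.0
    precision = tp / len(pred_edges)
    recall = tp / len(true_edges)
    return 2 * precision * recall / (precision + recall)

truth = {("A", "B"), ("B", "C"), ("A", "C")}
pred = {("A", "B"), ("C", "B"), ("C", "D")}

# B-C is reversed, A-C is missing, C-D is extra; A->B is correct.
print(shd(truth, pred), edge_f1(truth, pred))
```

Lower SHD and higher F1 are better, so a method that outputs a dense graph on 100 nodes accumulates many extra-edge errors, which matches the failure mode described for the baselines.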

5. Significance

This work bridges the gap between theoretical causal discovery under soft interventions and practical, scalable machine learning.

Realism: It addresses the realistic constraint of unknown intervention targets and soft interventions, which are common in biology and economics but often ignored by "hard intervention" models.
Efficiency: By moving from global oracle queries to subset-level sampling and neural aggregation, it makes causal discovery feasible for large-scale systems (100+ nodes).
Generalization: The ability to learn causal mechanisms that generalize to unseen graph structures and functional forms makes SCONE a strong candidate for foundation models in causal inference.

In summary, SCONE provides a rigorous, scalable, and empirically validated solution for discovering causal structures when only limited, noisy, and partially observed interventional data is available.
