Imagine you are trying to understand a massive, complex city. You have two different maps:
- Map A is a hyper-detailed street view of just the downtown district. It shows every single coffee shop, the exact number of pedestrians, and the specific traffic lights.
- Map B is a hyper-detailed street view of just the industrial zone. It shows every factory, the specific types of trucks, and the power grid details.
Now, imagine you want to create one master map of the entire city that combines the insights from both. But here's the catch: Map A calls the downtown area "The City Center," while Map B breaks it down into "North District," "South District," and "East District." They are talking about the same thing, but with different levels of detail and different names.
This is the problem the paper solves.
The Old Way: "Abstraction" (The One-to-One Translation)
Previously, scientists used a method called Causal Abstraction. Think of this like a translator who can only translate a whole book from English to French. If you have a detailed English book (the low-level model) and a simple French summary (the high-level model), abstraction tells you how the whole book maps to the summary.
- The Limitation: Abstraction assumes you have one detailed model and one simple model. It doesn't help if you have two different detailed models (like Map A and Map B) that you want to merge into a single, bigger picture.
The New Way: "Causal Embeddings" (The Puzzle Piece Fitting)
The authors introduce Causal Embeddings. Think of this as a set of special puzzle pieces.
Instead of translating a whole book, an embedding lets you take a specific part of a detailed model (say, the "Red Deer" and "Fallow Deer" variables in one researcher's map of a forest) and fit it perfectly into a specific slot of a larger, coarser model (the single "Deer" variable of the Master Map).
- The Magic: It doesn't matter if the detailed model has 50 sub-species of deer and the master map has just one "Deer" category. The embedding acts as a bridge, saying, "Okay, all 50 of these detailed deer variables aggregate to fit into this one 'Deer' slot in the big model."
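To make the "bridge" idea concrete, here is a minimal sketch in Python. It is not the paper's formalism, just the aggregation intuition: an embedding tells you which coarse slot each detailed variable feeds into, and the detailed values are summed into that slot. All names and numbers are invented for illustration.

```python
# Fine-grained state: a count for each deer sub-species (made-up numbers).
fine_state = {"red_deer": 120, "fallow_deer": 80, "roe_deer": 45}

# The "embedding": which coarse slot each detailed variable feeds into.
embedding = {"red_deer": "Deer", "fallow_deer": "Deer", "roe_deer": "Deer"}

def coarsen(state, embedding):
    """Aggregate fine-grained variables into their coarse slots."""
    coarse = {}
    for var, value in state.items():
        slot = embedding[var]
        coarse[slot] = coarse.get(slot, 0) + value
    return coarse

print(coarsen(fine_state, embedding))  # {'Deer': 245}
```

However many sub-species the detailed model tracks, they all land in the one "Deer" slot of the big model.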
How It Works in Real Life (The Ecosystem Example)
The paper uses a nature example to explain this:
- Researcher 1 studied a forest and made a detailed model of Humans, Squirrels, and Deer.
- Researcher 2 studied the same forest but made a detailed model of Wolves, Eagles, Red Deer, and Fallow Deer.
- The Goal: We want to know how Predators (Wolves/Eagles) affect Humans. But neither researcher studied that specific link directly!
- Researcher 1 didn't study predators.
- Researcher 2 didn't study humans.
Using Embeddings:
- We take Researcher 1's data and "embed" it into a big, high-level model. We tell the system: "The 'Deer' in your data is the same as the 'Deer' in our big model."
- We take Researcher 2's data and "embed" it into the same big model. We tell the system: "The 'Wolves' and 'Eagles' in your data are both just 'Predators' in our big model."
- The Result: Suddenly, the big model has data on both Humans and Predators! Even though no single researcher measured them together, the embedding allowed us to merge the datasets.
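The merge step above can be sketched the same way. Again, this is a toy illustration, not the paper's construction: each researcher's variables are embedded into a shared high-level vocabulary, the coarsened views are combined, and any slot both researchers cover (here, "Deer") must agree. All variable names and counts are hypothetical.

```python
def coarsen(state, embedding):
    """Aggregate fine-grained variables into their coarse slots."""
    coarse = {}
    for var, value in state.items():
        slot = embedding[var]
        coarse[slot] = coarse.get(slot, 0) + value
    return coarse

def merge(a, b):
    """Combine two coarse views; shared slots must agree."""
    for slot in a.keys() & b.keys():
        assert a[slot] == b[slot], f"inconsistent value for {slot}"
    return {**a, **b}

# Researcher 1: Humans, Squirrels, Deer -- no predators measured.
r1_state = {"humans": 500, "squirrels": 300, "deer": 245}
r1_embed = {"humans": "Humans", "squirrels": "Squirrels", "deer": "Deer"}

# Researcher 2: Wolves, Eagles, Red Deer, Fallow Deer -- no humans measured.
r2_state = {"wolves": 30, "eagles": 12, "red_deer": 150, "fallow_deer": 95}
r2_embed = {"wolves": "Predators", "eagles": "Predators",
            "red_deer": "Deer", "fallow_deer": "Deer"}

merged = merge(coarsen(r1_state, r1_embed), coarsen(r2_state, r2_embed))
print(merged)
# {'Humans': 500, 'Squirrels': 300, 'Deer': 245, 'Predators': 42}
```

The merged view now contains both Humans and Predators, even though no single dataset had both.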
Why Is This Useful? (The "Missing Data" Fix)
The paper shows two main superpowers of this method:
- Merging Different Detail Levels: Sometimes one dataset counts "Red Deer" and "Fallow Deer" separately, while another just counts "Deer." Embeddings act like a smart calculator that knows how to combine those numbers so they fit together without breaking the logic.
- Solving the "Missing Link" Problem: In the example above, we wanted to know the link between Predators and Humans. Neither dataset had it. But by merging them into a common "language" (the high-level model), we could use math to impute (guess/fill in) the missing connection. It's like realizing that if Wolves eat Deer, and Humans hunt Deer, then Wolves and Humans are indirectly connected through the Deer.
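The "indirect connection through the Deer" can be shown with a toy linear-model calculation. This is not the paper's actual math, just the classic path-tracing intuition: in a linear causal model, effects along a path multiply. The per-unit effect numbers are invented.

```python
# Researcher 2 measured: effect of Predators on Deer (hypothetical value).
effect_predators_on_deer = -2.0   # each extra predator removes ~2 deer

# Researcher 1 measured: effect of Deer on Humans (hypothetical value).
effect_deer_on_humans = 0.1       # each extra deer supports ~0.1 humans

# Neither measured Predators -> Humans directly, but along the path
# Predators -> Deer -> Humans the effects compose by multiplication
# (as in linear causal models):
effect_predators_on_humans = effect_predators_on_deer * effect_deer_on_humans
print(effect_predators_on_humans)  # -0.2
```

Once both studies live in the same high-level model, the missing link falls out of the links each study did measure.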
The "Causal" Part (Why It's Not Just Statistics)
Most data merging is just about numbers (statistics). But this paper is about Causality (cause and effect).
- Statistics might say: "When ice cream sales go up, shark attacks go up." (They are correlated, but one doesn't cause the other).
- Causality asks: "If we stop selling ice cream, do shark attacks go down?" (No, because the real cause is the summer heat).
The authors ensure that when they merge these models, the cause-and-effect relationships remain true. If "Wolves cause Deer to decrease" in the detailed model, the merged model must still respect that rule, even if it simplifies the variables.
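One way to picture "respecting the rule" is a commutativity check, which is the standard consistency idea behind causal abstraction: running the detailed model and then coarsening should give the same answer as coarsening first and then running the coarse model. The toy dynamics below are stand-ins, not the paper's models.

```python
def coarsen(s):
    """Map the detailed state onto the coarse vocabulary."""
    return {"Predators": s["wolves"],
            "Deer": s["red_deer"] + s["fallow_deer"]}

def detailed_step(s):
    """Detailed rule: each wolf removes one deer of each sub-species."""
    return {"wolves": s["wolves"],
            "red_deer": s["red_deer"] - s["wolves"],
            "fallow_deer": s["fallow_deer"] - s["wolves"]}

def coarse_step(c):
    """Coarse rule: predators reduce total deer twice as fast,
    since two sub-species each lose one deer per predator."""
    return {"Predators": c["Predators"],
            "Deer": c["Deer"] - 2 * c["Predators"]}

s = {"wolves": 10, "red_deer": 120, "fallow_deer": 80}
# Consistency: simulate-then-coarsen equals coarsen-then-simulate.
print(coarsen(detailed_step(s)) == coarse_step(coarsen(s)))  # True
```

If the check failed, the coarse model would be telling a different causal story than the detailed one, which is exactly what the authors' construction rules out.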
Summary
Think of Causal Embeddings as a universal adapter plug.
- Old Method: You needed a specific plug for every single device.
- New Method: You have a smart adapter that can take a complex, detailed device (like a high-end camera with 50 buttons) and plug it into a simple, coarse socket (like a basic phone camera app) without losing the ability to take a photo.
This allows scientists to take many small, specialized studies and stitch them together into one giant, powerful understanding of the world, even if the studies were done with different levels of detail or different definitions.