A Benchmark Dataset for Machine Learning Surrogates of Pore-Scale CO2-Water Interaction

Imagine you are trying to predict how a drop of ink spreads through a complex, sponge-like piece of bread. Now, replace the ink with CO2 gas, the bread with underground rock, and the water in the bread with brine (salty water).

This paper is about creating a massive, high-definition "training manual" for computers so they can learn to predict exactly how that CO2 will move underground without having to run expensive, slow physics simulations every single time.

Here is the breakdown of the paper using simple analogies:

1. The Problem: The "Slow Motion" Dilemma

Scientists know that storing CO2 underground (to help slow climate change) is tricky. The rock isn't a smooth pipe; it's a messy maze of tiny holes (pores) and grains.

The Old Way: To understand how CO2 moves, scientists run supercomputer simulations. It's like trying to predict the weather by building a tiny, perfect model of the atmosphere in a lab. It's accurate, but it takes forever and costs a fortune.
The New Way: They want to use Machine Learning (AI). Think of AI as a student who learns by looking at thousands of examples. Once the student learns the rules, they can guess the weather in seconds.
The Catch: The student needs a really good textbook. Previous textbooks were too small, too simple, or only showed the "final exam" (the end result) rather than the "study process" (how the gas moves second-by-second).

2. The Solution: The "Super-Textbook" Dataset

The authors created a brand new dataset (a collection of data) to train these AI students.

The Size: It contains 624 different "rock samples."
The Detail: Each sample is a high-resolution image (512x512 pixels), zoomed in so much that one pixel covers just 35 microns, which is thinner than a typical human hair.
The Movie: Instead of just a still photo, they captured 100 frames of a movie for each sample. This shows the CO2 gas pushing the water out of the rock over time, step-by-step.
The Variety: They didn't just make one type of rock. They made rocks with different levels of "messiness" (heterogeneity).
- Level 1: A neat, organized rock (like a well-sorted sand beach).
- Level 5: A chaotic, jumbled rock (like a pile of mixed gravel and pebbles).
- Why this matters: If you only train an AI on neat rocks, it will fail when it sees a messy real-world rock. This dataset forces the AI to learn how to handle any kind of geological mess.

3. How They Made It: The "Virtual Sandbox"

They didn't drill 624 real rocks (that would be impractical). Instead, they used a computer program to simulate the physics.

They built a virtual world where they could control the size of the grains and how they were spaced.
They injected virtual CO2 into these virtual rocks and watched how it displaced the water.
They recorded everything: the speed of the gas, the pressure, and exactly where the water went at every single moment.

4. The Test: Does the Student Pass?

To prove this dataset works, they trained three different AI models:

Student A: Trained on the messy, diverse dataset (all 5 levels of rock types).
Student B: Trained on a medium dataset (4 levels).
Student C: Trained on only the simplest, neatest rocks (1 level).

The Result: When they tested the students on new samples of the messiest rock type (Level 5), samples that none of them had seen during training:

Student C struggled. They were too used to neat rocks and got confused by the chaos.
Student A (the one trained on the diverse dataset) did the best job. They learned the general rules of how fluids move, regardless of how messy the rock was.

5. Why This Matters for the Real World

This dataset is like a gym for AI.

Before this, AI models for geology were like athletes who only practiced on a smooth treadmill.
Now, thanks to this paper, AI models can practice on a "rocky obstacle course."
The Goal: In the future, engineers can use these trained AI models to instantly predict if a CO2 storage site is safe and efficient, without waiting days for a supercomputer to finish the math. This speeds up the transition to green energy and helps us store carbon safely underground.

In a nutshell: The authors built a massive, high-definition library of "virtual rock movies" to teach AI how to predict CO2 movement in the real, messy underground world, making the planning of underground carbon storage faster and smarter.

1. Problem Statement

Accurately modeling the interaction between CO2 and water in porous media at the pore scale is critical for carbon capture and storage (CCS), enhanced oil recovery, and groundwater management. However, current approaches face significant limitations:

Computational Cost: High-fidelity numerical simulations (e.g., Direct Numerical Simulation, Lattice Boltzmann) are computationally expensive, making them impractical for large-scale or real-time applications.
Data Scarcity for ML: While Machine Learning (ML) models offer efficient surrogates, they lack sufficient, diverse training data. Existing datasets are often limited to:
- Small spatial scales: Maximum mesh sizes of $256 \times 256$, which fail to capture fine-grained patterns in complex formations.
- Static outputs: Most datasets only predict the final state after injection, ignoring the dynamic temporal evolution of the process.
- Low heterogeneity: Limited representation of geological variability (grain size distribution, packing irregularities).

2. Methodology

The authors developed a comprehensive benchmark dataset and a validation framework using the following approach:

A. Dataset Generation (Geometry & Heterogeneity)

Geometry Generation: Pore structures were generated using an open-source Python notebook (DrawMicromodels). The method perturbs a regular triangular lattice to introduce heterogeneity.
Heterogeneity Levels: Five distinct levels of heterogeneity were defined by varying grain radius and positional jitter (simulating sedimentary sorting and compaction).
- Level 1: Well-sorted media (low deviation).
- Level 5: Highly heterogeneous media (high deviation).
Parametric Sweep: The authors swept over mean grain radii ($R_0 \in \{70, 80, 90\}$) and target porosities ($\phi \in \{0.20, \dots, 0.45\}$).
Augmentation: From 78 accepted base images, non-overlapping quadrants were cropped and vertically mirrored to create a final ensemble of 624 unique 2D samples.
Specifications: Each sample is $512 \times 512$ pixels with a spatial resolution of 35 µm.
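
The geometry pipeline above can be sketched roughly as follows. This is a minimal illustration, not the DrawMicromodels code: the function names, the uniform positional jitter, the Gaussian radius spread, and the reading of "vertically mirrored" as reversing the row order are all assumptions. The sketch also checks the sample count implied by the text (78 base images, 4 quadrants each, original plus mirror) and the physical side length of one sample.

```python
import math
import random

def jittered_triangular_lattice(nx, ny, spacing, r0, jitter, radius_sd, seed=0):
    """Return (x, y, r) grains on a perturbed triangular lattice (hypothetical helper)."""
    rng = random.Random(seed)
    row_height = spacing * math.sqrt(3) / 2  # row spacing of a triangular lattice
    grains = []
    for j in range(ny):
        x_offset = spacing / 2 if j % 2 else 0.0  # odd rows shift by half a cell
        for i in range(nx):
            x = i * spacing + x_offset + rng.uniform(-jitter, jitter)
            y = j * row_height + rng.uniform(-jitter, jitter)
            r = max(0.0, rng.gauss(r0, radius_sd))  # assumed Gaussian radius spread
            grains.append((x, y, r))
    return grains

def quadrants(image):
    """Split a 2D list into four non-overlapping quadrants."""
    h, w = len(image) // 2, len(image[0]) // 2
    return [[row[c:c + w] for row in image[r:r + h]]
            for r in (0, h) for c in (0, w)]

def augment(image):
    """Each quadrant plus its vertical mirror (rows reversed): 8 samples per base image."""
    out = []
    for q in quadrants(image):
        out.append(q)
        out.append(q[::-1])  # assumption: "vertical mirror" means flipping top to bottom
    return out

base = [[4 * r + c for c in range(4)] for r in range(4)]  # toy 4x4 stand-in image
samples = augment(base)
n_total = 78 * len(samples)   # 78 accepted base images, as stated in the text
side_mm = 512 * 35 / 1000     # 512 pixels at 35 µm per pixel, in millimetres
```

Under these assumptions the count works out to 78 × 4 × 2 = 624, matching the stated ensemble size, and each sample spans 17.92 mm per side. Flipping top to bottom keeps the left boundary on the left, which is where the CO2 is injected.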

B. Numerical Simulation (Physics)

Solver: Simulations were performed using GeoChemFoam, an OpenFOAM-based solver developed at Heriot-Watt University.
Physics Model: The solver uses the algebraic Volume-of-Fluid (VoF) method to solve the single-field Navier-Stokes equations coupled with a phase transport equation.
- Forces: Includes viscous stress, surface tension (modeled via Continuous Surface Force), and capillary forces.
- Boundary Conditions: CO2 is injected from the left boundary into a fully water-saturated domain at a constant rate ($1 \times 10^{-8}\,\mathrm{m^3/s}$).
Simulation Parameters:
- Duration: 1 second total, with 100 equally spaced time steps (0.01 s interval).
- Fluid Properties: CO2 and water densities/viscosities were set to realistic values; interfacial tension = 0.03 N/m; contact angle = $45^\circ$.
- Convergence: Tolerance of $1 \times 10^{-8}$.
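
For reference, a standard single-field algebraic VoF formulation with a Continuous Surface Force term, of the kind commonly used in OpenFOAM-based two-phase solvers, is written out below. It is offered as an assumption about the general form only; the exact discretization and any GeoChemFoam-specific modifications are not stated in this summary, and gravity is omitted. Here $\alpha$ stands for the water volume fraction $\alpha_{water}$, $\mathbf{u}$ is the velocity, $\sigma$ the interfacial tension, $\kappa$ the interface curvature, $\mathbf{u}_r$ an artificial compression velocity, and the subscripts $w$ and $g$ denote water and CO2.

$$
\begin{aligned}
\nabla \cdot \mathbf{u} &= 0, \\
\rho \left( \frac{\partial \mathbf{u}}{\partial t} + \mathbf{u} \cdot \nabla \mathbf{u} \right) &= -\nabla p + \nabla \cdot \left[ \mu \left( \nabla \mathbf{u} + \nabla \mathbf{u}^{T} \right) \right] + \sigma \kappa \nabla \alpha, \\
\frac{\partial \alpha}{\partial t} + \nabla \cdot (\alpha \mathbf{u}) + \nabla \cdot \left[ \alpha (1 - \alpha)\, \mathbf{u}_r \right] &= 0, \\
\rho &= \alpha \rho_w + (1 - \alpha) \rho_g, \qquad \mu = \alpha \mu_w + (1 - \alpha) \mu_g, \\
\kappa &= -\nabla \cdot \left( \frac{\nabla \alpha}{\lvert \nabla \alpha \rvert} \right).
\end{aligned}
$$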

C. Data Structure

The dataset is organized in HDF5 format, containing:

Time-series fields (100 steps): Water saturation ($\alpha_{water}$), pressure ($p$), capillary pressure ($p_c$), and velocity components ($U_x, U_y$).
Static fields: Binary physical domain image (pores=1, grains=0).
Metadata: CSV files containing porosity, permeability, and relative permeability data.
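
As a rough illustration of how these fields relate, the sketch below uses tiny nested lists in place of the real arrays. The (time, row, column) layout, the variable names, and both helper functions are assumptions for illustration; the actual HDF5 group and dataset names are not given here. It computes porosity from the binary domain image and the pore-averaged water saturation at each time step.

```python
def porosity(domain):
    """Fraction of pore pixels in a binary image (pores=1, grains=0)."""
    n_pixels = sum(len(row) for row in domain)
    return sum(sum(row) for row in domain) / n_pixels

def pore_averaged_saturation(alpha_water, domain):
    """Mean water saturation over pore pixels, one value per time step."""
    n_pores = sum(sum(row) for row in domain)
    series = []
    for frame in alpha_water:
        # The mask zeroes out grain pixels so only pores contribute
        total = sum(a * m
                    for a_row, m_row in zip(frame, domain)
                    for a, m in zip(a_row, m_row))
        series.append(total / n_pores)
    return series

# Toy 2x2 domain with three pore pixels and one grain pixel
domain = [[1, 0],
          [1, 1]]

# Two toy time steps; a real sample would hold 100 frames of 512x512 values
alpha_water = [
    [[1.0, 0.0], [1.0, 1.0]],  # t0: pores fully water-saturated
    [[0.0, 0.0], [1.0, 1.0]],  # t1: CO2 has displaced water in one pore pixel
]

phi = porosity(domain)
s_w = pore_averaged_saturation(alpha_water, domain)
```

With real data the same quantities would normally be computed on arrays read from the HDF5 file, and the porosity could be cross-checked against the value in the metadata CSV.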

3. Key Contributions

High-Resolution Spatiotemporal Dataset: A collection of 624 samples with $512 \times 512$ resolution and 100 time steps, giving four times as many pixels per sample as previous benchmarks limited to $256 \times 256$.
Controlled Heterogeneity: The dataset explicitly covers a spectrum of geological variability (Levels 1–5), allowing researchers to test model robustness against different pore structures.
Dynamic Process Capture: Unlike static datasets, this resource captures the entire transient evolution of CO2 displacement, enabling the training of models for time-series prediction.
Open Access & Reproducibility: The dataset is publicly available via Dryad, and the simulation code (GeoChemFoam) and generation scripts are open-source on GitHub.

4. Results (Technical Validation)

The authors validated the dataset's utility by training three U-Net architectures to predict future CO2 saturation based on a sequence of four previous maps (autoregressive prediction up to 60 steps).
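
The autoregressive scheme described above can be sketched as a sliding window: the model sees the four most recent maps, predicts the next one, and that prediction is fed back as input for the following step. The `rollout` helper and the stand-in `mean_model` are hypothetical; the real predictor is a trained U-Net whose details are not given here.

```python
from collections import deque

def rollout(model, initial_maps, n_steps, window=4):
    """Autoregressively predict n_steps maps from the last `window` maps."""
    if len(initial_maps) < window:
        raise ValueError("need at least `window` initial maps")
    history = deque(initial_maps[-window:], maxlen=window)
    predictions = []
    for _ in range(n_steps):
        nxt = model(list(history))  # predict the next map from the current window
        predictions.append(nxt)
        history.append(nxt)         # oldest map drops out, prediction enters
    return predictions

def mean_model(maps):
    """Stand-in for the U-Net: pixel-wise mean of the input maps."""
    n = len(maps)
    n_rows, n_cols = len(maps[0]), len(maps[0][0])
    return [[sum(m[r][c] for m in maps) / n for c in range(n_cols)]
            for r in range(n_rows)]

# Four 1x2 toy saturation maps as the initial context
context = [[[1.0, 1.0]], [[1.0, 0.5]], [[1.0, 0.5]], [[1.0, 0.0]]]
preds = rollout(mean_model, context, n_steps=60)
```

Because each prediction becomes an input, errors can compound over the 60 steps, which is one reason rollout accuracy is a demanding test of a surrogate.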

Experimental Setup:
- Model A (5-Levels): Trained on the full diverse dataset.
- Model B (4-Levels): Trained on a subset excluding Level 5.
- Model C (1-Level): Trained only on the simplest (Level 1) data.
- Test Set: All models were evaluated on unseen samples from Level 5.
Performance Metrics (Mean Squared Error - MSE):
- 5-Levels Model: Lowest MSE (0.0145). Because its training set includes Level 5 samples, it serves as the best-case reference for the other two models.
- 4-Levels Model: MSE of 0.0254.
- 1-Level Model: Highest MSE (0.0320).
Key Findings:
- Training on diverse data significantly improves generalization to unseen, complex geological scenarios.
- While diversity improves average performance, it does not guarantee improvement on every single sample: results on individual out-of-distribution samples vary, and some show little or no gain.
- Visual comparisons (Figures 7 & 8) confirmed that models trained on diverse data better captured complex displacement patterns (channeling, ganglion migration) compared to models trained on homogeneous data.
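
To put the reported errors in proportion, the sketch below defines a pixel-wise MSE and computes the relative reductions implied by the three reported values. The `mse` helper is a generic definition, assumed rather than taken from the authors' code; in particular, whether the paper averages over all pixels or only over pore pixels is not stated here.

```python
def mse(pred, target):
    """Mean squared error over all pixels of two equally sized 2D maps."""
    diffs = [(p - t) ** 2
             for p_row, t_row in zip(pred, target)
             for p, t in zip(p_row, t_row)]
    return sum(diffs) / len(diffs)

# MSE values as reported in the summary above
reported = {"5-Levels": 0.0145, "4-Levels": 0.0254, "1-Level": 0.0320}

def relative_reduction(better, worse):
    """Fractional drop in MSE going from `worse` to `better`."""
    return (reported[worse] - reported[better]) / reported[worse]

vs_one_level = relative_reduction("5-Levels", "1-Level")
vs_four_levels = relative_reduction("5-Levels", "4-Levels")
toy_error = mse([[1.0, 0.0]], [[0.0, 0.0]])  # one wrong pixel out of two
```

By these figures, training on all five levels gives roughly 55% lower MSE than training on Level 1 alone, and roughly 43% lower than training on four levels.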

5. Significance

This work addresses a critical bottleneck in the application of AI to geoscience. By providing a high-fidelity, diverse, and temporally rich dataset, it enables the development of robust ML surrogates that can:

Replace computationally expensive pore-scale simulations for rapid scenario screening.
Accurately predict transient CO2 behavior in heterogeneous reservoirs, which is essential for optimizing CCS strategies.
Serve as a standardized benchmark for comparing future ML architectures in multiphase flow modeling.

The dataset bridges the gap between theoretical pore-scale physics and practical, scalable machine learning applications in energy transition and carbon storage.