A Benchmark Dataset for Machine Learning Surrogates of Pore-Scale CO2-Water Interaction

This paper introduces a comprehensive benchmark dataset comprising 624 high-resolution 2D samples from numerical simulations, designed to facilitate the development and evaluation of machine learning surrogates for modeling pore-scale CO2-water interactions in carbon capture and storage applications.

Alhasan Abdellatif, Hannah P. Menke, Julien Maes, Ahmed H. Elsheikh, Florian Doster

Published 2026-03-03

Imagine you are trying to predict how a drop of ink spreads through a complex, sponge-like piece of bread. Now, replace the ink with CO2 gas, the bread with underground rock, and the water in the bread with brine (salty water).

This paper is about creating a massive, high-definition "training manual" for computers so they can learn to predict exactly how that CO2 will move underground without having to run expensive, slow physics simulations every single time.

Here is the breakdown of the paper using simple analogies:

1. The Problem: The "Slow Motion" Dilemma

Scientists know that storing CO2 underground (to stop climate change) is tricky. The rock isn't a smooth pipe; it's a messy maze of tiny holes (pores) and grains.

  • The Old Way: To understand how CO2 moves, scientists typically run supercomputer simulations. It's like trying to predict the weather by building a tiny, perfect model of the atmosphere in a lab. It's accurate, but it takes forever and costs a fortune.
  • The New Way: They want to use Machine Learning (AI). Think of AI as a student who learns by looking at thousands of examples. Once the student learns the rules, they can guess the weather in seconds.
  • The Catch: The student needs a really good textbook. Previous textbooks were too small, too simple, or only showed the "final exam" (the end result) rather than the "study process" (how the gas moves second-by-second).

2. The Solution: The "Super-Textbook" Dataset

The authors created a brand new dataset (a collection of data) to train these AI students.

  • The Size: It contains 624 different "rock samples."
  • The Detail: Each sample is a high-resolution image (512x512 pixels), zoomed in so much that one pixel covers just 35 microns, which is smaller than the width of a human hair.
  • The Movie: Instead of just a still photo, they captured 100 frames of a movie for each sample. This shows the CO2 gas pushing the water out of the rock over time, step-by-step.
  • The Variety: They didn't just make one type of rock. They made rocks with different levels of "messiness" (heterogeneity).
    • Level 1: A neat, organized rock (like a well-sorted sand beach).
    • Level 5: A chaotic, jumbled rock (like a pile of mixed gravel and pebbles).
    • Why this matters: If you only train an AI on neat rocks, it will fail when it sees a messy real-world rock. This dataset forces the AI to learn how to handle any kind of geological mess.
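To get a feel for how big this "textbook" is, here is a rough back-of-the-envelope sketch. The sample count, frame count, and image size come from the summary above, and 35 microns is read as the pixel size. Storing each value as a 4-byte float is an assumption made only for this estimate, not something the paper states here.

```python
# Figures from the summary: 624 samples, 100 frames each, 512x512 pixels,
# 35 microns per pixel.
N_SAMPLES = 624
N_FRAMES = 100
HEIGHT = WIDTH = 512
PIXEL_SIZE_M = 35e-6

# Assumption (not from the source): one 4-byte float per pixel per field.
BYTES_PER_VALUE = 4


def domain_width_mm():
    """Physical width of one sample image, in millimetres."""
    return WIDTH * PIXEL_SIZE_M * 1e3


def total_frames():
    """Number of individual movie frames across the whole dataset."""
    return N_SAMPLES * N_FRAMES


def size_gib_per_field():
    """Uncompressed size of one scalar field (e.g. pressure) over all frames."""
    n_bytes = total_frames() * HEIGHT * WIDTH * BYTES_PER_VALUE
    return n_bytes / 2**30


if __name__ == "__main__":
    print(f"sample width: {domain_width_mm():.2f} mm")
    print(f"total frames: {total_frames()}")
    print(f"size per scalar field: {size_gib_per_field():.1f} GiB")
```

Under these assumptions, each "rock" is a square about 17.9 mm on a side, the dataset holds 62,400 frames, and each recorded scalar field takes roughly 61 GiB before compression.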

3. How They Made It: The "Virtual Sandbox"

They didn't drill 624 real rocks (that would be impractical). Instead, they used a computer program to simulate the physics.

  • They built a virtual world where they could control the size of the grains and how they were spaced.
  • They injected virtual CO2 into these virtual rocks and watched how it displaced the water.
  • They recorded everything: the speed of the gas, the pressure, and exactly where the water went at every single moment.
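To make the "virtual sandbox" idea concrete, here is a toy sketch of building a virtual rock by dropping circular grains onto a grid. This is not the authors' generator. The function names, the lognormal spread of grain sizes, and all parameter values are hypothetical, chosen only to show how one knob can make a rock neater or messier.

```python
import math
import random


def make_rock(size=64, n_grains=30, mean_radius=4.0, spread=0.0, seed=0):
    """Return a size x size grid of booleans: True = solid grain, False = pore.

    `spread` is a hypothetical heterogeneity knob: 0.0 gives identical grains,
    larger values give a wider mix of grain sizes.
    """
    rng = random.Random(seed)
    solid = [[False] * size for _ in range(size)]
    for _ in range(n_grains):
        # Random grain centre anywhere in the domain.
        cx, cy = rng.uniform(0, size), rng.uniform(0, size)
        # Lognormal radius: all grains equal mean_radius when spread is 0.
        r = mean_radius * math.exp(spread * rng.gauss(0.0, 1.0))
        for y in range(size):
            for x in range(size):
                if (x - cx) ** 2 + (y - cy) ** 2 <= r * r:
                    solid[y][x] = True
    return solid


def porosity(solid):
    """Fraction of the grid that is open pore space."""
    cells = [cell for row in solid for cell in row]
    return 1.0 - sum(cells) / len(cells)


if __name__ == "__main__":
    neat = make_rock(spread=0.0, seed=1)
    messy = make_rock(spread=0.8, seed=1)
    print(f"neat rock porosity:  {porosity(neat):.2f}")
    print(f"messy rock porosity: {porosity(messy):.2f}")
```

The real dataset then runs a fluid-flow simulation through each geometry. That step is far more involved and is not sketched here.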

4. The Test: Does the Student Pass?

To prove this dataset works, they trained three different AI models:

  • Student A: Trained on the messy, diverse dataset (all 5 levels of rock types).
  • Student B: Trained on a medium dataset (4 levels).
  • Student C: Trained on only the simplest, neatest rocks (1 level).

The Result: When they tested the students on a new, messy rock type they had never seen before:

  • Student C struggled. They were too used to neat rocks and got confused by the chaos.
  • Student A (the one trained on the diverse dataset) did the best job. They learned the general rules of how fluids move, regardless of how messy the rock was.
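The three-student setup boils down to picking which heterogeneity levels each model is allowed to see during training. Here is a minimal sketch of that selection. The sample catalogue and its `level` field are hypothetical, and the summary does not say which four levels Student B used, so levels 1 to 4 are an assumption.

```python
def select_training_set(samples, allowed_levels):
    """Keep only the samples whose heterogeneity level is allowed."""
    return [s for s in samples if s["level"] in allowed_levels]


# Which levels each "student" trains on.
TRAINING_LEVELS = {
    "A": {1, 2, 3, 4, 5},  # all five levels
    "B": {1, 2, 3, 4},     # assumption: the summary only says "4 levels"
    "C": {1},              # only the neatest rocks
}

# Toy catalogue for illustration: 20 fake samples spread evenly over 5 levels.
# The real dataset has 624 samples; their split per level is not given here.
samples = [{"id": i, "level": 1 + i % 5} for i in range(20)]

subsets = {
    name: select_training_set(samples, levels)
    for name, levels in TRAINING_LEVELS.items()
}

if __name__ == "__main__":
    for name, subset in subsets.items():
        print(f"Student {name}: {len(subset)} training samples")
```

All three models would then be scored on the same held-out rocks, so any difference in accuracy comes from what each one saw during training.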

5. Why This Matters for the Real World

This dataset is like a gym for AI.

  • Before this, AI models for geology were like athletes who only practiced on a smooth treadmill.
  • Now, thanks to this paper, AI models can practice on a "rocky obstacle course."
  • The Goal: In the future, engineers could use these trained AI models to quickly predict whether a CO2 storage site is safe and efficient, without waiting days for a supercomputer to finish the math. This could speed up the evaluation of storage sites and help us store carbon safely underground.

In a nutshell: The authors built a massive, high-definition library of "virtual rock movies" to teach AI how to predict CO2 movement in the real, messy underground world, making carbon capture faster and smarter.
