Advancing Earth Observation Through Machine Learning: A TorchGeo Tutorial

This paper presents a tutorial for TorchGeo, a PyTorch-based library for geospatial machine learning. The tutorial demonstrates the library's core abstractions and walks through an end-to-end workflow: training a semantic segmentation model on Sentinel-2 imagery for multispectral water segmentation.

Caleb Robinson, Nils Lehmann, Adam J. Stewart, Burak Ekim, Heng Fang, Isaac A. Corley, Mauricio Cordeiro

Published 2026-03-04

Imagine you are a chef trying to cook a gourmet meal using ingredients from a massive, global warehouse.

The Problem: The "Standard Kitchen" vs. The "Satellite Warehouse"
Usually, when people teach computers to "see" (like recognizing cats in photos), they use standard recipes. They take a neat, square photo, chop it into tiny, uniform pieces, and feed it to the computer.

But Earth Observation (looking at Earth from space) is different. It's like trying to cook with ingredients from a warehouse that is:

  1. Huge: The "photos" are entire continents, not just a single picture. They are too big to fit on your kitchen counter (computer memory).
  2. Messy: The ingredients come in different units. One pile is in meters, another in feet. One is a photo, another is a map drawn with lines.
  3. Tricky: If you just grab a random handful of ingredients, you might accidentally grab overlapping spots for training and testing (which lets the computer cheat) or miss parts of the picture entirely.

The Solution: TorchGeo (The "Smart Kitchen Assistant")
The authors of this paper introduced a tool called TorchGeo. Think of it as a super-smart kitchen assistant designed specifically for this messy satellite warehouse. Instead of forcing you to manually measure, re-draw, and chop every single ingredient before you start cooking, TorchGeo handles the heavy lifting.

Here is how the paper explains their new "Cooking Class" (Tutorial) using simple analogies:

1. The Magic Mixing Bowl (Composable Datasets)

In a normal kitchen, if you have a bowl of flour and a bowl of sugar, you mix them. In the satellite world, you might have a photo of a forest and a separate map of where the trees are.

  • The Analogy: TorchGeo has magic operators (like & and |) that act like a smart mixing bowl.
    • The Union (|) says, "Mix everything together!" (Creating a giant mosaic of all available photos).
    • The Intersection (&) says, "Only keep the parts where both the photo and the map exist."
  • Why it matters: You don't have to manually cut out the matching pieces. The assistant does it instantly, only grabbing the specific slice you need right now, saving you from trying to carry the whole warehouse into your kitchen.
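The mechanics behind these operators can be sketched in plain Python. This is only a conceptual illustration, not TorchGeo's actual implementation (the real library keeps a spatiotemporal R-tree index over every file, handles time ranges, and reprojects between coordinate systems): an intersection keeps only the extent where both layers have data, a union covers the extent where either does.

```python
# Conceptual sketch of "&" and "|" on dataset extents. Extents are modeled
# here as simple (xmin, xmax, ymin, ymax) tuples -- an illustration only,
# not TorchGeo's real GeoDataset machinery.

def intersect(a, b):
    """Extent where BOTH layers exist -- the '&' behavior."""
    xmin, xmax = max(a[0], b[0]), min(a[1], b[1])
    ymin, ymax = max(a[2], b[2]), min(a[3], b[3])
    if xmin >= xmax or ymin >= ymax:
        return None  # the two layers do not overlap at all
    return (xmin, xmax, ymin, ymax)

def union(a, b):
    """Extent where EITHER layer exists -- the '|' behavior."""
    return (min(a[0], b[0]), max(a[1], b[1]),
            min(a[2], b[2]), max(a[3], b[3]))

imagery = (0.0, 100.0, 0.0, 100.0)   # e.g. one satellite scene
labels  = (50.0, 150.0, 25.0, 75.0)  # e.g. a label map covering part of it

print(intersect(imagery, labels))  # -> (50.0, 100.0, 25.0, 75.0)
print(union(imagery, labels))      # -> (0.0, 150.0, 0.0, 100.0)
```

Only patches drawn from the intersection have both a "photo" and a "map", which is exactly what supervised training needs.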

2. The GPS Cutter (Spatiotemporal Indexing)

Usually, you pick a photo by its filename. With satellites, you pick a photo by its location (latitude/longitude) and time.

  • The Analogy: Imagine a giant, infinite pizza. Instead of asking for "the top-left slice," you tell the assistant, "Give me a 256-by-256 square slice starting at these exact GPS coordinates."
  • Why it matters: The assistant instantly cuts that exact square out of the massive pizza, ensuring the "top" of your slice matches the "top" of the map, even if the map was drawn in a different language (coordinate system).
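Under the hood, location-based indexing boils down to translating geographic coordinates into pixel rows and columns. The sketch below assumes a simple north-up raster described only by its origin and pixel size; real libraries in this space use full affine transforms and reproject between coordinate systems, which this toy version skips.

```python
# Conceptual sketch of cutting a geographic window out of a raster.
# Assumes a north-up grid with square pixels -- an illustration, not a
# real geospatial library's indexing code.

def geo_to_pixel(x, y, origin_x, origin_y, res):
    """Map a geographic coordinate to a (row, col) pixel index."""
    col = int((x - origin_x) / res)
    row = int((origin_y - y) / res)  # y decreases as row index increases
    return row, col

def read_window(raster, bbox, origin_x, origin_y, res):
    """Return the pixels covering a geographic bounding box."""
    xmin, xmax, ymin, ymax = bbox
    r0, c0 = geo_to_pixel(xmin, ymax, origin_x, origin_y, res)  # top-left
    r1, c1 = geo_to_pixel(xmax, ymin, origin_x, origin_y, res)  # bottom-right
    return [row[c0:c1] for row in raster[r0:r1]]

# A toy 4x4 "scene" with 10 m pixels; its top-left corner sits at
# (500000, 4000040) in some projected coordinate system.
scene = [[10 * r + c for c in range(4)] for r in range(4)]
window = read_window(scene, (500010, 500030, 4000010, 4000030),
                     origin_x=500000, origin_y=4000040, res=10)
print(window)  # -> [[11, 12], [21, 22]]: the 2x2 block under that bbox
```

Because both the photo and the label map go through the same coordinate lookup, the two slices are guaranteed to line up pixel-for-pixel.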

3. The Smart Tasting Spoon (Geographic Samplers)

When training a computer, you need to show it many examples.

  • The Analogy:
    • Random Sampling (Training): Imagine a blindfolded chef throwing darts at the pizza to pick random spots to taste. This helps the chef learn the general flavor of the whole pizza.
    • Grid Sampling (Testing): Imagine the chef carefully cutting the pizza into a perfect grid to taste every single inch to make sure the whole thing is cooked.
  • Why it matters: TorchGeo handles this "dart throwing" and "grid cutting" automatically, ensuring the computer learns without cheating (like peeking at the test region during training) and is tested thoroughly.
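The two strategies above can be sketched in a few lines of plain Python. This is a simplified illustration of the idea, working in pixel units over a single rectangular scene; TorchGeo's real samplers (RandomGeoSampler and GridGeoSampler) operate in geographic units over an index of many files.

```python
import random

# Conceptual sketch of the two patch-sampling strategies for a scene of
# width x height pixels and square patches of side `size`.

def random_patches(width, height, size, n, seed=0):
    """'Dart throwing': n patch origins drawn uniformly at random (training)."""
    rng = random.Random(seed)
    return [(rng.randint(0, height - size), rng.randint(0, width - size))
            for _ in range(n)]

def grid_patches(width, height, size, stride):
    """'Grid cutting': regularly spaced origins that cover the scene (testing)."""
    return [(r, c)
            for r in range(0, height - size + 1, stride)
            for c in range(0, width - size + 1, stride)]

grid = grid_patches(width=1024, height=1024, size=256, stride=256)
print(len(grid))  # -> 16: a 4x4 grid of patches tiles the scene exactly
```

Random origins can land anywhere (and may overlap, which is fine within the training set), while the grid with stride equal to the patch size guarantees every pixel is evaluated exactly once.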

4. The Real-World Test: The "Rio Water Hunt"

The second half of the paper is a live demonstration. They built a model to find water in satellite photos of Rio de Janeiro, Brazil.

  • The Challenge: Satellite photos have many "colors" (bands) that human eyes can't see, like infrared. The model needed to be taught how to handle these extra colors.
  • The Fix: They taught the model to look at the "special ingredients" (like calculating how wet a pixel looks using math formulas called spectral indices) and added them to the mix.
  • The Result: They trained the model on a computer, then sent it to look at a real, massive photo of Rio.
  • The Output: Instead of just saying "I got 80% right," the model produced a new map (GeoTIFF) showing exactly where the water is in Rio, pixel-by-pixel. You can zoom in and see if it correctly identified the water in the rivers and the ocean.
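One widely used "wetness" formula of this kind is the Normalized Difference Water Index, NDWI = (green − NIR) / (green + NIR): water reflects green light but absorbs near-infrared, so water pixels score close to +1 while vegetation and land score near or below 0. The sketch below uses made-up band values for illustration; the paper's tutorial works with real Sentinel-2 bands.

```python
# Sketch of the NDWI spectral index on single pixel values (illustrative
# reflectance numbers, not real Sentinel-2 data).

def ndwi(green, nir, eps=1e-10):
    """Per-pixel NDWI; eps guards against division by zero on blank pixels."""
    return (green - nir) / (green + nir + eps)

water_pixel = ndwi(green=0.30, nir=0.05)  # bright in green, dark in NIR
land_pixel = ndwi(green=0.10, nir=0.40)   # vegetation is bright in NIR
print(water_pixel > 0, land_pixel < 0)    # -> True True
```

Feeding such an index to the model as an extra input channel hands it a strong, physically motivated hint about where water is, instead of making it rediscover the relationship from raw bands alone.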

The Big Takeaway

This paper isn't just about code; it's about removing the friction.

Before, if you wanted to use AI to study Earth, you had to spend 90% of your time fixing messy data and 10% actually building the AI. TorchGeo flips that ratio. It lets scientists and developers spend 90% of their time solving real problems (like tracking water, forests, or cities) and only 10% worrying about the messy data.

It turns the "hard mode" of satellite data processing into a smooth, standard workflow, making it easier for anyone to use AI to protect and understand our planet.