Any Model, Any Place, Any Time: Get Remote Sensing Foundation Model Embeddings On Demand

To address the challenges of heterogeneity in remote sensing foundation models, this paper introduces rs-embed, a Python library that enables users to retrieve embeddings from any supported model for any location and time range through a unified, single-line interface.

Dingqi Ye, Daniel Kiv, Wei Hu, Jimeng Shi, Shaowen Wang

Published 2026-03-02

Imagine you are a chef trying to cook a giant, global stew. You have a recipe that calls for "vegetables," but you have a problem: every farmer in the world sends you their vegetables in a different box, wrapped in different paper, and cut into different shapes. One sends you whole carrots, another sends you carrot juice, and a third sends you carrots that are already cooked but in a box that doesn't fit your pot.

To make your stew, you'd have to spend all your time unwrapping, chopping, and reshaping these vegetables just to get them into the pot. By the time you're ready to cook, you're exhausted, and you can't even taste-test the different recipes to see which one is best.

This is exactly the problem the paper "Any Model, Any Place, Any Time" is solving for the world of Remote Sensing.

Here is the breakdown in simple terms:

1. The Problem: The "Vegetable Chaos"

In the world of Earth observation, scientists use powerful AI models (called Foundation Models) to look at satellite images and understand what's happening on the ground (like predicting crop yields or tracking deforestation).

Currently, using these models is a nightmare because:

  • Different Boxes: Some models hand you raw model weights you must run yourself; others only publish precomputed outputs.
  • Different Shapes: One model needs a square image, another a rectangle. One needs 3 colors (Red, Green, Blue); another needs 12 different "super-colors" (spectral bands).
  • Hard to Compare: If you want to see which model is better at predicting corn growth, you have to build a custom machine for each model to feed it the data. It's slow, expensive, and confusing.

2. The Solution: The "Universal Adapter" (rs-embed)

The authors built a tool called rs-embed. Think of this as a universal kitchen adapter or a smart translator.

Instead of you having to go to every farm, unwrap every box, and chop every vegetable yourself, you just tell the adapter:

"I want to see what the corn fields in Illinois looked like in July 2019, using the top 5 best AI models."

The adapter does the rest:

  • It fetches the data: It goes to the satellite archives (like Google Earth Engine) and grabs the right images.
  • It does the prep work: It cuts the images into the exact shape and size each specific AI model needs.
  • It runs the models: It feeds the data to all the different AIs at once.
  • It gives you a standard result: Instead of getting 5 different messy formats, you get 5 neat, standardized lists of numbers (called embeddings) that you can immediately compare.
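The steps above can be sketched in plain Python. This is a hypothetical mock of the "one call in, standardized embeddings out" idea, not the real rs-embed API; every function name below is invented for illustration:

```python
# Hypothetical sketch of a unified embedding interface.
# NOT the real rs-embed API -- names like get_embeddings() are invented.

def fetch_imagery(location, time_range):
    # Stand-in for pulling satellite tiles from an archive.
    return [[0.1, 0.2, 0.3]]  # fake "image" as pixel values

def preprocess(image, model_name):
    # Each model wants a different shape/band layout; here we just tag it.
    return {"model": model_name, "pixels": image}

def run_model(prepared):
    # Stand-in for a foundation model: returns a fixed-length vector.
    return [sum(prepared["pixels"][0]), len(prepared["model"])]

def get_embeddings(location, time_range, models):
    """One call in, one standardized dict of embedding vectors out."""
    image = fetch_imagery(location, time_range)
    return {m: run_model(preprocess(image, m)) for m in models}

emb = get_embeddings("Illinois", ("2019-07-01", "2019-07-31"),
                     ["model_a", "model_b"])
print(emb)
```

The point of the sketch is the shape of the result: whatever the model, the caller gets back one dictionary mapping model names to same-structured vectors, ready to compare.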

3. How It Works (The Magic Behind the Curtain)

The paper describes a system with three main layers plus a coordinator, which we can imagine as a highly efficient assembly line:

  • The Spec Layer (The Order): You write a simple "order" (a single line of code) saying Where (location), When (time), and Which Models you want.
  • The Provider Layer (The Delivery Truck): This part goes out to the satellite data warehouses, grabs the raw images, and cleans them up (removing clouds, fixing colors) so they are ready for the AIs.
  • The Embedder Layer (The Chefs): This is where the AI models live. The tool feeds the cleaned images to the models. Some models cook the image instantly; others just look up a pre-cooked answer from a database.
  • The Orchestrator (The Conductor): This is the brain that manages the traffic. It makes sure the data flows smoothly, doesn't crash the computer, and handles errors (like if a satellite image is missing) without stopping the whole process.
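The layered design can be mocked in a few small classes. Again, this is a sketch under assumptions, not rs-embed's actual class names or internals; the only idea taken from the text is the division of labor and the "skip failures instead of crashing" behavior:

```python
# Hypothetical sketch of the layered design described above.
# Class and method names are invented, not taken from rs-embed.

class Spec:
    """The 'order': where, when, and which models."""
    def __init__(self, where, when, models):
        self.where, self.when, self.models = where, when, models

class Provider:
    """The 'delivery truck': fetches and cleans imagery."""
    def fetch(self, spec):
        if spec.where == "nowhere":
            raise IOError("no imagery available")
        return [0.2, 0.4, 0.6]  # fake cleaned pixel values

class Embedder:
    """A 'chef': one model that turns an image into a vector."""
    def __init__(self, name):
        self.name = name
    def embed(self, image):
        # Stand-in for a forward pass (or a precomputed lookup).
        return [round(sum(image), 2), len(self.name)]

class Orchestrator:
    """The 'conductor': runs every embedder, tolerating failures."""
    def __init__(self, provider, embedders):
        self.provider, self.embedders = provider, embedders
    def run(self, spec):
        results = {}
        try:
            image = self.provider.fetch(spec)
        except IOError:
            return results  # missing imagery: return what we have
        for e in self.embedders:
            results[e.name] = e.embed(image)
        return results

orch = Orchestrator(Provider(), [Embedder("a"), Embedder("b")])
out = orch.run(Spec("Illinois", "2019-07", ["a", "b"]))
print(out)  # {'a': [1.2, 1], 'b': [1.2, 1]}
```

Separating the layers this way is what lets one "order" drive many very different models: only the Embedder layer knows anything model-specific.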

4. Why Does This Matter? (The Taste Test)

The authors tested their tool by trying to predict corn yields in Illinois.

  • Before: A researcher would spend weeks setting up different systems to test different models.
  • With rs-embed: They ran the experiment with a single line of code.
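Once every model returns embeddings in the same format, "which model predicts corn yield better?" reduces to fitting the same simple regressor on each embedding set and comparing scores. The toy below uses entirely synthetic data and invented model names, not the paper's experiment, just to show the comparison pattern:

```python
# Toy illustration of comparing embedding sets on a regression task.
# Data and model names are synthetic; no real rs-embed output is used.
import numpy as np

rng = np.random.default_rng(0)
n = 200
yields = rng.normal(180.0, 20.0, n)        # fake corn yields (bu/acre)

# Two fake "embedding" sets: one informative, one mostly noise.
emb_good = np.column_stack([yields + rng.normal(0, 5, n),
                            rng.normal(0, 1, n)])
emb_noisy = rng.normal(0, 1, (n, 2))

def r2_via_lstsq(X, y):
    """Fit a linear model by least squares and report in-sample R^2."""
    X1 = np.column_stack([X, np.ones(len(X))])  # add intercept column
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    return 1 - resid.var() / y.var()

scores = {"model_good": r2_via_lstsq(emb_good, yields),
          "model_noisy": r2_via_lstsq(emb_noisy, yields)}
print(scores)
```

The same few lines score any number of embedding sets, which is the workflow the authors describe: the expensive per-model plumbing is gone, so only the comparison itself remains.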

They found that while some models were great at predicting average yields, they all struggled with "outliers" (fields that were incredibly good or incredibly bad). Because the tool made it so easy to compare them, they could quickly see why they failed and learn how to improve them.

They also visualized the "thoughts" of 16 different AI models. It was like looking at 16 different artists painting the same landscape. Some focused on the rivers, others on the roads, but they all captured the general shape of the land. This helps scientists understand what each model is actually "seeing."
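One common way to produce such "what is the model seeing" pictures is to project each model's patch embeddings onto their top principal components and render those as color channels. The snippet below sketches that projection on random data; it is a generic technique, not necessarily the visualization method the authors used:

```python
# Toy sketch: reduce high-dimensional patch embeddings to 3 values per
# patch (renderable as RGB) via principal components. Data is random.
import numpy as np

rng = np.random.default_rng(1)
patches = rng.normal(size=(64, 16))   # 64 patches, 16-dim fake embeddings

def top3_components(X):
    """Project rows of X onto the 3 leading principal components."""
    Xc = X - X.mean(axis=0)           # center the embeddings
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:3].T              # (n_patches, 3): one RGB per patch

rgb = top3_components(patches)
print(rgb.shape)  # (64, 3)
```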

The Bottom Line

rs-embed turns a chaotic, days-long technical headache into a one-line experience.

It allows scientists to stop worrying about how to get the data and start focusing on what the data means. It's like turning a library where every book is written in a different language and stored in a different room, into a library where you just ask a librarian, "Show me the books about space," and they hand you a perfectly organized stack of translated, ready-to-read books.

The Goal: "Any Model, Any Place, Any Time." No matter which AI you want to use, no matter where on Earth you are looking, and no matter what time of year, the tool gets you the answer instantly.
