HeatPrompt: Zero-Shot Vision-Language Modeling of Urban Heat Demand from Satellite Images

This paper introduces HeatPrompt, a zero-shot vision-language framework that uses pretrained large vision-language models to extract thermally relevant features from satellite images and estimate heat demand, significantly outperforming baseline models in data-scarce urban environments.

Kundan Thota, Xuanhao Mu, Thorsten Schlachter, Veit Hagenmeyer

Published 2026-02-24

Imagine you are a city planner trying to figure out how much heating a neighborhood needs to stay warm in winter. Usually, to do this accurately, you need a massive spreadsheet for every single building: "Built in 1950, brick walls, thin windows, old boiler."

But here's the problem: that spreadsheet usually doesn't exist. Most cities don't have that level of detail, and privacy laws often stop them from collecting it.

This is where a new tool called HeatPrompt comes in. Think of it as a "Super-Intelligent Satellite Detective" that can look at a photo of a neighborhood and estimate its heating needs without that spreadsheet.

Here is how it works, broken down into simple concepts:

1. The Old Way vs. The New Way

  • The Old Way (The "Blind Architect"): Imagine trying to guess how much fuel a house needs by only knowing its address and size. You might guess, but you'd be wrong a lot because you don't know if the roof is leaking or if the windows are single-pane glass. This is what current computer models do; they rely on incomplete data.
  • The New Way (HeatPrompt): Instead of guessing, HeatPrompt uses a satellite photo and asks a super-smart AI (called a Vision-Language Model, or VLM) to act like a seasoned energy expert.

2. The "Magic Prompt"

The researchers didn't just feed the photo to the AI. They gave it a specific job description, like a boss giving instructions to a new employee.

They told the AI something like: "Look at this satellite image. Pretend you are an expert city energy planner. Describe what you see that would raise or lower a building's heating needs. Look for things like: Is the roof old and rusty? Is it covered in green plants? Are there many houses packed tightly together?"

The AI then writes a short "caption" or a list of observations, just like a human would.

  • Example Output: "I see an old, red-tiled roof with no insulation. There are many houses packed close together, but very few trees to block the wind."
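To make the idea of a "job description" concrete, here is a minimal sketch of how such a prompt could be assembled. The function name `build_prompt`, the cue list, and the exact wording are assumptions made for illustration; they are not the researchers' actual prompt, and the call that sends the prompt and the image to a vision-language model is deliberately left out.

```python
# Hypothetical checklist of visual cues, loosely based on the examples above.
THERMAL_CUES = [
    "roof condition and material",
    "vegetation or green roofs",
    "building density",
]


def build_prompt(cues=THERMAL_CUES) -> str:
    """Assemble a role-plus-checklist instruction for a vision-language model."""
    lines = [
        "You are an expert urban energy planner.",
        "Look at the attached satellite image and describe the visible features",
        "that are likely to affect heating demand. Comment on:",
    ]
    # One bullet per cue, so the model addresses each item in turn.
    lines += [f"- {cue}" for cue in cues]
    lines.append("Answer as a short list of observations.")
    return "\n".join(lines)


prompt = build_prompt()
```

The role sentence and the checklist are kept separate so a planner could swap in different cues for a different city without touching the rest of the instruction.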

3. Turning Words into Numbers

Once the AI writes these descriptions, HeatPrompt turns those words into a mathematical code: a vector, which is simply a list of numbers. It's like translating a poem into a string of numbers that a calculator can work with.
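As a rough illustration of turning words into numbers, the sketch below maps a caption to a fixed-length list of numbers using the "hashing trick" on individual words. This is a stand-in chosen because it runs with the standard library alone; a real system would more likely use a pretrained text-embedding model. The function name `embed_caption` and the vector length of 16 are assumptions.

```python
import hashlib
import math
import re


def embed_caption(caption: str, dim: int = 16) -> list[float]:
    """Map a caption to a fixed-length unit vector via word hashing (toy stand-in)."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z]+", caption.lower()):
        # A stable hash, so the same word always lands in the same slot.
        digest = hashlib.md5(token.encode("utf-8")).digest()
        idx = int.from_bytes(digest[:4], "big") % dim
        vec[idx] += 1.0
    # Scale to unit length so long and short captions are comparable.
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


vector = embed_caption("Old red-tiled roofs, houses packed close together, few trees")
```

Unlike a learned embedding, this toy version knows nothing about meaning: "old roof" and "aging rooftop" would land in unrelated slots. It only shows the shape of the step, text in and a list of numbers out.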

Then a simple mathematical model (the "calculator") takes these "word-numbers" and combines them with basic map data (like "how big is the area?") to predict the total heat demand.
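The combining step can be sketched as follows. The caption numbers and the map data are joined into a single row of features, and a model maps that row to a heat-demand figure. The model used here is a nearest-neighbor average, chosen only because it is short and needs no extra libraries; it is not claimed to be the model the paper uses. All numbers are made up, and the helper names `combine` and `knn_predict` are hypothetical.

```python
import math


def combine(caption_vec, map_features):
    """Concatenate caption-derived numbers with basic map data into one feature row."""
    return list(caption_vec) + list(map_features)


def knn_predict(train_rows, train_targets, query_row, k=2):
    """Average the targets of the k training rows closest to the query row."""
    ranked = sorted(
        (math.dist(row, query_row), target)
        for row, target in zip(train_rows, train_targets)
    )
    nearest = ranked[:k]
    return sum(target for _, target in nearest) / len(nearest)


# Toy data: 2-number caption vectors plus [area in hectares]; all values invented.
rows = [
    combine([0.9, 0.1], [1.0]),
    combine([0.8, 0.2], [1.2]),
    combine([0.1, 0.9], [1.1]),
]
targets = [520.0, 480.0, 210.0]  # invented annual heat demand figures

query = combine([0.85, 0.15], [1.1])
estimate = knn_predict(rows, targets, query, k=2)
```

In this toy run the query sits between the first two neighborhoods, so the estimate is the average of their two targets. With real data the features would need to be put on comparable scales first, since raw area values could otherwise dominate the distance.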

4. Why is this a Big Deal?

The paper shows that this method is a game-changer for two main reasons:

  • It's Much More Accurate: In the reported tests, the new method scored about 93% better than the old "blind" baseline on the paper's accuracy measure and cut the prediction error by 30%. Both figures are improvements relative to that baseline, so think of it as a forecast that misses by noticeably less, not one that is right 93% of the time.
  • It's Transparent (No "Black Box"): Usually, AI models are like black boxes: you put data in, and a number comes out, but you don't know why. HeatPrompt is different. Because it uses the AI's written descriptions, a human planner can look at the result and say, "Ah, the AI predicted high heat demand because it saw old roofs and no trees. That makes sense!"
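To see what a "30% lower error" means in practice, here is a small worked example. It assumes the error is measured as the average absolute gap between predicted and actual heat demand, which this summary does not spell out, and every number in it is invented for illustration.

```python
def mae(y_true, y_pred):
    """Mean absolute error: the average size of the prediction misses."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_reduction(baseline_error, new_error):
    """Fraction by which the new error is smaller than the baseline error."""
    return (baseline_error - new_error) / baseline_error


# Invented heat-demand values for four neighborhoods.
actual = [100.0, 200.0, 300.0, 400.0]
baseline_pred = [120.0, 180.0, 320.0, 380.0]  # each miss is 20
improved_pred = [114.0, 186.0, 314.0, 386.0]  # each miss is 14

baseline_mae = mae(actual, baseline_pred)
improved_mae = mae(actual, improved_pred)
reduction = relative_reduction(baseline_mae, improved_mae)
```

Here the average miss drops from 20 to 14, a 30% reduction. The improved model is still off for every neighborhood, just by less, which is the sense in which relative figures like these should be read.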

The "Detective" Analogy

Imagine you are trying to guess how much money a person spends on groceries.

  • Method A (Old): You only know their zip code. You guess "Average."
  • Method B (HeatPrompt): You look at a photo of their house. You see a big garden (maybe they grow their own food?), a fancy car (maybe they eat out?), and a large family (more mouths to feed). You use those visual clues to make a much smarter guess.

The Bottom Line

HeatPrompt is a tool that lets cities use satellite photos to "read" the energy needs of buildings, even when they don't have the official data files. It helps cities plan how to switch from fossil fuels to cleaner energy by showing where heat is likely needed most, without needing to knock on every door to collect building records.

It turns a blurry satellite image into a clear, actionable plan for a warmer, greener future.
