Multimodal Modular Chain of Thoughts in Energy Performance Certificate Assessment

This paper introduces Multimodal Modular Chain of Thoughts (MMCoT), a cost-efficient framework that uses vision-language models to improve automated Energy Performance Certificate (EPC) pre-assessment. It decomposes the estimation into structured reasoning stages, and on a UK residential dataset it showed statistically significant accuracy gains over standard prompting.

Zhen Peng, Peter J. Bentley

Published 2026-03-03

Imagine you want to know how energy-efficient your house is, but you don't have a certified expert with a clipboard, and you can't easily afford to pay one to come visit. In many parts of the world, getting an official rating, known as an "Energy Performance Certificate" (EPC), is a luxury or simply impossible because there isn't enough data or money to do it properly.

This paper introduces a clever new way to solve that problem using Artificial Intelligence. Think of it as a "Digital Detective" that can look at a few photos of a house and guess its energy rating with surprising accuracy.

Here is the breakdown of how it works, using some everyday analogies:

1. The Problem: The "Black Box" Mistake

Usually, when we ask an AI to guess something complex (like an energy rating), we just show it a picture and say, "What's the rating?"

  • The Flaw: This is like asking a student to solve a complex math problem without showing their work. The AI guesses the final answer based on a gut feeling. If the house looks old, it might guess a poor energy rating, but it might miss that the old house has brand-new solar panels. It treats every clue separately and often gets confused.

2. The Solution: The "Assembly Line" Detective (MMCoT)

The authors created a system called MMCoT (Multimodal Modular Chain of Thoughts). Instead of asking the AI for the final answer immediately, they force it to act like a detective on an assembly line, solving the mystery step-by-step.

Imagine a team of specialists passing a case file down a line:

  • Station 1 (The Architect): The AI looks at the outside of the house and guesses the age. (e.g., "This looks like a 1920s building.")
  • Station 2 (The Window Inspector): The AI looks at the windows, but now it knows the house is from the 1920s. It uses that context to guess if the windows are single or double-glazed.
  • Station 3 (The Heating Expert): The AI looks at the heater. It knows the age and the windows, so it can make a smarter guess about the heating system.
  • Station 4 (The Lighting Scout): It checks the lightbulbs, keeping all the previous clues in mind.
  • Station 5 (The Judge): Finally, the AI takes all those previous guesses and the photos to make the final Energy Rating.

The Magic Trick: The system doesn't just guess; it passes notes. The answer from Station 1 becomes a "hint" for Station 2. This is called Chain Propagation. It's like a relay race where the baton (the information) is passed smoothly from runner to runner, ensuring the final result is based on a complete story, not just a snapshot.
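The note-passing idea can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the stage names mirror the five stations above, while the prompt wording, the `ask_model` callable, and the canned `fake_model` answers are all assumptions made so the example runs without a real vision-language model.

```python
from typing import Callable, Dict, List

# Stage order follows the five stations described above (names are illustrative).
STAGES = ["building_age", "window_glazing", "heating_system", "lighting", "epc_rating"]


def build_prompt(stage: str, context: Dict[str, str]) -> str:
    """Prepend every earlier answer as a hint, then ask about the current stage."""
    hints = "; ".join(f"{name}: {value}" for name, value in context.items())
    prefix = f"Known so far: {hints}. " if hints else ""
    return f"{prefix}Estimate the {stage.replace('_', ' ')} from the photos."


def run_chain(photos: List[str], ask_model: Callable[[str, List[str]], str]) -> Dict[str, str]:
    """Run the stages in order, feeding each answer into all later prompts."""
    context: Dict[str, str] = {}
    for stage in STAGES:
        context[stage] = ask_model(build_prompt(stage, context), photos)
    return context


# Hypothetical stand-in for a vision-language model: returns canned answers.
CANNED = {
    "building age": "1920s",
    "window glazing": "single",
    "heating system": "old gas boiler",
    "lighting": "mostly incandescent",
    "epc rating": "E",
}


def fake_model(prompt: str, photos: List[str]) -> str:
    for phrase, answer in CANNED.items():
        if f"Estimate the {phrase}" in prompt:
            return answer
    raise ValueError("unknown stage")


result = run_chain(["front.jpg", "boiler.jpg"], fake_model)
print(result)
```

The point of the design is that the prompt for each station grows: by the time the final rating is requested, the prompt already carries the age, glazing, heating, and lighting estimates.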

3. The "Photo Cheat Sheet" (Multimodal Few-Shot)

Sometimes, the AI gets stuck. For example, it might struggle to tell the difference between a 1950s house and a 1960s house just by looking.

  • The Fix: The system can show the AI a "cheat sheet" of reference photos. It's like showing a student a picture of a "1950s house" and saying, "See? Look for these specific brick patterns." This helps the AI anchor its guess before moving to the next step.
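A rough sketch of how such a cheat sheet could be assembled is shown below. The message layout (a list of image and text parts) is a generic, hypothetical format chosen for illustration; it is not the request format of any particular model API, and the file names and labels are made up.

```python
from typing import Dict, List, Tuple


def build_few_shot_message(
    references: List[Tuple[str, str]],
    query_photo: str,
    question: str,
) -> List[Dict[str, str]]:
    """Interleave labelled reference photos before the photo being assessed."""
    parts: List[Dict[str, str]] = []
    for path, label in references:
        parts.append({"type": "image", "path": path})
        parts.append({"type": "text", "text": f"Reference example: {label}"})
    # The query photo and the actual question come last.
    parts.append({"type": "image", "path": query_photo})
    parts.append({"type": "text", "text": question})
    return parts


message = build_few_shot_message(
    references=[("ref_1950s.jpg", "1950s house"), ("ref_1960s.jpg", "1960s house")],
    query_photo="house_front.jpg",
    question="Which decade was this house most likely built in?",
)
for part in message:
    print(part)
```

Placing the labelled references before the query gives the model concrete visual anchors to compare against before it commits to an answer.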

4. Why This Matters

The researchers tested this on 81 real houses in the UK.

  • The Result: The "Assembly Line" detective (MMCoT) was much better than the "Gut Feeling" detective (standard AI).
  • The Error: When the AI did get it wrong, it usually only missed by one step (e.g., guessing a "C" rating when it was actually a "D"). It rarely made huge mistakes (like guessing "A" when it was "G").
  • The Cost: Doing this costs pennies per house. A human expert costs £60–£120. This makes it possible to check thousands of homes for the price of one.
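The "missed by one step" idea can be made concrete by treating the EPC bands A to G as positions on a scale and counting how many bands apart a guess is from the truth. The sketch below is one plausible way to compute this, not necessarily the paper's exact metric, and the four prediction pairs are invented purely to show the arithmetic.

```python
from typing import Dict, List, Tuple

BANDS = "ABCDEFG"  # EPC bands from most to least efficient


def band_distance(predicted: str, actual: str) -> int:
    """Number of bands between the predicted and the actual rating."""
    return abs(BANDS.index(predicted) - BANDS.index(actual))


def summarize(pairs: List[Tuple[str, str]]) -> Dict[str, float]:
    """Share of exact hits, share within one band, and mean band distance."""
    distances = [band_distance(p, a) for p, a in pairs]
    n = len(distances)
    return {
        "exact": sum(d == 0 for d in distances) / n,
        "within_one": sum(d <= 1 for d in distances) / n,
        "mean_distance": sum(distances) / n,
    }


# Made-up (predicted, actual) pairs, for illustration only.
example_pairs = [("C", "D"), ("D", "D"), ("A", "G"), ("E", "D")]
summary = summarize(example_pairs)
print(summary)  # {'exact': 0.25, 'within_one': 0.75, 'mean_distance': 2.0}
```

A single "A instead of G" mistake counts as six bands, so it drags the mean distance up sharply, which is why rarely making such errors matters as much as the exact-hit rate.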

The Bottom Line

This paper isn't trying to replace the official government inspectors. Instead, it offers a low-cost, early-warning system.

Think of it like a health checkup app on your phone. It can't replace a full medical exam by a doctor, but if you take a photo of a rash and answer a few questions, it can tell you, "Hey, that looks serious, you should see a doctor," or "That looks fine, don't worry."

For buildings, this tool can tell a landlord, "Your house looks like it has poor energy efficiency; you should probably get a real inspection and maybe fix those windows." It brings the power of energy assessment to places where it was previously too expensive or too data-poor to exist.