Characterizing MARL for Energy Control: A Multi-KPI Benchmark on the CityLearn Environment

This paper establishes a comprehensive multi-KPI benchmark for Multi-Agent Reinforcement Learning (MARL) in urban energy management using the CityLearn environment. It demonstrates that Decentralized Training with Decentralized Execution (DTDE) consistently outperforms Centralized Training with Decentralized Execution (CTDE) in both average and worst-case performance, while offering greater resilience and sustainability.

Aymen Khouja, Imen Jendoubi, Oumayma Mahjoub, Oussama Mahfoudhi, Ruan De Kock, Siddarth Singh, Claude Formanek

Published Tue, 10 Ma

Imagine a bustling city neighborhood where every house has its own solar panels, a battery for storing power, and a thermostat that controls heating and cooling. Now, imagine trying to get all these houses to work together perfectly without a central boss telling them exactly what to do every second. That's the challenge of Smart City Energy Management.

This paper is like a massive, rigorous taste test to see which "AI chef" (algorithm) can cook up the best energy plan for these houses. The researchers set up a virtual neighborhood called CityLearn and pitted six different AI strategies against each other to see who could keep the lights on, the bills low, the carbon emissions down, and the residents comfortable.

Here is the breakdown of their findings, explained simply:

1. The Contenders: The "Solo Artists" vs. The "Orchestra"

The researchers tested two main ways the AI agents (the house controllers) could learn:

  • The Solo Artists (DTDE): Each house learns on its own, ignoring what the neighbors are doing. They only look at their own data. Think of this like a group of musicians practicing in separate rooms, trying to guess what the others are playing.
    • Examples: IPPO, SAC.
  • The Orchestra (CTDE): During training, the houses can "listen" to the whole group to learn the big picture, but during the actual game, they still have to play their own instruments alone. Think of this like a conductor helping the musicians practice together, but once the concert starts, the conductor leaves the stage.
    • Examples: MAPPO.
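The "solo artist" vs. "orchestra" split comes down to what each agent's value function (critic) is allowed to see during training. Here is a minimal sketch of that difference; the house names and observation contents are illustrative, not taken from the paper:

```python
# Illustrative sketch (not the paper's implementation): the key difference
# between DTDE (e.g. IPPO) and CTDE (e.g. MAPPO) is the critic's input
# during training. At execution time, both act from local observations only.

def dtde_critic_input(agent_id, observations):
    """DTDE: each house's critic sees only that house's own observation."""
    return observations[agent_id]

def ctde_critic_input(agent_id, observations):
    """CTDE: during training, the critic sees every house's observation
    concatenated (the 'conductor' view); the actor still acts locally."""
    return [x for obs in observations.values() for x in obs]

# Toy observations: e.g. [battery state-of-charge, indoor temperature]
obs = {
    "house_0": [0.7, 21.5],
    "house_1": [0.4, 23.0],
    "house_2": [0.9, 20.1],
}

assert dtde_critic_input("house_1", obs) == [0.4, 23.0]
assert len(ctde_critic_input("house_1", obs)) == 6  # joint view of all houses
```

The trade-off the paper measured follows directly from this: the joint view gives CTDE more information but also a harder, higher-variance learning problem, while DTDE's narrow view keeps training stable.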

The Winner: The Solo Artists (specifically IPPO) won the day. They were more consistent and reliable. The "Orchestra" approach (MAPPO) was like a high-wire act: sometimes they performed beautifully, but other times they crashed spectacularly. The Solo Artists were the steady, reliable workhorses that never failed.

2. The Secret Weapon: "Memory" (Recurrent Networks)

The researchers also tested if the AIs should have short-term memory.

  • Without Memory: The AI only sees what's happening right now. It's like driving a car while wearing blinders that only show you the bumper in front of you.
  • With Memory: The AI remembers what happened 10 minutes ago. It's like driving with a rearview mirror and knowing the traffic pattern.

The Result: Giving the AIs memory was a game-changer for specific tasks.

  • Ramping (Smoothness): If you suddenly turn your AC on and off, it's like slamming on the brakes. AIs with memory learned to "ease into" the changes, making the power usage smooth and gentle.
  • Battery Health: AIs with memory treated their batteries like a marathon runner, not a sprinter. They learned to drain the battery slowly and steadily, rather than draining it all at once. This makes the battery last much longer.
  • Comfort: Interestingly, memory didn't help much with keeping the house temperature perfect. That's because comfort is about reacting fast to a sudden heatwave, not planning for the future.
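A toy example makes the ramping benefit concrete. This is an illustrative stand-in for a recurrent policy, not the paper's method: the "memory" here is just an exponential moving average of recent demand, and the ramping KPI is defined (as is common) as the sum of absolute step-to-step changes in power draw:

```python
# Hedged sketch: why conditioning on history smooths power usage. A real
# recurrent policy (e.g. a GRU) learns this behavior; the moving average
# below is only an analogy for "remembering what just happened".

def ramping(power):
    """Ramping KPI: total absolute change between consecutive timesteps."""
    return sum(abs(b - a) for a, b in zip(power, power[1:]))

def memoryless(demand):
    """No memory: power tracks the current demand exactly, spikes and all."""
    return list(demand)

def with_memory(demand, alpha=0.3):
    """With memory: ease toward the target using a running average of
    recent demand, trading a little tracking error for smoothness."""
    out, level = [], demand[0]
    for d in demand:
        level = level + alpha * (d - level)
        out.append(level)
    return out

spiky = [1.0, 5.0, 1.0, 5.0, 1.0]  # someone slamming the AC on and off
assert ramping(with_memory(spiky)) < ramping(memoryless(spiky))
```

The same smoothing logic explains the battery result: a history-aware controller spreads charge and discharge over many timesteps instead of spiking, which is gentler on the cells.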

3. The "Lazy Neighbor" Test

In a team of six houses, does one house do all the work while the others slack off? The researchers checked this using a metric called Agent Importance.

  • The Finding: No "lazy neighbors" were found! Every house contributed fairly. Even if you removed one house from the simulation, the system didn't collapse. This proves the AI strategies are robust. If a real-world house goes offline or loses internet, the rest of the neighborhood keeps running smoothly.
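The idea behind an importance metric like this can be sketched as an ablation: remove (or neutralize) one agent, re-measure the team score, and call the drop that agent's importance. The details below are assumptions for illustration, not the paper's exact procedure:

```python
# Hedged sketch of an Agent Importance-style ablation (toy version, not the
# paper's implementation): an agent's importance is how much the team score
# drops when that agent is replaced by a do-nothing fallback.

def team_score(contributions):
    """Toy team objective: total contribution of all active houses."""
    return sum(contributions.values())

def agent_importance(contributions):
    full = team_score(contributions)
    importance = {}
    for house in contributions:
        ablated = dict(contributions, **{house: 0.0})  # knock one house out
        importance[house] = full - team_score(ablated)
    return importance

# Roughly equal toy contributions -> roughly equal importance.
contrib = {"house_0": 1.1, "house_1": 0.9, "house_2": 1.0}
imp = agent_importance(contrib)
assert max(imp.values()) - min(imp.values()) < 0.5  # no "lazy neighbor"
```

A flat importance profile, as the researchers observed, is exactly what you want for resilience: no single house is a point of failure.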

4. The Trade-offs (The "You Can't Have It All" Reality)

The paper highlights that you can't win every category at once. It's like trying to be the fastest runner, the strongest lifter, and the most flexible gymnast all at the same time.

  • IPPO was the best all-rounder: It kept costs low, emissions low, and comfort high, with very few "bad days."
  • SAC with Memory was the specialist: It could achieve the absolute best scores in specific categories (like minimizing discomfort), but it was less consistent overall.
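One common way to compare algorithms across incompatible KPIs (and, roughly, the convention CityLearn itself uses) is to normalize each KPI by a no-control baseline, so that values below 1.0 mean the controller beat the baseline on that metric. The numbers below are made up purely to illustrate the trade-off shape:

```python
# Sketch of multi-KPI scoring with illustrative (fabricated) numbers:
# each raw KPI is divided by a no-control baseline, so < 1.0 means the
# controller improved on that metric. Lower is better everywhere.

def normalized_kpis(controlled, baseline):
    return {k: controlled[k] / baseline[k] for k in controlled}

baseline = {"cost": 100.0, "emissions": 80.0, "discomfort": 10.0}
ippo     = {"cost": 85.0,  "emissions": 70.0, "discomfort": 9.0}
sac_mem  = {"cost": 92.0,  "emissions": 78.0, "discomfort": 6.0}

ippo_n = normalized_kpis(ippo, baseline)
sac_n = normalized_kpis(sac_mem, baseline)

# The trade-off: the specialist wins its category, the all-rounder wins more.
assert sac_n["discomfort"] < ippo_n["discomfort"]   # SAC+memory: best comfort
assert ippo_n["cost"] < sac_n["cost"]               # IPPO: better cost
assert ippo_n["emissions"] < sac_n["emissions"]     # IPPO: better emissions
```

Normalizing this way is what lets a benchmark rank an "all-rounder" above a "specialist" without pretending dollars, tonnes of CO2, and comfort share a unit.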

The Big Picture Takeaway

This paper tells us that for managing complex city energy grids:

  1. Decentralized is better: Letting each building make its own smart decisions (without a central controller micromanaging) is more stable and reliable.
  2. Memory matters: Giving AI a sense of "what happened before" helps it manage batteries and smooth out power usage, which saves money and extends equipment life.
  3. Consistency wins: In the real world, you don't want an AI that is a genius 50% of the time and a disaster the other 50%. You want the steady, reliable performer.

In a nutshell: The researchers found that the best way to run a smart city is to give every house a smart, memory-equipped brain that learns to cooperate without needing a boss, ensuring the lights stay on, the batteries last, and the neighbors stay happy.