The Price of Prompting: Profiling Energy Use in Large Language Models Inference

This paper introduces MELODI, a framework and accompanying dataset designed to monitor and analyze the energy consumption of large language model inference, revealing significant disparities in efficiency based on prompt attributes and highlighting the need for sustainable optimization strategies.

Erik Johannes Husom, Arda Goknil, Lwin Khin Shar, Sagar Sen

Published 2026-03-04
📖 4 min read · ☕ Coffee break read

Imagine you have a fleet of delivery trucks (Large Language Models, or LLMs) that are constantly driving around to deliver packages (answers to your questions). For a long time, everyone was worried about how much fuel it took to build the trucks in the factory (training the AI). But now that the trucks are on the road, we need to worry about how much fuel they burn every single time they make a delivery (inference).

This paper, titled "The Price of Prompting," is like a new, high-tech fuel gauge and traffic camera system called MELODI. The researchers built it to measure exactly how much energy these AI trucks burn while they are working, down to the very last second of a specific delivery.

Here is the breakdown of their findings using simple analogies:

1. The Problem: The "Black Box" of Energy

Previously, tools to measure energy were like looking at a whole city's power bill. They could tell you how much electricity the whole neighborhood used, but they couldn't tell you how much energy your specific delivery truck used versus the bakery down the street.

  • The Old Way: "The whole computer used 500 watts." (Too vague).
  • The New Way (MELODI): "This specific AI process consumed 0.0001 kilowatt-hours over this exact 2-second response." (Super precise).
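Per-process metering of this kind can be sketched in a few lines of Python. Everything here is illustrative: `read_process_power` is a hypothetical stand-in for a real sensor (such as CPU hardware counters or GPU telemetry attributed to one process), and the wattage it returns is made up. The point is the method: sample power at short intervals and integrate over time to get energy in joules.

```python
def read_process_power(pid: int) -> float:
    """Hypothetical sensor: instantaneous power draw (watts) of one process.
    A real tool would read hardware counters; here we return a fixed value."""
    return 42.0  # made-up wattage for the sketch

def measure_energy(pid: int, duration_s: float, interval_s: float = 0.1) -> float:
    """Sample the process's power every interval_s seconds and integrate to joules."""
    n_samples = round(duration_s / interval_s)
    energy_j = 0.0
    for _ in range(n_samples):
        power_w = read_process_power(pid)
        energy_j += power_w * interval_s  # P (watts) x dt (seconds) = E (joules)
    return energy_j
```

Sampling one process in isolation is what separates this from reading the wall-plug number: background load never enters the sum.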

2. The Big Discovery: Size Matters (A Lot)

The researchers found that the size of the AI model is the biggest factor in fuel consumption.

  • The Analogy: Think of a 70-billion-parameter model as a massive, 18-wheeler semi-truck, and a 2-billion-parameter model as a tiny, efficient scooter.
  • The Finding: The semi-truck doesn't just use a little more gas; it uses 100 times more energy per mile (or per word generated) than the scooter. If you don't need to move a massive load, don't send the semi-truck!
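The 100x gap is easiest to see as energy per token. The figures below are illustrative placeholders, not measurements from the paper; only the ratio mirrors the finding.

```python
# Made-up per-response figures for two model sizes (not from the paper).
scooter = {"energy_j": 30.0, "tokens": 300}       # small ~2B-parameter model
semi_truck = {"energy_j": 3000.0, "tokens": 300}  # large ~70B-parameter model

def joules_per_token(model: dict) -> float:
    """Normalize a response's energy by its length."""
    return model["energy_j"] / model["tokens"]

ratio = joules_per_token(semi_truck) / joules_per_token(scooter)
print(f"The big model burns {ratio:.0f}x more energy per token.")
```

Normalizing by token count is what makes models of different sizes comparable at all: raw per-response energy also depends on how long each model's answer happened to be.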

3. The Real Driver: How Long is the Answer?

You might think the question you ask (the prompt) determines how much energy is used. The researchers found this is mostly false.

  • The Analogy: Imagine ordering a pizza. It doesn't matter if you say "I want a pizza" or "I want a delicious, cheesy, pepperoni pizza with extra crust." The kitchen doesn't burn much more gas just because you spoke more words.
  • The Reality: What burns the fuel is how long the pizza takes to bake and deliver (the length of the AI's response).
  • The Finding: The longer the AI talks, the more energy it uses. In fact, the length of the answer is so predictable that the researchers built a math formula that can guess the energy cost with 99.6% accuracy just by knowing how many words the AI will say.
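The "length predicts energy" finding boils down to a linear model: energy ≈ a × (tokens generated) + b. Here is a minimal sketch using a closed-form least-squares fit on synthetic data rather than the paper's measurements (the 0.5 J/token slope and 2 J overhead are invented for illustration):

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y ≈ a*x + b (closed form, no libraries)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - a * mean_x
    return a, b

# Synthetic data: energy grows linearly with response length (made-up coefficients).
tokens = [50, 100, 200, 400, 800]
energy_j = [2.0 + 0.5 * t for t in tokens]  # pretend: 0.5 J/token + 2 J overhead

a, b = fit_line(tokens, energy_j)

def predict_energy(n_tokens: float) -> float:
    """Estimate a response's energy cost from its length alone."""
    return a * n_tokens + b
```

On real measurements the fit is noisy but, per the paper's summary, still about 99.6% accurate; knowing in advance how long the model will talk is the hard part.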

4. The Hardware Trap: Laptops vs. Workstations

The study compared running these AI models on different machines.

  • The Analogy: Running a heavy AI model on a laptop is like trying to pull a heavy trailer with a small sedan engine. It works, but the engine has to scream (work inefficiently), burning more gas to do the same job. A workstation is like a heavy-duty truck built for the job.
  • The Finding: Laptops (especially those without powerful graphics cards) are surprisingly inefficient. They often burn more energy than powerful workstations to do the exact same task.

5. The "Tool" Problem: Why Measurements Vary

The researchers tested their new tool (MELODI) against other popular energy trackers.

  • The Analogy: It's like having three different gauges measuring the same fuel tank. One says you have 10 gallons, another says 5, and a third says 0.5.
  • The Finding: Old tools often measure the whole computer's energy, including background noise (like your email checking itself). MELODI isolates just the AI, giving a much truer picture. They found that some popular tools were wildly inaccurate, either overestimating or underestimating the energy by huge margins.

The Bottom Line: How to Save Energy

If you want to make AI greener and cheaper to run, the paper suggests three simple rules:

  1. Don't use a semi-truck for a scooter job: Pick the smallest AI model that can do the task.
  2. Keep the answers short: If you tell the AI to "be concise," you save massive amounts of energy.
  3. Use the right vehicle: Don't run heavy AI models on weak laptops if you can avoid it; use machines built for the job.
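Rule 2 follows directly from the linear relationship in section 3: output length dominates the cost, so capping the output caps the energy. A toy estimate, reusing an invented 0.5 J/token cost and 2 J fixed overhead per request:

```python
JOULES_PER_TOKEN = 0.5  # illustrative figure, not a measured value
OVERHEAD_J = 2.0        # illustrative fixed cost per request

def response_energy(n_tokens: int) -> float:
    """Energy of one response under the simple linear model."""
    return OVERHEAD_J + JOULES_PER_TOKEN * n_tokens

rambling = response_energy(600)  # a long-winded answer
concise = response_energy(150)   # the same answer after "be concise"
savings_pct = 100 * (1 - concise / rambling)
```

Under this model a 4x shorter answer saves roughly 75% of the energy; the exact coefficients vary by model and hardware, but the proportionality is the point.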

In a nutshell: The paper gives us a precise map of where AI energy goes. It turns out the "price" of prompting isn't about how smart your question is, but how long the AI talks back and how big the engine is that's doing the talking. By measuring this accurately, we can finally start making AI more sustainable.