Original authors: Mohammad AlShaikh Saleh, Sanjay Chawla, Sertac Bayhan, Haitham Abu-Rub, Ali Ghrayeb

Published 2026-05-07

Original paper licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). This is an AI-generated explanation of the paper below. It is not written or endorsed by the authors. For technical accuracy, refer to the original paper.

The Big Picture: Predicting the "Electric Hunger" of AI

Imagine a massive data center as a giant kitchen where thousands of chefs (AI computers) are cooking different meals. Sometimes they are making a simple salad (a small task), and sometimes they are roasting a whole turkey (training a giant AI model).

The problem is that these chefs don't eat at a steady pace. They might suddenly decide to cook five turkeys at once, causing the kitchen's power usage to spike wildly. If the power grid (the main electricity supply) doesn't know this is coming, it could get overwhelmed, leading to blackouts or instability.

The authors of this paper built a new "crystal ball" (a forecasting model) to predict how much electricity these AI kitchens will need over the next 5 to 80 minutes. Their secret? They didn't just let the computer guess based on past patterns; they taught it the laws of physics.

The Problem with Old "Crystal Balls"

Most modern prediction tools are like students who only memorize flashcards. If the data looks like the flashcards, they get an A. But if something weird happens, like a chef suddenly turning down the oven because it's too hot (a "throttle" event), the student gets confused and makes a bad guess.

The paper argues that standard AI models often fail when:

Power Throttling: The computer slows itself down to prevent overheating.
Sudden Spikes: The workload changes instantly.
Recovery: The system tries to stabilize after a spike.

The Solution: "Physics-Aware" DLinear

The authors created a model called PI-DLinear. Think of this as a student who not only memorizes flashcards but also understands how a kitchen works.

1. The Thermal RC Network (The "Hot Pot" Analogy)

The core of their innovation is a set of math equations (ODEs) that describe how heat moves.

The Analogy: Imagine the GPU (the brain of the AI) and the Memory (its short-term memory) are two pots of water sitting on a stove.
The Physics: When you turn up the heat (power), the water gets hotter. But the water doesn't get hot instantly; it takes time. Also, the two pots are sitting next to each other, so heat flows from the hotter pot to the cooler one.
The Innovation: The authors derived new math equations, based on Newton's Law of Cooling, that describe how these "pots" heat up and cool down. They then trained their AI model to respect these rules by penalizing predictions that break them. If the model predicts that the power will go up while the temperature is already too high to handle that power, the penalty pushes it toward a physically plausible answer.

2. The "Throttle" Rule

The model is also trained with a specific rule: "If the chef is working above 90% capacity and the pot is boiling, the power should not keep going up."
Standard models might keep predicting high power because the chef was working hard a minute ago. The new model accounts for the safety mechanisms that kick in in the real world, so it predicts the drop in power more accurately.

How Well Did It Work?

The team tested their model on real data from the MIT Supercloud, a massive AI research facility. They compared their "Physics-Aware" model against 16 other top-tier models (including complex ones called Transformers).

Accuracy: The new model was consistently more accurate. It made fewer mistakes, especially when predicting the "spikes" and "drops" in power.
Stability: When the AI workload suddenly changed, the new model recovered its accuracy much faster than the others.
Efficiency: Despite being smarter, the model is actually very lightweight. It's like a compact, high-efficiency car that gets better gas mileage than a massive luxury SUV. It doesn't require a supercomputer to run; it can fit on standard monitoring equipment in a data center.

The Key Takeaways

Don't just guess; understand: By teaching the AI the basic physics of heat and electricity, it becomes much more reliable when things get chaotic.
Safety first: The model is excellent at predicting when a computer will "hit the brakes" (throttle) to save itself from overheating.
Real-world ready: It works on real data from a supercomputer, handling everything from language models to image recognition tasks.

In short, the paper shows that if you want to predict the power needs of a chaotic AI data center, you shouldn't just look at the numbers; you need to understand the heat and the physics behind them.

Technical Summary: A Physics-Aware Framework for Short-Term GPU Power Forecasting of AI Data Centers

1. Problem Statement

AI data centers face unprecedented challenges in power management due to the heterogeneity and rapid fluctuations of computational tasks, particularly Large Language Models (LLMs), vision networks, and Graph Neural Networks (GNNs). Modern AI workloads exhibit high power densities (300–1,200 W per GPU) and transitory power fluctuations that can exceed 132 kW/s at the rack level. These rapid changes threaten grid stability, necessitating accurate short-term power forecasting (5–80 minutes ahead) to inform control strategies like Automatic Generation Control (AGC) and demand response.

While deep learning models, particularly transformers, have advanced time-series forecasting, they often produce physically inconsistent predictions. They struggle with out-of-distribution scenarios, such as power throttling events, abrupt load fluctuations, and post-throttle stability, because they rely solely on statistical patterns rather than underlying physical mechanisms. Furthermore, existing literature lacks time-dependent ordinary differential equations (ODEs) that explicitly interlink GPU power consumption with GPU/memory temperature and utilization, a prerequisite for a truly physics-aware framework.

2. Methodology: PI-DLinear

The authors propose PI-DLinear, a physics-informed variant of the DLinear time-series model. The framework integrates a data-driven forecasting backbone with a physics-based regularization term derived from a multi-node lumped thermal Resistance-Capacitance (RC) network.

2.1 Base Architecture (DLinear)

The foundation is DLinear, which decomposes time-series data into trend and seasonal/remainder components using a moving average kernel. These components are processed by separate linear layers and summed to produce the final forecast. This architecture was selected for its ability to handle clear trends and its computational efficiency.
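The decomposition described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the kernel size of 25, the edge-replication padding, and the hand-set weight matrices `w_trend` and `w_season` are assumptions made for the example (in a trained model the weights would be learned).

```python
import numpy as np

def moving_average(x, kernel_size):
    """Smooth a 1-D series with a moving average, keeping its length.

    The series is padded by repeating its first and last values
    (an assumption for this sketch) so the output has len(x) samples.
    """
    pad_front = (kernel_size - 1) // 2
    pad_back = kernel_size - 1 - pad_front
    padded = np.concatenate(
        [np.repeat(x[:1], pad_front), x, np.repeat(x[-1:], pad_back)]
    )
    kernel = np.ones(kernel_size) / kernel_size
    return np.convolve(padded, kernel, mode="valid")

def dlinear_forecast(x, w_trend, w_season, kernel_size=25):
    """Split x into trend and remainder, map each with its own linear
    layer (here a plain matrix), and sum the two outputs."""
    trend = moving_average(x, kernel_size)
    remainder = x - trend
    return w_trend @ trend + w_season @ remainder

# Toy usage: a 240-minute look-back window and a 5-minute horizon.
rng = np.random.default_rng(0)
lookback, horizon = 240, 5
x = 200.0 + 50.0 * np.sin(np.arange(lookback) / 20.0) + rng.normal(0, 5, lookback)
w_trend = np.full((horizon, lookback), 1.0 / lookback)  # placeholder weights
w_season = np.zeros((horizon, lookback))                # placeholder weights
forecast = dlinear_forecast(x, w_trend, w_season)
print(forecast.shape)
```

Because each branch is a single matrix product, the parameter count grows only with the look-back length times the horizon, which is consistent with the efficiency argument made for this architecture.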

2.2 Physics-Informed Constraints

To enforce physical consistency, the authors derived new ODEs based on a coupled two-node RC thermal network consistent with Newton's law of cooling. The model treats GPU temperature ( $T_g$ ) and memory temperature ( $T_m$ ) as coupled thermal states.

Thermal RC Model: The system is modeled using energy balance equations where power consumption ( $P$ ) drives temperature changes, and heat dissipation follows Newtonian cooling. The governing equations are:
$C_g \frac{dT_g}{dt} = \alpha P - \frac{T_g - T_a}{R_{ga}} - \frac{T_g - T_m}{R_{gm}}$
$C_m \frac{dT_m}{dt} = (1-\alpha) P - \frac{T_m - T_a}{R_{ma}} + \frac{T_g - T_m}{R_{gm}}$
Where $C$ represents thermal capacitance, $R$ represents thermal resistance, $T_a$ is ambient temperature, and $\alpha$ is a latent power split parameter between GPU and memory.
Power Rate Constraint: Solving the ODEs yields a constraint on the rate of power change ($dP/dt$) that links predicted power trajectories to observed temperature derivatives.
Throttling Constraint: A specific loss component ( $L_{throttle}$ ) is introduced to handle power throttling. Based on observations from the MIT Supercloud dataset, throttling is strongly correlated with sustained high utilization ( $>90\%$ ) rather than just extreme temperatures. The loss penalizes predicted power increases when utilization and temperature exceed specific thresholds, enforcing the physical reality that power must drop or stabilize under high stress.
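To make the two-node RC equations concrete, the sketch below integrates them with a forward-Euler step and then recovers power from the temperature traces. Adding the two governing equations cancels the coupling term $(T_g - T_m)/R_{gm}$, which gives $P = C_g \frac{dT_g}{dt} + C_m \frac{dT_m}{dt} + \frac{T_g - T_a}{R_{ga}} + \frac{T_m - T_a}{R_{ma}}$. This follows directly from the equations as stated here; the exact form of the paper's $dP/dt$ constraint may differ. All parameter values, the time step, and the function names are hypothetical and chosen only so the example runs.

```python
import numpy as np

# Illustrative parameter values only (not taken from the paper).
PARAMS = {
    "C_g": 50.0, "C_m": 30.0,  # thermal capacitances [J/K]
    "R_ga": 0.2, "R_ma": 0.4,  # node-to-ambient resistances [K/W]
    "R_gm": 0.5,               # GPU-to-memory coupling resistance [K/W]
    "alpha": 0.8,              # share of power heating the GPU node
    "T_a": 25.0,               # ambient temperature [deg C]
}

def rc_derivatives(T_g, T_m, P, p=PARAMS):
    """Right-hand sides of the two coupled energy-balance equations."""
    coupling = (T_g - T_m) / p["R_gm"]
    dT_g = (p["alpha"] * P - (T_g - p["T_a"]) / p["R_ga"] - coupling) / p["C_g"]
    dT_m = ((1 - p["alpha"]) * P - (T_m - p["T_a"]) / p["R_ma"] + coupling) / p["C_m"]
    return dT_g, dT_m

def simulate(P_series, dt=1.0, p=PARAMS):
    """Forward-Euler integration, starting both nodes at ambient."""
    n = len(P_series)
    T_g = np.empty(n + 1)
    T_m = np.empty(n + 1)
    T_g[0] = T_m[0] = p["T_a"]
    for k, P in enumerate(P_series):
        dg, dm = rc_derivatives(T_g[k], T_m[k], P, p)
        T_g[k + 1] = T_g[k] + dt * dg
        T_m[k + 1] = T_m[k] + dt * dm
    return T_g, T_m

def power_from_temperatures(T_g, T_m, dt=1.0, p=PARAMS):
    """Sum of the two ODEs: the coupling term cancels, leaving P."""
    dT_g = np.diff(T_g) / dt
    dT_m = np.diff(T_m) / dt
    return (p["C_g"] * dT_g + p["C_m"] * dT_m
            + (T_g[:-1] - p["T_a"]) / p["R_ga"]
            + (T_m[:-1] - p["T_a"]) / p["R_ma"])

# A power step from 100 W to 300 W: temperatures lag behind the step.
P_series = np.concatenate([np.full(300, 100.0), np.full(300, 300.0)])
T_g, T_m = simulate(P_series)
P_rec = power_from_temperatures(T_g, T_m)
print(np.max(np.abs(P_rec - P_series)))
```

A physics residual of this kind can be evaluated on predicted power and measured temperatures, so a forecast that implies an impossible heating rate produces a large residual.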

2.3 Loss Function

The total loss function is a weighted sum of three components:
$L = \lambda_u L_{Data} + \lambda_r L_{r} + \lambda_\theta L_{throttle}$

$L_{Data}$ : Standard Mean Squared Error (MSE) between predicted and actual power.
$L_{r}$ : Residual loss enforcing the RC thermal network ODEs.
$L_{throttle}$ : Constraint loss preventing power increases during high-utilization/throttling regimes.
The weighting parameters ( $\lambda$ ) are optimized using a self-adaptive gradient ascent method in log-space to balance data fidelity and physical constraints.
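The weighted loss and one possible reading of the log-space weight update can be sketched as follows. The summary does not give the exact form of $L_{throttle}$ or of the update rule, so the squared hinge on positive power steps, the 80 °C temperature threshold, the learning rate, and every function name here are assumptions; only the 90% utilization threshold and the three-term structure come from the text.

```python
import numpy as np

def throttle_penalty(p_pred, util, temp, util_thresh=90.0, temp_thresh=80.0):
    """Penalize predicted power *increases* at steps where utilization and
    temperature both exceed their thresholds (assumed squared hinge)."""
    dp = np.diff(p_pred)
    active = (util[1:] > util_thresh) & (temp[1:] > temp_thresh)
    return float(np.mean(np.where(active, np.maximum(dp, 0.0), 0.0) ** 2))

def total_loss(p_pred, p_true, residual, util, temp, log_lambdas):
    """L = lambda_u * L_Data + lambda_r * L_r + lambda_theta * L_throttle,
    with the weights stored as logs so they stay positive."""
    lam_u, lam_r, lam_theta = np.exp(log_lambdas)
    l_data = float(np.mean((p_pred - p_true) ** 2))
    l_r = float(np.mean(residual ** 2))
    l_throttle = throttle_penalty(p_pred, util, temp)
    loss = lam_u * l_data + lam_r * l_r + lam_theta * l_throttle
    return loss, np.array([l_data, l_r, l_throttle])

def ascend_log_weights(log_lambdas, losses, lr=0.01):
    """One gradient-ascent step on the log-weights.
    dL/d(log lambda_i) = lambda_i * L_i, so larger losses gain weight."""
    return log_lambdas + lr * np.exp(log_lambdas) * losses

# Toy usage: power predicted to rise while the GPU is hot and saturated.
p_true = np.array([250.0, 240.0, 230.0])
p_pred = np.array([250.0, 260.0, 270.0])
util = np.array([95.0, 96.0, 97.0])
temp = np.array([82.0, 83.0, 84.0])
residual = np.array([0.5, -0.5, 0.0])  # placeholder ODE residuals
log_lambdas = np.zeros(3)

loss, parts = total_loss(p_pred, p_true, residual, util, temp, log_lambdas)
log_lambdas = ascend_log_weights(log_lambdas, parts)
print(loss, parts, log_lambdas)
```

In a full training loop the network parameters would be updated by gradient descent on the same loss while the weights ascend, which is the usual pattern in self-adaptive physics-informed training.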

3. Experimental Setup

Dataset: The model was trained and evaluated on the MIT Supercloud dataset, a publicly available, high-resolution trace (1-minute granularity) from February to October 2021. It includes 100-millisecond logs aggregated to 1-minute intervals covering 448 NVIDIA Volta V100 GPUs.
Workloads: The dataset encompasses diverse AI workloads, including Vision Networks (e.g., U-Net, ResNet), LLMs (e.g., BERT), and GNNs.
Baselines: The proposed model was compared against 16 State-of-the-Art (SOTA) models, including Transformer-based architectures (iTransformer, PatchTST, FEDformer) and non-transformer linear models (DLinear, NLinear, Linear).
Metrics: Performance was evaluated using MAE, MSE, RMSE, and MAPE across various look-back windows (240–600 minutes) and prediction horizons (5–80 minutes).
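For reference, the four metrics can be computed as in the sketch below. The definitions are the standard ones; the small `eps` guard in MAPE against division by zero is an assumption added for this example, not something stated in the paper.

```python
import numpy as np

def forecast_metrics(y_true, y_pred, eps=1e-8):
    """Return MAE, MSE, RMSE, and MAPE (in percent) for one forecast."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mse = float(np.mean(err ** 2))
    return {
        "MAE": float(np.mean(np.abs(err))),
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),
        # eps avoids division by zero when the true power is 0
        "MAPE": float(100.0 * np.mean(np.abs(err) / np.maximum(np.abs(y_true), eps))),
    }

# Toy usage with two power readings in watts.
metrics = forecast_metrics([100.0, 200.0], [110.0, 180.0])
print(metrics)
```

MSE and RMSE weight large errors more heavily than MAE, which makes them more sensitive to missed spikes and throttle drops.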

4. Key Results

Forecasting Accuracy: PI-DLinear consistently outperformed all SOTA baselines. Across all look-back and prediction windows, it achieved improvements ranging from 0.782%–39.08% for MSE, 0.993%–51.82% for MAE, and 0.370%–22.28% for RMSE. Notably, it achieved the lowest MSE and RMSE at every sequence length tested.
Throttling and Transient Recovery: The physics-aware constraints significantly improved performance during critical events.
- Throttle Detection: PI-DLinear improved throttle event detection rates by an average of 6.88%, with a peak improvement of 19.75% at a 360-minute look-back and 10-minute horizon.
- Transient Stability: Under abrupt load fluctuations, PI-DLinear recovered forecasting accuracy more robustly than DLinear (e.g., RMSE of 2.3061 vs. 2.8610).
- Post-Throttle: After throttling subsided, PI-DLinear maintained stable predictions with lower error (MAE: 0.1112 vs. 0.1795).
Efficiency: PI-DLinear maintains the lightweight footprint of the base DLinear model (96k parameters, 0.376 MB memory). While training time increased by approximately 1.9x due to the physics calculations, inference remains efficient. This contrasts sharply with heavier models like FiLM (12.9M parameters) or TiDE, which offered no accuracy gains despite higher computational costs.
Stability: Unlike some transformer models that showed instability with varying sequence lengths (e.g., Crossformer at 360 min), PI-DLinear demonstrated remarkable stability as the history window increased, making it suitable for flexible deployment in data center control units.
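Percentage improvements like those above are typically relative error reductions against a baseline. The sketch below assumes the definition (baseline − model) / baseline × 100, which the summary does not state explicitly, and applies it to the two PI-DLinear vs. DLinear pairs quoted in the list.

```python
def relative_improvement(baseline_error, model_error):
    """Percentage reduction in error relative to the baseline
    (assumed definition; positive means the model is better)."""
    return 100.0 * (baseline_error - model_error) / baseline_error

# Error pairs quoted above (DLinear as the baseline, PI-DLinear as the model).
transient_rmse_gain = relative_improvement(2.8610, 2.3061)
post_throttle_mae_gain = relative_improvement(0.1795, 0.1112)
print(round(transient_rmse_gain, 2), round(post_throttle_mae_gain, 2))
```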

5. Significance and Claims

The paper claims to present the first physics-informed DLinear model for AI data center power forecasting that successfully integrates a multi-node lumped thermal RC network. Its primary significance lies in:

Novel Derivation: It is the first work to derive specific time-dependent ODEs coupling GPU/memory power with temperature and utilization to serve as physics-informed constraints, addressing a gap in existing literature where such coupled equations were unavailable.
Physical Consistency: By anchoring learning to real physical mechanisms (Newton's law of cooling and energy conservation), the model ensures predictions respect physical laws, particularly during non-stationary events like power throttling where purely data-driven models fail.
Practical Deployment: The framework offers a superior trade-off between accuracy and computational efficiency. It achieves SOTA performance without the heavy computational burden of complex transformer architectures, making it viable for real-time deployment in data center monitoring and control systems.
Grid Resilience: Accurate short-term forecasting of AI loads is positioned as a critical enabler for grid operators to manage balancing actions, reserve requirements, and frequency regulation, thereby enhancing the resilience of the electricity grid against the volatility of modern AI workloads.
