M3S-Net: Multimodal Feature Fusion Network Based on Multi-scale Data for Ultra-short-term PV Power Forecasting

This paper proposes M3S-Net, a novel multimodal feature fusion network that integrates multi-scale partial convolutions for fine-grained cloud boundary extraction, FFT-based meteorological analysis, and a dynamic cross-modal Mamba interaction module to achieve state-of-the-art ultra-short-term PV power forecasting with a 6.2% reduction in mean absolute error.

Penghui Niu, Taotao Cai, Suqi Zhang, Junhua Gu, Ping Zhang, Qiqi Liu, Jianxin Li

Published 2026-02-24

Imagine you are trying to predict exactly how much electricity a solar farm will produce in the next 10 minutes. This is a tricky game because the sun is usually reliable, but clouds are the ultimate troublemakers. They can drift over the sun, block it, or let just a little bit of light through, causing the power output to jump up and down wildly. If the power grid can't predict these jumps, it can cause blackouts or damage equipment.

This paper introduces a new AI system called M3S-Net (think of it as a "Super-Weather Forecaster") designed to solve this problem. Here is how it works, explained with simple analogies:

The Problem: Why Old Methods Fail

Previous methods were like a one-eyed giant.

  • The Time-Only Eye: Some systems only looked at the history of power numbers (like looking at a speedometer). They could guess the trend, but they couldn't see a cloud coming until it was too late.
  • The Image-Only Eye: Others looked at photos of the sky but treated clouds like simple black-and-white stickers. They saw "cloud" vs. "no cloud," but they missed the details: Is the cloud thin and wispy? Is it thick and dark? Is it moving fast?
  • The "Glue" Problem: When researchers tried to combine these two eyes, they usually just "glued" the data together at the very end. It's like having a driver and a navigator who never talk to each other until they arrive at the destination. They don't work as a team.

The Solution: M3S-Net's Three Superpowers

M3S-Net is different because it has three specialized teams that talk to each other constantly.

1. The "Microscope" Team (Fine-Grained Visual Extraction)

Instead of just seeing "cloud," this team uses a special camera lens (called MPCS-Net) to see the texture of the clouds.

  • The Analogy: Imagine looking at a piece of sheer white fabric vs. a thick wool blanket. Both block light, but differently. This team can tell the difference between a thin, translucent cloud (which lets some sun through) and a thick, dark storm cloud. It ignores the boring blue sky and focuses entirely on the "edges" and "thickness" of the clouds that actually matter for power generation.

2. The "Time-Traveler" Team (Multi-Scale Temporal Imaging)

Power data is a messy mix of long-term trends (like the sun rising and setting) and short-term spikes (like a cloud passing quickly).

  • The Analogy: Imagine listening to a song. You have the slow bass beat (the daily cycle) and the fast drum solo (the cloud passing). This team (SIFR-Net) uses a mathematical trick called FFT to turn the sound wave into a visual picture. This allows the AI to "zoom in" on the fast drums and "zoom out" on the slow bass simultaneously, understanding both the big picture and the tiny details at the same time.
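
The "bass vs. drums" separation is exactly what a Fourier transform does. Here is a minimal sketch of that idea, assuming nothing about SIFR-Net's real architecture: a naive DFT (slow but dependency-free, standing in for the FFT) splits a toy power series into its slow and fast components.

```python
import cmath
import math

# Toy sketch of the FFT idea behind SIFR-Net (details assumed): a power
# series mixing a slow "daily" cycle and a fast "cloud" flicker is split
# into frequency bins, so each time scale can be examined separately.

def dft_magnitudes(signal):
    """Magnitude of each frequency bin of a real signal (naive DFT)."""
    n = len(signal)
    return [abs(sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) / n
            for k in range(n // 2 + 1)]

n = 64
signal = [math.sin(2 * math.pi * 1 * t / n)          # slow trend: 1 cycle
          + 0.3 * math.sin(2 * math.pi * 8 * t / n)  # fast flicker: 8 cycles
          for t in range(n)]

mags = dft_magnitudes(signal)
# The two strongest bins recover the two hidden time scales.
peaks = sorted(range(len(mags)), key=mags.__getitem__, reverse=True)[:2]
print(sorted(peaks))  # -> [1, 8]
```

Once the dominant scales are known, each one can be folded into its own 2-D "image" of the series, which is what lets a vision-style network read a time series the way it reads a photo.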

3. The "Telepathic" Team (Cross-Modal Mamba Fusion)

This is the most important part. In old systems, the "Microscope" and the "Time-Traveler" worked in separate rooms. In M3S-Net, they are in the same room and can read each other's minds.

  • The Analogy: Imagine a dance partner swap.
    • Normally, the "Time" dancer moves to the rhythm of the music, and the "Cloud" dancer moves to the shape of the clouds.
    • In M3S-Net, they use a special move called "C-matrix swapping." It's like the Time dancer suddenly borrowing the Cloud dancer's shoes to feel the texture of the floor, while the Cloud dancer borrows the Time dancer's rhythm to know when to move.
    • This creates a deep connection. The system doesn't just say "There is a cloud" and "It is 2 PM." It says, "Because it is 2 PM (Time), and that specific thin cloud (Visual) is moving that way, the power will drop exactly like this."
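
A scalar toy model can make the "shoe swap" concrete. This is our reading of the cross-modal idea, not the paper's implementation: all names, dimensions, and parameter values below are invented. In a state-space layer, the C matrix projects the hidden state into the output; swapping C between streams means each modality's state is read out through the other modality's lens.

```python
# Toy scalar state-space sketch of "C-matrix swapping" (our reading of
# the cross-modal Mamba fusion; every value here is an assumption).

def ssm_scan(inputs, A, B, C):
    """Scalar state-space recurrence: h = A*h + B*u, emit y = C*h."""
    h, ys = 0.0, []
    for u in inputs:
        h = A * h + B * u
        ys.append(C * h)
    return ys

time_seq = [1.0, 0.5, 0.2]     # e.g. recent power readings
A_t, B_t, C_t = 0.9, 1.0, 2.0  # "time" stream parameters (made up)
C_v = 0.5                      # "vision" stream's output projection

# Independent stream: the time branch reads out through its own C.
y_time = ssm_scan(time_seq, A_t, B_t, C_t)
# Swapped: the same dynamics, but read out through the vision C,
# so the temporal output is conditioned on the visual projection.
y_fused = ssm_scan(time_seq, A_t, B_t, C_v)

print(y_time[-1], y_fused[-1])
```

In the real network C would be a learned, input-dependent matrix rather than a fixed scalar, so the swap injects genuinely new cross-modal information instead of a simple rescaling; the scalar version only shows where the exchange happens.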

The Results: Why It Matters

The researchers tested this new system on a brand-new, super-detailed dataset they created (called FGPD), which includes high-quality photos of clouds and weather data.

  • The Score: Compared to the best existing systems, M3S-Net reduced prediction error (mean absolute error) by about 6.2%.
  • The Impact: In the world of power grids, a 6% improvement is huge. It means the grid operators can react faster to sudden changes, preventing blackouts and keeping the lights on even when the weather is chaotic.

Summary

Think of M3S-Net as the ultimate solar power crystal ball. It doesn't just look at the sky or the numbers; it uses a microscope to see cloud details, a time-machine to understand patterns, and a telepathic link to combine them perfectly. This allows it to predict the sun's mood swings with incredible accuracy, keeping our solar-powered future stable and reliable.
