ms-Mamba: Multi-scale Mamba for Time-Series Forecasting

This paper introduces ms-Mamba, a novel multi-scale architecture that employs Mamba blocks with varying sampling rates to capture temporal information at different scales, achieving state-of-the-art forecasting performance with greater efficiency than existing Transformer and Mamba-based models.

Yusuf Meric Karadag, Ismail Talaz, Ipek Gursel Dino, Sinan Kalkan

Published 2026-03-06

Imagine you are trying to predict the weather for the next week. You look at the temperature, but you realize the data is tricky. Sometimes the temperature changes every hour (a sudden storm), sometimes it shifts over a day (day vs. night), and sometimes it follows a pattern over months (seasons).

If you only look at the data through a single pair of glasses, you might miss the big picture. If your glasses are too zoomed-in, you see every tiny fluctuation but miss the trend. If they are too zoomed-out, you see the season but miss the sudden storm.

This is the problem the paper "ms-Mamba" tries to solve.

The Problem: The "One-Size-Fits-All" Glasses

For a long time, the models computers use to predict the future (time-series forecasting) looked at data at just one speed.

  • The Old Way (RNNs): Like reading a book one word at a time. Good for stories, but slow and forgetful.
  • The Transformer Way: Like reading the whole book at once to see connections. Very smart, but it gets overwhelmed and slow if the book is too long.
  • The Mamba Way: A new, super-fast model that remembers things well and runs quickly. But, like the others, it usually looks at the data at just one single speed.

The authors realized: Real life isn't one speed. A stock market crash happens in seconds, but a housing market trend takes years. A single-speed model is like trying to watch a movie at 1x speed when you need to see both the slow-motion drama and the fast-paced action scenes simultaneously. It has to compromise, and it loses accuracy.

The Solution: The "Multi-Scale Mamba" (ms-Mamba)

The authors built a new model called ms-Mamba. Think of it as giving the computer three different pairs of glasses to wear at the same time.

  1. Glasses A (High Speed): Looks at the data very closely, catching every tiny, rapid change (like a sudden spike in solar power).
  2. Glasses B (Medium Speed): Looks at the data over a few hours or days, catching daily patterns.
  3. Glasses C (Slow Speed): Looks at the data over weeks or months, catching the big, slow trends.

Instead of forcing the computer to choose one speed, ms-Mamba runs three "Mamba" brains in parallel. Each brain looks at the same data but at a different "sampling rate" (a different speed). Then, it combines the insights from all three brains to make a single, super-accurate prediction.
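To make that concrete, here is a minimal sketch of the multi-scale idea in plain Python. This is not the paper's actual code: a real ms-Mamba block would run a learned Mamba state-space model at each scale, while here a simple moving average stands in for each "brain", and the sampling rates and fusion-by-averaging are illustrative choices, not the paper's.

```python
import math

def downsample(series, rate):
    """Keep every `rate`-th point: a coarser 'pair of glasses' on the data."""
    return series[::rate]

def toy_brain(series, window):
    """Stand-in for one Mamba block: a causal moving average over the series."""
    out = []
    for i in range(len(series)):
        lo = max(0, i - window + 1)
        out.append(sum(series[lo:i + 1]) / (i + 1 - lo))
    return out

def ms_forecast(series, rates=(1, 4, 24)):
    """Run one 'brain' per sampling rate in parallel, then fuse each
    scale's latest value into a single next-step prediction."""
    votes = []
    for rate in rates:
        coarse = downsample(series, rate)      # fast / medium / slow glasses
        processed = toy_brain(coarse, window=3)
        votes.append(processed[-1])            # each scale's view of "now"
    return sum(votes) / len(votes)             # naive fusion of the views

# Toy hourly signal: a slow upward trend plus a fast wiggle.
signal = [0.1 * t + math.sin(t) for t in range(100)]
prediction = ms_forecast(signal)
```

The fast brain (rate 1) tracks the wiggle, the slow brain (rate 24) tracks the trend, and the fused prediction reflects both, which no single rate would capture alone.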

How It Works (The Analogy)

Imagine a team of detectives trying to solve a mystery (predicting the future):

  • Detective 1 is a speedster who notices every footstep and whisper (high frequency).
  • Detective 2 is a strategist who notices the daily routine of the suspects (medium frequency).
  • Detective 3 is a historian who notices the suspect's habits over the last decade (low frequency).

In the old models, you only had one detective. If you sent the speedster, you missed the long-term plan. If you sent the historian, you missed the immediate clue.
ms-Mamba sends all three. They talk to each other, combine their notes, and produce a prediction that is smarter than any single detective could be alone.

Why Is This a Big Deal?

The paper tested this new model on 13 different real-world datasets, including:

  • Solar Energy: Predicting how much power the sun will generate (which changes instantly with clouds but also follows the seasons).
  • Traffic: Predicting traffic jams (which happen in rush hour but also follow weekly patterns).
  • Electricity: Predicting power usage.

The Results:

  1. It's More Accurate: ms-Mamba beat the current "champion" models (including the famous S-Mamba and Transformer models) in almost every test. On the Solar Energy dataset, it made significantly fewer mistakes.
  2. It's Cheaper: Even though it uses three "brains" instead of one, it actually uses less computer memory and power than the competitors. It's like getting a Ferrari engine that is also more fuel-efficient.
  3. It's Faster: It predicts the future faster than the heavy, slow models.

The Bottom Line

The real world is messy and unfolds at many speeds at once. The old AI models tried to force everything into a single speed, which led to mistakes. ms-Mamba is like a smart observer who knows how to look at the world through different lenses simultaneously. By doing so, it sees the whole picture clearly, predicts the future better, and does it all without needing a supercomputer to run it.

In short: It's a smarter, faster, and more efficient way for AI to understand the rhythm of time.