Trajectory-aware Shifted State Space Models for Online Video Super-Resolution

This paper proposes TS-Mamba, a novel online video super-resolution method that leverages trajectory-aware token selection and shifted State Space Models to achieve state-of-the-art performance with significantly reduced computational complexity by effectively modeling long-range temporal dependencies and spatial continuity.

Qiang Zhu, Xiandong Meng, Yuxian Jiang, Fan Zhang, David Bull, Shuyuan Zhu, Bing Zeng, Ronggang Wang

Published 2026-02-25

Imagine you are trying to watch a live stream of a soccer game, but the connection is bad, and the video is blurry and pixelated. You want to see the players clearly, but you can't wait for the whole game to finish downloading to fix it; you need the picture to get better right now, as the game is happening. This is the challenge of Online Video Super-Resolution (VSR).

The paper introduces a new AI model called TS-Mamba to tackle this problem. Here is how it works, explained with some everyday analogies.

The Problem: The "One-Neighbor" Limit

Most existing video enhancers are like a person trying to fix a blurry photo by only looking at the one picture immediately before it.

  • The Analogy: Imagine you are trying to guess what a person in a crowd is doing. If you only look at the person standing right next to them, you might miss the fact that they are waving at someone three rows back.
  • The Issue: Old methods only look at the "immediate neighbor" frame. They miss the long-term context (like a player running from the other side of the field), which makes the final image look a bit shaky or incomplete.

The Solution: TS-Mamba (The "Trajectory Detective")

Think of TS-Mamba as a detective who doesn't just look at the person standing nearby, but tracks the entire path that person has taken.

Here are the three main tricks TS-Mamba uses:

1. Drawing the "Path" (Trajectory Awareness)

Instead of just grabbing the frame before the current one, TS-Mamba draws invisible lines (trajectories) across the video to see where objects have been moving over time.

  • The Analogy: Imagine a game of "Connect the Dots." Instead of just looking at the dot right next to the current one, the AI draws a line back through the last 15 dots to see the full curve of the movement.
  • The Result: It finds the most similar "pieces" (tokens) from the past that match the current moment, even if they were far away in time. It picks the best clues to reconstruct the picture.
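To make the selection step concrete, here is a minimal sketch in Python. It is not the authors' implementation. It assumes each token is a plain list of numbers and that "most similar" means highest cosine similarity; the names `cosine_similarity` and `select_tokens`, the `top_k` parameter, and the toy data are all illustrative.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def select_tokens(query, trajectory_tokens, top_k=2):
    """Return the indices of the top_k past tokens most similar to the query.

    trajectory_tokens[i] is assumed to be the token found in past frame i
    along the trajectory that ends at the query's position.
    """
    scored = [(cosine_similarity(query, tok), i)
              for i, tok in enumerate(trajectory_tokens)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:top_k]]

# Toy example: one current token and four tokens from earlier frames.
query = [1.0, 0.0, 1.0]
past = [
    [0.0, 1.0, 0.0],  # frame 0: unrelated content
    [1.0, 0.1, 0.9],  # frame 1: close match
    [0.5, 0.5, 0.5],  # frame 2: partial match
    [1.0, 0.0, 1.0],  # frame 3: exact match
]
best = select_tokens(query, past, top_k=2)
```

Here `best` picks frames 3 and 1: the best clues can come from any point along the trajectory, not only from the frame just before the current one.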

2. The "Shifted" Scanner (Fixing the Broken Puzzle)

The AI uses a technology called Mamba (a type of State Space Model), which is fast and efficient. However, Mamba has a quirk: it reads data as one long sequence, so an image must first be unrolled along a winding path through its grid of pixels, like a snake slithering through it (here, a path called a Hilbert scan).

  • The Problem: However the snake winds, some pixels that sit side by side in the image end up far apart in the sequence, breaking the flow. It's like reading a book where the sentences are cut in half and the second half is on a different page. This causes "spatial discontinuity" (the image looks a bit chopped up).
  • The Fix: The authors invented "Shifted SSMs." Imagine you are reading that book, but every time the sentence breaks, you shift the page slightly before reading the next part. This ensures the sentence flows smoothly.
  • The Analogy: It's like a construction crew that notices a gap in a brick wall. Instead of just laying more bricks, they slide the whole row over slightly to fill the gap perfectly, making the wall solid and continuous.
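The scan-and-shift idea can be illustrated with a small sketch. This is only a toy model of the general principle, not the paper's Shifted SSM block: it builds a Hilbert scan over a 4x4 grid using the standard iterative construction, then models the "shift" as a cyclic roll of the grid before scanning, which is an assumption made here for illustration. The function names `hilbert_point` and `scan_index` are likewise hypothetical.

```python
def hilbert_point(n, d):
    """Map position d along a Hilbert curve to (x, y) on an n x n grid.

    n must be a power of two.
    """
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:  # rotate or flip the current quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def scan_index(n, shift=0):
    """Scan position of every image cell.

    With shift > 0 the grid is cyclically rolled before scanning
    (an assumed, simplified stand-in for the paper's shift operation).
    """
    index = {}
    for d in range(n * n):
        x, y = hilbert_point(n, d)
        # Image cell ((x - shift) % n, (y - shift) % n) sits on curve cell (x, y).
        index[((x - shift) % n, (y - shift) % n)] = d
    return index

n = 4
plain = scan_index(n)
shifted = scan_index(n, shift=1)

a, b = (1, 0), (2, 0)  # two pixels that sit side by side in the image
gap_plain = abs(plain[a] - plain[b])
gap_shifted = abs(shifted[a] - shifted[b])
```

In this toy case the two neighbouring pixels are 13 steps apart in the plain scan but only 1 step apart in the shifted scan. A pair that one scan separates can be brought together by the other, which is the intuition behind combining a scan with a shifted copy.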

3. The "Smart Loss" (The Teacher's Red Pen)

To make sure the AI draws the paths correctly, the authors created a special "loss function" (a way to grade the AI's homework).

  • The Analogy: Usually, teachers only grade the final essay. Here, the teacher also grades the outline the student drew before writing. If the outline (the trajectory) is wrong, the essay (the video frame) will be messy. This forces the AI to learn how to track movement accurately from the very beginning.
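As a rough illustration of "grading the outline as well as the essay", the sketch below adds a trajectory term to an ordinary frame-reconstruction loss. The summary above does not give the paper's actual loss terms, so everything here is an assumption: mean absolute error for both terms, a hypothetical `traj_weight`, and a hypothetical reference trajectory to compare against.

```python
def l1(pred, target):
    """Mean absolute error between two equal-length lists of numbers."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

def total_loss(frame_pred, frame_target, traj_pred, traj_target, traj_weight=0.1):
    """Frame reconstruction error plus a weighted trajectory error.

    traj_weight and the use of L1 for both terms are illustrative choices.
    """
    frame_term = l1(frame_pred, frame_target)  # the "essay" grade
    traj_term = l1(traj_pred, traj_target)     # the "outline" grade
    return frame_term + traj_weight * traj_term

# Same (perfect) output frame, but two different trajectory estimates.
good_traj = total_loss([0.5, 0.5], [0.5, 0.5], [1.0, 2.0], [1.0, 2.0])
bad_traj = total_loss([0.5, 0.5], [0.5, 0.5], [4.0, 6.0], [1.0, 2.0])
```

Even when the output frame happens to be right, the wrong trajectory is still penalized (`bad_traj` is larger than `good_traj`), so training is pushed toward tracking motion accurately as well as producing a good picture.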

Why is this a Big Deal?

  • Speed vs. Quality: Usually, you have to choose between a fast video (low quality) and a high-quality video (slow, laggy). TS-Mamba is like a Formula 1 car that also gets 100 miles per gallon. It is fast enough for real-time use yet delivers state-of-the-art picture quality.
  • Efficiency: It needs about 22.7% less computation than the current best methods. That lower cost makes it a better fit for everyday devices such as phones and laptops.

The Bottom Line

TS-Mamba is a new way to make live video look crystal clear. It does this by:

  1. Tracking movement over a long period (not just the previous frame).
  2. Smoothing out the reading process so the image doesn't look chopped up.
  3. Doing it all very quickly so you don't have to wait.

It's a notable step forward for live streaming, video calls, and watching sports, aiming to give you a sharper, clearer picture even when the incoming video is blurry and pixelated.
