Partial Ring Scan: Revisiting Scan Order in Vision State Space Models

This paper introduces PRISMamba, a rotation-robust Vision State Space Model that improves accuracy, efficiency, and geometric stability by replacing fixed linear scan orders with a concentric ring-based traversal and partial channel filtering.

Original authors: Yi-Kuan Hsieh, Kuan-Chuan Peng, Xin Li, Ming-Ching Chang, Yu-Chee Tseng, Jun-Wei Hsieh

Published 2026-06-17

Original paper licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). This is an AI-generated explanation of the paper below. It is not written or endorsed by the authors. For technical accuracy, refer to the original paper.

Imagine you are trying to describe a complex picture, like a photo of a dog chasing a rabbit, to a friend over a phone line. You can only send one word at a time. The order in which you describe the picture matters immensely.

If you describe the picture row by row (left to right, top to bottom), you might say, "Dog's ear... dog's nose... rabbit's ear... dog's tail." If you rotate the picture 90 degrees, your row-by-row description suddenly becomes a mess. The "dog's ear" is now far away from the "dog's nose" in your list, and the rabbit's ear might end up sandwiched between the dog's tail and the background. Your friend (the computer model) gets confused because the things that belong together in the picture are now far apart in your story.

This paper, titled "Partial Ring Scan: Revisiting Scan Order in Vision State Space Models," argues that the way we "read" images in modern AI is too rigid. The authors propose a new, smarter way to read images that stays calm even when the picture spins around.

Here is the breakdown of their ideas using simple analogies:

1. The Problem: The "Snake" vs. The "Target"

Current AI models (like VMamba) often read images like a snake slithering across a grid. They go left-to-right, then drop down, then go right-to-left.

  • The Issue: If you rotate the image, the "snake" path no longer follows the objects. It has to cross empty space (padding) or jump to an unrelated part of the picture, so patches that sit next to each other in the image can land far apart in the sequence. It's like reading a book after someone rotated the page: the words are all still there, but no longer in an order that makes sense.
  • The Result: When the image rotates, the AI's performance drops because it loses track of how objects connect.
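To make the ordering problem concrete, here is a minimal sketch using a plain row-by-row scan on a small grid (a simplification of the snake path described above, not the scan used by any particular model). The grid size, the two example pixels, and the helper names `raster_positions` and `rot90_ccw` are all illustrative assumptions.

```python
import numpy as np

def raster_positions(h, w):
    """Map each (row, col) to its index in a row-by-row scan."""
    return np.arange(h * w).reshape(h, w)

def rot90_ccw(rc, w):
    """Where pixel (row, col) of a square w-by-w image lands after a 90-degree counter-clockwise turn."""
    r, c = rc
    return (w - 1 - c, r)

H = W = 8
pos = raster_positions(H, W)

# Two horizontally adjacent pixels: neighbors in the image and in the sequence.
a, b = (3, 3), (3, 4)
gap_before = abs(int(pos[a]) - int(pos[b]))

# After the rotation the same two pixels are vertical neighbors,
# so the row-by-row scan separates them by a full row of tokens.
a_rot, b_rot = rot90_ccw(a, W), rot90_ccw(b, W)
gap_after = abs(int(pos[a_rot]) - int(pos[b_rot]))

print(gap_before, gap_after)
```

The two pixels are still touching in the picture, but their distance in the sequence grows from one step to a whole row, which is the kind of disruption a sequence model has to cope with.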

2. The Solution: The "Onion" Strategy (Ring Scan)

The authors, Yi-Kuan Hsieh and colleagues, suggest a different way to read the picture: The Ring Scan.

Imagine the image is an onion or a target board.

  • Step 1: Instead of reading row by row, the AI peels the image in concentric rings, starting from the very center and moving outward.
  • Step 2: Inside each ring, the AI doesn't care about the specific order (clockwise or counter-clockwise). It just gathers all the information from that ring together.
  • Step 3: It then moves from the inner ring to the next ring, and so on.

Why this is a game-changer:
If you rotate the picture around its center, the "onion" doesn't change. The center is still the center, and the outer ring is still the outer ring. Rotation only slides content along a ring, and because the order within a ring doesn't matter, the AI sees nearly the same sequence of rings as before. It doesn't have to relearn the order, which makes it far more robust to rotation.
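The three steps can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: rings are taken to be square (Chebyshev distance from the grid center), each ring is summarized with a plain mean as a stand-in for whatever order-free aggregation the paper uses, and the check covers 90-degree turns only. The names `ring_index` and `ring_tokens` are hypothetical, not from the paper.

```python
import numpy as np

def ring_index(h, w):
    """Ring id per pixel: Chebyshev distance from the grid center (0 = innermost)."""
    rows = np.abs(np.arange(h) - (h - 1) / 2)
    cols = np.abs(np.arange(w) - (w - 1) / 2)
    return np.floor(np.maximum(rows[:, None], cols[None, :])).astype(int)

def ring_tokens(image):
    """One token per ring, inner to outer, using an order-free mean over each ring."""
    rings = ring_index(*image.shape)
    return np.array([image[rings == k].mean() for k in range(rings.max() + 1)])

rng = np.random.default_rng(0)
image = rng.random((6, 6))

tokens = ring_tokens(image)                    # sequence fed inner ring -> outer ring
tokens_rotated = ring_tokens(np.rot90(image))  # same image, turned 90 degrees

print(np.allclose(tokens, tokens_rotated))
```

On a square grid a 90-degree turn maps every ring onto itself, so the ring sequence is unchanged. For other angles the image has to be resampled, and the match would only be approximate.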

3. The Efficiency Hack: The "VIP Lane" (Partial Channel Filtering)

Processing every single piece of information in the image is expensive and slow. The authors noticed that not all "channels" (think of these as different colored filters or types of data) are equally important. Some are noisy or redundant.

  • The Old Way: The AI tries to process everything through the complex "Ring" path.
  • The New Way (Partial Channel Filtering): The AI acts like a bouncer at a club. It quickly checks which channels are "VIPs" (the most informative ones).
    • VIPs get sent through the main, high-speed "Ring" path.
    • Regulars (the less important data) are sent down a simple, lightweight "backdoor" path (a residual branch).

The Analogy: Imagine a busy highway. Instead of forcing every car to take the long, winding scenic route, the smart system lets only the fast sports cars take the scenic route while the slow trucks take a direct, simple bypass. The result? The highway moves faster, and the important cars still get the detailed view they need.
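Here is a rough sketch of the routing idea. The summary does not say how the paper scores channels, so spatial variance is used here purely as a placeholder, and `partial_channel_filter`, `keep_ratio`, `heavy_path`, and `light_path` are hypothetical names. The two paths are toy functions standing in for the ring-scan branch and the lightweight residual branch.

```python
import numpy as np

def partial_channel_filter(x, keep_ratio, heavy_path, light_path):
    """Route the top-scoring channels through heavy_path and the rest through light_path.

    x is a feature map of shape (C, H, W); both paths must preserve shape.
    """
    c = x.shape[0]
    k = max(1, int(round(c * keep_ratio)))
    scores = x.var(axis=(1, 2))            # placeholder importance score (assumption)
    order = np.argsort(scores)[::-1]       # channels from highest to lowest score
    vip, regular = order[:k], order[k:]
    out = np.empty_like(x)
    out[vip] = heavy_path(x[vip])          # expensive branch, few channels
    out[regular] = light_path(x[regular])  # cheap branch, everything else
    return out, vip

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4, 4))

# Toy stand-ins: the "heavy" path doubles its input, the "light" path is an identity shortcut.
out, vip = partial_channel_filter(
    x, keep_ratio=0.25, heavy_path=lambda t: 2.0 * t, light_path=lambda t: t
)
print(sorted(vip.tolist()))
```

The point of the design is that the cost of the expensive branch scales with the number of selected channels rather than with all of them, while the remaining channels are still passed along instead of being discarded.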

4. The Results: Faster, Smarter, and Stronger

The team tested their new model, called PRISMamba, against the current leaders (like VMamba).

  • Accuracy: On a standard test (ImageNet), PRISMamba reached 84.5% accuracy, ahead of VMamba's 82.6%.
  • Speed: It processed images much faster (3,054 images per second vs. 1,686 for VMamba).
  • Efficiency: It used less computing power (3.9G FLOPs vs. 5.6G).
  • Rotation: When they rotated the images by 60 degrees, the old models dropped in performance by about 2%. PRISMamba barely noticed; its score stayed almost exactly the same.
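For a sense of scale, the first three comparisons can be turned into relative numbers with a little arithmetic. The inputs below are just the figures quoted in the list above; nothing here is re-measured, and the derived ratios are my own calculation rather than values reported by the paper.

```python
# Figures as quoted in the list above (PRISMamba vs. VMamba).
prism = {"top1": 84.5, "img_per_s": 3054, "gflops": 3.9}
vmamba = {"top1": 82.6, "img_per_s": 1686, "gflops": 5.6}

acc_gain = prism["top1"] - vmamba["top1"]             # percentage points
speedup = prism["img_per_s"] / vmamba["img_per_s"]    # throughput ratio
flops_saved = 1 - prism["gflops"] / vmamba["gflops"]  # fraction of FLOPs removed

print(f"{acc_gain:.1f} points, {speedup:.2f}x throughput, {flops_saved:.0%} fewer FLOPs")
```

In other words, the quoted figures correspond to roughly 1.9 points more accuracy at about 1.8 times the throughput and about 30% less compute.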

Summary

The paper claims that by changing how the AI reads the image (from a snake-like path to a ring-based path) and by sending only the most informative channels through the expensive path (Partial Channel Filtering), they created a model that is:

  1. More accurate (it sees better).
  2. Faster (it thinks quicker).
  3. More robust (it doesn't get confused when the picture spins).

They call this a "low-cost, principled approach," meaning they didn't need to build a massive, expensive new brain; they just rearranged the furniture and opened a VIP lane.
