RPT-SR: Regional Prior attention Transformer for infrared image Super-Resolution

The paper proposes RPT-SR, a novel Regional Prior attention Transformer that enhances infrared image super-resolution by fusing learnable regional prior tokens with local content tokens to exploit persistent spatial scene structures, achieving state-of-the-art performance across both Long-Wave and Short-Wave infrared spectra.

Youngwan Jin, Incheol Park, Yagiz Nalcakan, Hyeongjin Ju, Sanghyeop Yeo, Shiho Kim

Published 2026-02-18

The Big Problem: The "Amnesiac" AI

Imagine you are a detective trying to solve a crime in a city you've never visited. You have a blurry, low-resolution photo of the scene. To make it clear, you need to guess what the missing details look like.

Most current AI super-resolution models are like amnesiac detectives. Every time they look at a new photo, they treat it as if it's the first time they've ever seen that street. They have to relearn everything from scratch: "Oh, the sky is usually at the top, the road is at the bottom, and buildings are in the middle." They spend a huge amount of mental energy re-discovering these basic facts for every single image, which is inefficient and sometimes leads to mistakes.

This is especially a problem for infrared cameras (used in self-driving cars and night surveillance). Many of these cameras face the same view every day (like a traffic camera on a highway). The layout barely changes, but current AI models don't "remember" this. They ignore the statistics of the scene, wasting their brainpower on things they should already know.

The Solution: The "Local Guide" and the "Memory Book"

The authors decided to fix this by giving the AI a "cheat sheet" and a "local guide." Their model, RPT-SR, is built around a new mechanism called Regional Prior Attention.

Think of the system as having two distinct types of workers (tokens) working together:

  1. The Memory Book (Regional Prior Token):
    Imagine a permanent, learnable notebook that sits on the desk. This notebook doesn't care about today's specific traffic or weather. Instead, it learns the permanent layout of the scene over time.

    • Analogy: It's like a map of a city that knows, "The highway is always at the bottom of the frame, and the sky is always at the top." It remembers the "skeleton" of the scene.
  2. The Local Guide (Local Token):
    This is a worker who looks at the current blurry photo. They see the specific details: "Today, there is a red truck here, and a pedestrian there." They capture the unique, changing content of the moment.
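To make the two kinds of tokens concrete, here is a minimal NumPy sketch. It is an illustration under assumptions, not the authors' code: the function `partition_windows`, the array `regional_prior`, and all the sizes are hypothetical, and random numbers stand in for values that would be learned during training. It shows the one property that matters here: the Memory Book is indexed by region and is the same for every frame, while the Local Guide's tokens are recomputed from each new image.

```python
import numpy as np

def partition_windows(feat, win):
    """Split an (H, W, C) feature map into non-overlapping win x win regions.

    Returns an array of shape (num_regions, win * win, C): one group of
    local tokens per region, ordered row by row across the image.
    """
    H, W, C = feat.shape
    assert H % win == 0 and W % win == 0, "H and W must be multiples of win"
    x = feat.reshape(H // win, win, W // win, win, C)
    x = x.transpose(0, 2, 1, 3, 4)  # (rows, cols, win, win, C)
    return x.reshape(-1, win * win, C)

rng = np.random.default_rng(0)
H, W, C, win, n_prior = 8, 8, 4, 4, 2  # hypothetical toy sizes
num_regions = (H // win) * (W // win)

# Memory Book: a few vectors per region. In a real model these would be
# trainable parameters; random values are placeholders (assumption).
regional_prior = rng.normal(size=(num_regions, n_prior, C))

# Local Guide: tokens computed from whatever frame arrives.
frame_a = rng.normal(size=(H, W, C))
frame_b = rng.normal(size=(H, W, C))
local_a = partition_windows(frame_a, win)
local_b = partition_windows(frame_b, win)

# Region 1 is the top-right block of the image in both frames, so it is
# always paired with the same prior vectors, regional_prior[1].
print(local_a.shape, regional_prior.shape)
```

The local tokens differ between `frame_a` and `frame_b`, but `regional_prior` is looked up by position only, which is what lets it act as a memory of the layout.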

How They Work Together (The Magic Trick)

In the old models, the AI tried to guess the details using only the blurry photo (the Local Guide). In the new RPT-SR model, the Local Guide and the Memory Book hold hands and talk to each other.

  • The Process: The AI takes the "Local Guide's" observations of the current image and mixes them with the "Memory Book's" knowledge of the scene's layout.
  • The Result: The AI doesn't have to guess where the road is; the Memory Book tells it, "The road is here." This frees up the AI's brain to focus entirely on making the texture of the road and the details of the truck look sharp and realistic.

It's like hiring a local tour guide (Local Token) who knows the current traffic, but giving them a GPS (Regional Prior) that already knows the map. The guide doesn't waste time asking, "Which way is North?" because the GPS already told them. They can just focus on driving smoothly.
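One plausible way to let the two "talk" is ordinary scaled dot-product attention in which the keys and values contain both kinds of tokens. The sketch below is written under that assumption and is not the paper's exact formulation: the function name `regional_prior_attention`, the projection matrices `Wq`, `Wk`, `Wv`, and the sizes are all hypothetical, and random arrays stand in for learned weights and real image features.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def regional_prior_attention(local, prior, Wq, Wk, Wv):
    """Attention where local tokens query both prior and local tokens.

    local: (R, N, C) content tokens for R regions of the current frame.
    prior: (R, P, C) per-region prior tokens (the "Memory Book").
    Returns (out, attn) with shapes (R, N, C) and (R, N, P + N).
    """
    memory = np.concatenate([prior, local], axis=1)  # (R, P + N, C)
    q = local @ Wq    # queries come from the current frame only
    k = memory @ Wk   # keys cover the prior and the frame
    v = memory @ Wv
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    attn = softmax(scores, axis=-1)
    return attn @ v, attn

rng = np.random.default_rng(0)
R, N, P, C = 4, 16, 2, 8  # hypothetical toy sizes
local = rng.normal(size=(R, N, C))
prior = rng.normal(size=(R, P, C))
Wq, Wk, Wv = (rng.normal(size=(C, C)) * 0.1 for _ in range(3))

out, attn = regional_prior_attention(local, prior, Wq, Wk, Wv)
# Share of each local token's attention that goes to the prior tokens.
prior_share = attn[..., :P].sum(axis=-1)
print(out.shape, prior_share.shape)
```

Here `prior_share` measures how much each local token leans on the Memory Book rather than on the current image. In this toy setup the value is arbitrary; in a trained model the weights would decide when the layout memory is useful.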

Why This Matters for Infrared

Infrared cameras (which sense heat, or reflected light that can pass through fog and smoke) are often low-resolution because high-resolution sensors are expensive. Super-resolution is the software trick that makes the output of a cheap sensor look more like that of an expensive one.

The researchers tested this on two very different types of infrared light:

  • LWIR (Long-Wave): Sees heat (like a thermal camera).
  • SWIR (Short-Wave): Sees reflected light (like a camera that can see through smoke).

Even though these two types of cameras "see" the world in completely different ways, RPT-SR reached state-of-the-art results on both. This suggests that the model isn't just memorizing heat patterns; it's learning the structural rules of the scene (where things usually sit), which should carry over to other fixed-view cameras.

The Results

When they tested this new AI against the best existing models:

  • It looked better: The images were sharper, with fewer weird artifacts (like blurry ghosts or ringing edges).
  • It was smarter: It didn't waste energy re-learning the layout of the road or the sky.
  • It was versatile: It worked on both heat-sensing cameras and smoke-penetrating cameras.

In a Nutshell

RPT-SR is a new type of AI that stops trying to re-invent the wheel for every image. Instead, it remembers the permanent layout of the scene (like a map) and combines that memory with the current details. This allows it to turn blurry, low-quality infrared images into sharper, more detailed pictures, more accurately than earlier models. It's the difference between a detective who forgets the map every day and one who has a reliable map in their pocket.
