RASLF: Representation-Aware State Space Model for Light Field Super-Resolution

The paper proposes RASLF, a representation-aware state space model for light field super-resolution. It integrates a Progressive Geometric Refinement block, a Representation-Aware Asymmetric Scanning mechanism, and a Dual-Anchor Aggregation module to effectively leverage multi-view complementarity, achieving superior reconstruction accuracy and computational efficiency compared to existing methods.

Zeqiang Wei, Kai Jin, Kuan Song, Xiuzhuang Zhou, Wenlong Chen, Min Xu

Published 2026-03-18

Imagine you are looking at a Light Field (LF) image. Unlike a normal photo that captures just one flat picture, a light field image captures a whole "cube" of information. It records not just what the scene looks like, but also the direction the light is coming from. This allows you to refocus the image later or see it from slightly different angles.

However, there's a catch: to get all this 3D information, cameras have to sacrifice sharpness. The resulting images are often blurry and low-resolution. Light Field Super-Resolution (LFSR) is the art of taking these blurry, low-quality light field images and making them sharp and high-definition again.

The paper introduces a new AI model called RASLF to solve this problem. Here is how it works, explained through simple analogies.

The Problem: The "One-Size-Fits-All" Mistake

Previous AI models tried to fix these blurry images by treating every part of the data the same way. Imagine you are trying to organize a messy library.

  • The Old Way: You use the exact same sorting rule for books, DVDs, and loose papers. You might sort the books well, but you end up shuffling the loose papers around unnecessarily, wasting time and energy.
  • The Issue: Light field data has different "views" (spatial details, angles, and geometric lines). Old models used a generic, heavy-handed approach for all of them, leading to blurry textures and misaligned 3D structures.

The Solution: RASLF (The Smart Librarian)

RASLF is a new "Smart Librarian" that knows exactly how to handle different types of data. It uses three main tricks:

1. The "Panoramic Map" (Progressive Geometric Refinement)

  • The Analogy: Imagine trying to fix a torn map of a city. If you look at just one tiny, ripped piece of the map, you don't know where it fits. You might put a park next to a highway by mistake.
  • What RASLF does: Instead of looking at tiny, isolated pieces, RASLF creates a Panoramic Epipolar Representation. Think of this as taping all the ripped map pieces together into one giant, continuous panoramic wall.
  • The Result: Now the AI can see the whole picture at once. It understands exactly how the "parallax" (the way objects shift when you move your head) works across the entire image. This ensures that the 3D structure stays perfectly aligned, preventing objects from looking "jittery" or misshapen.
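In code terms, the "taped-together map" idea can be sketched roughly as follows. This is a toy numpy illustration under simplified assumptions (a small synthetic 4D light field, horizontal EPIs only, stitching by simple concatenation); it is not the paper's exact construction of the Panoramic Epipolar Representation.

```python
import numpy as np

# Toy 4D light field: (U, V, H, W) = (angular rows, angular cols, height, width).
# Values are synthetic; a real light field would come from a plenoptic camera.
U, V, H, W = 3, 3, 8, 8
lf = np.arange(U * V * H * W, dtype=np.float32).reshape(U, V, H, W)

def panoramic_epi_horizontal(lf):
    """Stitch the horizontal EPIs of every image row into one wide strip.

    For a fixed angular row u and spatial row h, the slice lf[u, :, h, :]
    is a (V, W) epipolar-plane image. Concatenating these slices along the
    width axis yields one continuous strip, so a 1-D scan can follow
    parallax lines across all views at once -- a simplified stand-in for
    the paper's panoramic representation.
    """
    u = lf.shape[0] // 2                      # use the central angular row
    epis = [lf[u, :, h, :] for h in range(lf.shape[2])]
    return np.concatenate(epis, axis=1)       # shape (V, H * W)

pano = panoramic_epi_horizontal(lf)
print(pano.shape)  # (3, 64)
```

The key point the sketch captures: instead of processing each small EPI slice in isolation, the model sees one continuous strip in which parallax is globally consistent.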

2. The "Custom Scanner" (Representation-Aware Asymmetric Scanning)

  • The Analogy: Imagine you are reading a book.
    • For a novel (Spatial data), you read left-to-right, then right-to-left to catch details.
    • For a train schedule (Epipolar data), the information is strictly vertical (Time vs. Station). Reading it sideways makes no sense and is a waste of time.
  • What RASLF does: Old models tried to read everything in all directions (left, right, up, down), which is slow and redundant. RASLF is "representation-aware." It knows:
    • For Spatial details (textures), it just scans forward (left-to-right).
    • For Geometric lines (the train schedule), it only scans along the line where the information actually flows.
  • The Result: It cuts out the "useless reading." It stops wasting energy scanning directions that don't contain new information. This makes the AI much faster and more efficient without losing quality.
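The "custom scanner" idea can be sketched as choosing how a 2-D feature map is flattened into the 1-D sequences a state space model consumes. The function below is an illustrative simplification (the `kind` labels and scan orders are my assumptions, not the paper's implementation): spatial features get a single forward raster scan, epipolar features are scanned only along the axis where parallax information flows, and "generic" mimics the redundant four-direction scan of older models.

```python
import numpy as np

def scan_paths(feat, kind):
    """Flatten a 2-D feature map into 1-D scan sequence(s),
    picking directions per representation type (toy sketch)."""
    if kind == "spatial":
        # textures: one forward left-to-right raster scan is enough
        return [feat.reshape(-1)]
    if kind == "epipolar":
        # the "train schedule": information runs vertically, so scan
        # column by column and skip the sideways directions entirely
        return [feat.T.reshape(-1)]
    # baseline used by one-size-fits-all models: 4 directions everywhere
    fwd = feat.reshape(-1)
    col = feat.T.reshape(-1)
    return [fwd, fwd[::-1], col, col[::-1]]

feat = np.arange(6).reshape(2, 3)   # toy 2x3 feature map
print(len(scan_paths(feat, "spatial")), len(scan_paths(feat, "generic")))  # 1 4
```

The efficiency claim falls out directly: one sequence instead of four means roughly a quarter of the scanning work for those branches, with no information lost when the skipped directions were redundant anyway.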

3. The "Dual-Anchor" System (Dual-Anchor Aggregation)

  • The Analogy: Imagine building a skyscraper.
    • The foundation (shallow layers) holds the raw, detailed bricks.
    • The roof (deep layers) holds the overall structural design.
    • If you just stack floors randomly, the building might wobble or lose its shape.
  • What RASLF does: It uses two "Anchors" to hold the building together.
    • Anchor 1 (Spatial): Keeps the fine details (like the texture of a brick wall) sharp.
    • Anchor 2 (Geometric): Keeps the overall shape and 3D alignment perfect.
  • The Result: It mixes the information from the middle floors carefully into these two anchors, ensuring no detail is lost and no structural integrity is broken.
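The two-anchor fusion can be sketched as below. This is a rough illustration under stated assumptions: features are plain vectors, and the shallow-biased / deep-biased weights are fixed `linspace` ramps standing in for learned parameters; the paper's Dual-Anchor Aggregation is more elaborate.

```python
import numpy as np

def dual_anchor_aggregate(features):
    """Fuse a shallow-to-deep stack of layer features into two anchors.

    The shallowest feature anchors fine spatial detail, the deepest
    anchors global geometry; intermediate layers are mixed into both
    anchors with opposing weight ramps (shallow-biased for the spatial
    anchor, deep-biased for the geometric anchor).
    """
    mids = features[1:-1]
    w_spatial = np.linspace(1.0, 0.2, len(mids))    # favor shallow layers
    w_geometric = np.linspace(0.2, 1.0, len(mids))  # favor deep layers
    spatial_anchor = features[0] + sum(
        w * f for w, f in zip(w_spatial, mids)) / len(mids)
    geometric_anchor = features[-1] + sum(
        w * f for w, f in zip(w_geometric, mids)) / len(mids)
    return spatial_anchor, geometric_anchor

feats = [np.full(4, i, dtype=np.float32) for i in range(5)]  # 5 layers, C=4
s, g = dual_anchor_aggregate(feats)
print(s.shape, g.shape)  # (4,) (4,)
```

The design choice the sketch mirrors: rather than letting middle-layer information pass through unchecked (the "randomly stacked floors"), everything is pulled toward two stable reference points, one protecting texture, one protecting 3D structure.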

Why Does This Matter?

The authors tested RASLF against the best existing AI models.

  • Better Quality: The images it produces are sharper, with better textures and perfect 3D alignment.
  • Faster & Lighter: Because it stops doing unnecessary work (the "custom scanner"), it runs faster and uses less computer memory than its competitors.

In summary: RASLF is like a master craftsman who doesn't just use a hammer on everything. Instead, it uses a panoramic map to see the whole picture, a custom scanner to only look where it matters, and a dual-anchor system to keep the structure solid. The result is a high-definition, 3D-perfect image created with maximum efficiency.
