A Compact Hybrid Convolution--Frequency State Space Network for Learned Image Compression

This paper proposes HCFSSNet, a compact hybrid network for learned image compression that combines convolutional layers with a novel Vision Frequency State Space block to effectively model both local details and long-range dependencies while preserving 2D neighborhood continuity and enhancing frequency awareness.

Original authors: Haodong Pan, Hao Wei, Yusong Wang, Nanning Zheng, Caigui Jiang

Published 2026-04-13

This is an AI-generated explanation of the paper below. It is not written or endorsed by the authors. For technical accuracy, refer to the original paper.

Imagine you have a massive, high-resolution photo album that you want to send to a friend over a slow internet connection. You need to shrink the files (compress them) so they send quickly, but you don't want your friend to receive blurry, pixelated garbage. This is the challenge of Learned Image Compression (LIC).

For a long time, computers used "hand-crafted" rules (like JPEG) to do this. But recently, scientists started using AI to learn the best way to compress images. The problem? The AI models that get the best results are either:

  1. Too heavy: Like a giant truck that takes forever to drive (too much computing power).
  2. Too clumsy: They flatten the 2D photo into a long 1D line to process it, which breaks the natural "neighborhood" of pixels (like cutting a map into a single strip of paper and trying to find your way).

Enter HCFSSNet, the new model proposed in this paper. Think of it as a smart, compact delivery service that uses a hybrid strategy to pack your photos perfectly.

Here is how it works, broken down into simple analogies:

1. The Two-Track Team (The Hybrid Design)

Most AI models try to do everything with one big brain. HCFSSNet splits the work into two specialized teams that work together:

  • The Local Detail Team (CNN): Imagine a microscope. This team zooms in on tiny, specific details like the texture of a cat's fur or the edge of a brick wall. It's great at seeing what's right in front of it but doesn't care about the whole room.
  • The Big Picture Team (State Space Model): Imagine a drone flying overhead. This team looks at the whole scene to understand the context (e.g., "this is a street, so there should be cars and sidewalks"). It connects distant parts of the image efficiently: its cost grows only linearly with the number of pixels, unlike attention-based models, whose cost grows quadratically.

The Magic: HCFSSNet combines these two. It uses the microscope for the fine details and the drone for the big picture, ensuring nothing is lost.
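To make the two-team idea concrete, here is a tiny Python sketch (pure NumPy, all names invented for illustration): a small local filter plays the microscope, and a broadcast whole-image statistic plays the drone. This is a simplification for intuition only, not the paper's actual architecture.

```python
import numpy as np

def hybrid_features(img):
    """Toy two-branch sketch: a 3x3 Laplacian-style filter stands in for
    the CNN "microscope" (each output sees only its neighbors), and a
    global mean stands in for the state-space "drone" (each output sees
    a summary of the whole image)."""
    # Local branch: center pixel minus its four neighbors (an edge detector).
    local = (4 * img
             - np.roll(img, 1, axis=0) - np.roll(img, -1, axis=0)
             - np.roll(img, 1, axis=1) - np.roll(img, -1, axis=1))
    # Global branch: broadcast one whole-image statistic to every pixel.
    global_ctx = np.full_like(img, img.mean())
    return local + global_ctx

img = np.linspace(0.0, 1.0, 64).reshape(8, 8)
feat = hybrid_features(img)
print(feat.shape)  # (8, 8)
```

On a perfectly flat image the local branch outputs zero everywhere, so only the global summary survives; on a textured image both branches contribute — which is exactly the division of labor the hybrid design is after.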

2. The "All-Direction" Scanner (VONSS)

Here is the tricky part: How do you scan a 2D photo (a square grid) with a 1D scanner (a line)?

  • Old Way: Imagine reading a book line-by-line (left to right, top to bottom). If you are at the top-left corner and want to talk to the pixel just below you, you have to wait until the scanner finishes the entire row. Two pixels that are direct neighbors in the photo end up a whole image-width apart in the data stream. This confuses the AI.
  • The HCFSSNet Way (VONSS): Imagine a security guard checking a room. Instead of just walking in a straight line, the guard checks horizontally, vertically, diagonally, and even backwards.
    • By scanning in all directions (like an "Omni-directional" scan), the AI ensures that pixels that are neighbors in the photo stay neighbors in the data stream. It preserves the natural shape of the image much better.
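Here is a small NumPy sketch of why scan direction matters: in a plain row-by-row scan, a pixel and its vertical neighbor land far apart, while a column-wise scan puts them side by side. (Illustrative only — the function names are made up, and these are not the paper's exact VONSS scan paths.)

```python
import numpy as np

def scan_directions(img):
    """Flatten a 2D array along four different directions.
    Illustrative only: not the paper's exact VONSS scan paths."""
    return {
        "row-major": img.flatten(),                 # left-to-right, top-to-bottom
        "row-major-reversed": img.flatten()[::-1],  # right-to-left, bottom-to-top
        "column-major": img.T.flatten(),            # top-to-bottom, column by column
        "column-major-reversed": img.T.flatten()[::-1],
    }

# A 4x4 grid whose "pixels" are just their own indices 0..15.
img = np.arange(16).reshape(4, 4)
scans = scan_directions(img)

# Pixels 0 and 4 are vertical neighbors in the image. In the row-major
# scan they sit 4 steps apart; in the column-major scan they are adjacent.
row = scans["row-major"]
col = scans["column-major"]
print(np.where(row == 4)[0][0] - np.where(row == 0)[0][0])  # 4
print(np.where(col == 4)[0][0] - np.where(col == 0)[0][0])  # 1
```

No single 1D scan can keep every neighbor pair adjacent, which is why combining several directions helps: each pixel stays close to its neighbors in at least one of the sequences.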

3. The "Frequency Tuner" (AFMM)

Think of an image like a symphony.

  • Low frequencies are the deep drums and bass (the big shapes, the sky, the background).
  • High frequencies are the cymbals and violins (the sharp edges, the fine textures, the noise).

Traditional AI often treats all notes the same. HCFSSNet has a Frequency Tuner: the Adaptive Frequency Modulation Module (AFMM).

  • It looks at the image, breaks it down into its musical notes (using a math tool called the Discrete Cosine Transform, or DCT), and then turns up the volume on the important notes and turns down the volume on the ones that don't matter as much.
  • This allows the AI to be very smart about what data to keep and what to throw away, saving space without losing quality.
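The "turn the volume up or down" step can be sketched in a few lines of NumPy using the DCT. Everything specific here — the fixed gains, the diagonal low/high split — is an illustrative assumption; in the paper the modulation is learned, not hand-set.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix (the "math tool" DCT mentioned above)."""
    j = np.arange(n)
    M = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * j[None, :] + 1) * j[:, None] / (2 * n))
    M[0] /= np.sqrt(2.0)  # DC row gets the 1/sqrt(2) normalization
    return M

def modulate_frequencies(block, gain_low=1.0, gain_high=0.2):
    """Transform a block to the frequency domain, scale low and high
    frequencies differently, and transform back. The gains are fixed
    numbers here for clarity; in the paper they are learned."""
    n = block.shape[0]
    D = dct_matrix(n)
    coeffs = D @ block @ D.T             # 2D DCT: rows, then columns
    yy, xx = np.mgrid[0:n, 0:n]
    is_low = (yy + xx) < n // 2          # coefficients near (0,0) are low-frequency
    weights = np.where(is_low, gain_low, gain_high)
    return D.T @ (coeffs * weights) @ D  # inverse transform (D is orthonormal)

# A flat block has only low-frequency (DC) energy, so it passes through unchanged.
flat = np.full((8, 8), 3.0)
print(np.allclose(modulate_frequencies(flat), flat))  # True
```

Note the sanity check at the end: a featureless block (all "bass", no "cymbals") is untouched, while a textured block would have its fine detail scaled down — which is exactly the kind of selective volume control described above.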

4. The "Side Note" Manager (FSTAM)

When you compress an image, you also need to send a "cheat sheet" (called a hyperprior) to help the decoder understand how to unpack the image.

  • Usually, this cheat sheet is just a rough summary.
  • HCFSSNet adds a Frequency Tuner to this cheat sheet too. It makes sure the instructions sent to the decoder are also "frequency-aware," ensuring the decoder knows exactly how to reconstruct the high-frequency details (like sharp edges) when it receives the file.
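As a rough sketch of what a "cheat sheet" buys you, here is a toy hyperprior in NumPy: a handful of per-region statistics that describe a much larger latent. The function name and block size are invented for illustration; a real hyperprior is itself learned and entropy-coded, not computed with a fixed formula like this.

```python
import numpy as np

def hyperprior_scales(latent, block=4):
    """Toy "cheat sheet": summarize each (block x block) region of the
    latent by its standard deviation, so a decoder could pick a wider or
    narrower probability model per region. Illustration only."""
    h, w = latent.shape
    # Split into (row-group, row-in-group, col-group, col-in-group) blocks.
    blocks = latent.reshape(h // block, block, w // block, block)
    return blocks.std(axis=(1, 3))

latent = np.random.default_rng(1).standard_normal((16, 16))
scales = hyperprior_scales(latent)
print(scales.shape)  # (4, 4): 16 summary numbers standing in for 256 values
```

The point of the sketch is the size ratio: the side information is a tiny fraction of the latent, yet it tells the decoder where the "loud" high-variance regions are — the same role the frequency-aware cheat sheet plays in HCFSSNet.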

The Result: A Compact, Efficient Solution

The authors didn't try to build the biggest, most powerful AI possible (which would be slow and expensive). Instead, they built a compact, efficient hybrid.

  • The Analogy: If other top models are like a heavy-duty semi-truck (great capacity, slow, expensive), HCFSSNet is a high-performance sports sedan. It's smaller, uses less fuel (fewer parameters), but still gets you to the destination (high-quality image) just as fast and efficiently for most trips.

In summary:
HCFSSNet is a new way to shrink photos for the internet. It uses a two-team approach (local details + big picture), scans images in all directions to keep neighbors together, and tunes the frequencies of the image to keep only the most important data. It achieves excellent results without needing a supercomputer to run it.
