Entropy-and-Channel-Aware Adaptive-Rate Semantic Communication with MLLM-Aided Feature Compensation

This paper proposes an entropy-and-channel-aware adaptive-rate semantic communication framework for MIMO Rayleigh fading channels that dynamically selects feature maps and symbols based on channel conditions and content complexity, while leveraging a fine-tuned multimodal large language model (MLLM) at the receiver to compensate for discarded information and optimize task performance across varying signal-to-noise ratios.

Weixuan Chen, Qianqian Yang, Yuhao Chen, Chongwen Huang, Qian Wang, Zehui Xiong, Zhaoyang Zhang

Published Wed, 11 Ma

Imagine you are trying to send a high-definition photo of a sunset to a friend, but the internet connection between you is shaky. Sometimes the connection is super fast and clear; other times, it's full of static and drops packets.

Traditional methods of sending this photo are like a stubborn courier who always packs the exact same heavy box, regardless of whether the road is a smooth highway or a muddy dirt path. If the road is good, they waste space carrying unnecessary junk. If the road is bad, the box gets too heavy, parts get lost, and your friend receives a blurry, broken image.

This paper proposes a smart, adaptive courier system that changes its strategy based on the weather (the channel) and the contents of the photo (the semantics). Here is how it works, broken down into simple concepts:

1. The "Smart Courier" (Adaptive Rate Control)

Instead of sending the whole photo at a fixed speed, this system acts like a chameleon.

  • When the connection is bad (Stormy weather): The system realizes, "We can't carry much right now." So, it packs the box tightly, sending only the absolute most critical parts of the image (like the bright sun and the horizon) and leaving out the less important details (like the texture of a single leaf).
  • When the connection is good (Sunny day): The system says, "Great, we have plenty of room!" It sends more details, making the picture sharper and more colorful.

This is called Adaptive Rate Control. It saves money (bandwidth) when you don't need it and spends more when you do, ensuring the photo always looks as good as possible for the current conditions.
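The idea can be sketched in a few lines of Python. This is not the paper's actual rate controller; the SNR thresholds and the symbol budget below are made-up illustrative values, and `choose_rate` is a hypothetical helper showing how a symbol budget might shrink as the channel degrades.

```python
import math

def choose_rate(snr_db: float, max_symbols: int = 1024) -> int:
    """Map channel quality to a transmit-symbol budget.

    Good channels get the full budget; poor channels get a fraction,
    so only the most important features are sent. Thresholds here are
    illustrative, not taken from the paper.
    """
    if snr_db >= 20:        # clean channel: send everything
        fraction = 1.0
    elif snr_db >= 10:      # moderate channel: trim details
        fraction = 0.6
    elif snr_db >= 0:       # noisy channel: essentials plus a little extra
        fraction = 0.3
    else:                   # very poor channel: bare minimum
        fraction = 0.1
    return max(1, math.floor(max_symbols * fraction))
```

In the real system this decision is learned end-to-end rather than hard-coded, but the effect is the same: the symbol count tracks the channel.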

2. The "Two-Stage Filter" (Entropy and Channel Awareness)

How does the system know what to keep and what to throw away? It uses two clever filters:

  • Filter 1: The "Big Picture" Check (Feature Map Selection):
    Imagine the photo is broken into 100 puzzle pieces. Some pieces show the sun (very important); others show a blurry patch of sky (less important). The first filter looks at the weather and the puzzle pieces, then discards the entire "blurry sky" pile of pieces before it even leaves the house.
  • Filter 2: The "Fine-Tuning" Check (Symbol Pruning):
    Even the "sun" puzzle pieces might have some extra, redundant pixels. The second filter looks inside the remaining piles and removes the extra, repetitive pixels, keeping only the essential data.

This happens dynamically. The system calculates the "entropy" (a fancy word for how much information or surprise is in a specific part of the image) and the channel quality to make these decisions in real-time.
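A minimal sketch of the two stages, assuming features come out of the encoder as a stack of 2D maps. The histogram-based entropy estimate and the magnitude-based symbol pruning below are simplifications of my own, not the paper's learned selection modules.

```python
import numpy as np

def feature_entropy(fmap: np.ndarray, bins: int = 16) -> float:
    """Shannon entropy (bits) of a feature map's value histogram.

    High entropy = lots of variation/"surprise" = worth transmitting.
    """
    hist, _ = np.histogram(fmap, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def two_stage_filter(fmaps: np.ndarray, keep_maps: int, keep_symbols: int):
    """Stage 1: keep the `keep_maps` highest-entropy feature maps.
    Stage 2: within those, keep the `keep_symbols` largest-magnitude values.
    `keep_maps`/`keep_symbols` would be set from the channel quality."""
    scores = np.array([feature_entropy(f) for f in fmaps])
    top = np.argsort(scores)[::-1][:keep_maps]               # stage 1
    selected = fmaps[top].ravel()
    idx = np.argsort(np.abs(selected))[::-1][:keep_symbols]  # stage 2
    return top, selected[idx]
```

A flat, featureless map (low entropy) is dropped whole in stage 1, while stage 2 trims the redundant values inside the maps that survive.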

3. The "Magic Restorer" (MLLM-Aided Compensation)

Here is the most creative part. Because the system throws away so much data to save space, the image arriving at the destination is technically incomplete. It's like receiving a puzzle with 30% of the pieces missing.

In the past, the receiver would just try to guess the missing pieces, often resulting in a blurry mess. But this paper introduces a Super-Intelligent Art Restorer (based on a Multimodal Large Language Model, or MLLM).

  • How it works: Think of this AI as an art expert who has seen millions of sunsets. When it receives the incomplete puzzle, it doesn't just stare at the gaps; it uses its vast knowledge to reconstruct the missing pieces. It knows that where the sun is, there should be a gradient of orange and yellow, not just a blank space.
  • The Result: Even though the sender threw away a lot of data, the receiver uses this "AI magic" to fill in the gaps so perfectly that the final image looks almost as good as the original.

4. The "Traffic Light" System (Channel-Aware Loss)

To teach the AI how to behave, the researchers designed a special "scorecard" (a loss function).

  • If the connection is bad, the scorecard says: "It's okay to send less data, but you must make sure the important parts get through."
  • If the connection is good, the scorecard says: "Don't waste space! Send the full details, but don't be lazy."

This teaches the system to be a smart resource manager, automatically shifting its strategy to get the best possible picture quality for the least amount of effort.
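One hedged way to picture such a scorecard: a reconstruction term plus a rate penalty whose weight depends on the channel. The exact form below (and the `lam` coefficient) is a guess for illustration, not the paper's loss function.

```python
def channel_aware_loss(mse: float, rate: float, snr_db: float,
                       lam: float = 0.1) -> float:
    """Illustrative channel-aware scorecard.

    On a poor channel, sending symbols is expensive (high rate penalty),
    pushing the model to transmit less. On a good channel the penalty
    fades, so the model is free to send full detail.
    """
    snr_linear = 10 ** (snr_db / 10)
    weight = lam / (1.0 + snr_linear)  # rate costs more when SNR is low
    return mse + weight * rate
```

Training against a score like this is what makes the rate adaptation automatic: the model learns where the "send less" / "send more" trade-off pays off at each SNR.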

The Bottom Line

This paper presents a new way to send images over wireless networks that is:

  1. Smarter: It adapts to bad connections by sending less and to good connections by sending more.
  2. Efficient: It throws away redundant data (like a duplicate leaf texture) that humans wouldn't notice anyway.
  3. Resilient: It uses a powerful AI "restorer" at the receiving end to fix the holes left by throwing away data.

The Result: In tests, this system produced clearer, sharper images (higher PSNR) than current state-of-the-art methods, even when using less data. It's like getting an HD movie experience on a slow, spotty internet connection.