Resource Allocation for Positive-Rate Covert Communications Using Optimization and Deep Reinforcement Learning

This paper proposes optimization-based and deep reinforcement learning (DDQN) methods for achieving positive-rate keyless covert communication over Rayleigh block-fading channels. It solves power- and rate-allocation problems under both non-causal and causal channel state information (CSI).

Yubo Zhang, Hassan ZivariFard, Xiaodong Wang

Published Tue, 10 Ma

Imagine you are trying to send a secret message to a friend across a noisy room, but there's a strict security guard (the "Warden") watching everyone. Your goal isn't just to keep the content of the message secret (like using a code); your goal is to make it so the guard doesn't even realize a conversation is happening at all. This is called Covert Communication.

Usually, there's a catch: if you try to hide your conversation too well, you have to whisper so quietly that the message becomes useless (zero rate). This paper solves that problem by figuring out how to whisper just enough to be heard by your friend, but not enough to trigger the guard's alarm, even when the room is full of static (fading channels).

Here is a breakdown of the paper's ideas using simple analogies:

1. The Setup: The Whispering Game

  • The Players: You (Transmitter), your Friend (Receiver), and the Guard (Warden).
  • The Environment: The room has "bad acoustics" (Rayleigh block-fading). Sometimes the sound carries well; sometimes it gets muffled.
  • The Goal: Send a message with a positive rate (meaning, actually say something useful) without the guard noticing.
  • The Secret: You and your friend know exactly how the acoustics are right now (Channel State Information or CSI). The guard only knows the average noise level, not the specific moment-to-moment changes.

2. The Two Big Challenges

The researchers tackled two main questions:

  1. Power Allocation: "I have a limited battery (power budget). How should I split my energy across different time slots to whisper the most words possible without getting caught?"
  2. Rate Allocation: "I need to whisper exactly 10 words. What is the minimum energy I need to use to do this safely?"
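In symbols, the two questions are roughly mirror images of each other. The notation below is illustrative rather than the paper's own: $p_i$ is the whisper volume (transmit power) in slot $i$, $g_i$ the channel gain to your friend, $P$ the battery budget, $R$ the number of words you must get across, and $\mathcal{D}(\cdot) \le \delta$ stands in for the covertness ("don't alert the guard") constraint.

```latex
% Power allocation: maximize total words under a power budget and a covertness cap
\max_{p_1,\dots,p_N \ge 0} \; \sum_{i=1}^{N} \log\!\left(1 + g_i p_i\right)
\quad \text{s.t.} \quad \sum_{i=1}^{N} p_i \le P, \qquad \mathcal{D}(p_1,\dots,p_N) \le \delta

% Rate allocation: minimize total power while guaranteeing the target rate R
\min_{p_1,\dots,p_N \ge 0} \; \sum_{i=1}^{N} p_i
\quad \text{s.t.} \quad \sum_{i=1}^{N} \log\!\left(1 + g_i p_i\right) \ge R, \qquad \mathcal{D}(p_1,\dots,p_N) \le \delta
```

Swapping which quantity is the objective and which is the constraint is what makes the second problem "dual" to the first, and it is why (as Scenario B shows) a solver for one can be repurposed for the other.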

3. Scenario A: The "All-Knowing" Strategy (Non-Causal CSI)

Imagine you are given a script before the game starts. You know exactly how the acoustics will be for the next 10 minutes.

  • The Problem: You have to decide right now how much to whisper in minute 1, minute 2, etc., to maximize your total words.
  • The Solution: The authors created a clever three-step recipe:
    1. Check the Rules: First, check if it's even possible to whisper without getting caught. (If the guard's hearing is always better than your friend's, you can't win).
    2. The Easy Guess: Pretend the hardest rule doesn't exist and solve the easy math problem.
    3. The Fine-Tune: If your "easy guess" breaks the hard rule, use a "penalty" method. Imagine you are walking a tightrope; if you lean too far one way, a spring pushes you back. The computer keeps adjusting your whisper volume until you are perfectly balanced: loud enough for your friend, quiet enough for the guard.
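The three-step recipe can be sketched in code. The toy below is an illustration, not the paper's algorithm: it maximizes a sum-log rate, uses classic water-filling as the "easy guess," and treats a linear cap `sum(h_w * p) <= delta` as a stand-in for the paper's actual covertness (detection) constraint. All variable names (`g`, `h_w`, `P_total`, `delta`, `mu`) are assumptions for the sketch.

```python
import numpy as np

def waterfill(g, P_total, iters=60):
    """Classic water-filling: spend the power budget where the channel is best.
    Bisects on the 'water level' lam so that the powers use up P_total."""
    lo, hi = 1e-9, float(np.max(g))
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        p = np.clip(1.0 / lam - 1.0 / g, 0.0, None)
        if p.sum() > P_total:
            lo = lam            # water level too low: over budget
        else:
            hi = lam            # water level high enough: under budget
    return np.clip(1.0 / hi - 1.0 / g, 0.0, None)

def allocate_power(g, h_w, P_total, delta, mu=50.0, lr=1e-3, steps=3000):
    """Three-step sketch: check the rules, easy guess, penalty fine-tune."""
    # Step 1 (check the rules): zero power is always covert, so this toy
    # problem is feasible whenever the covertness budget is non-negative.
    if delta < 0:
        raise ValueError("covertness budget must be non-negative")
    # Step 2 (easy guess): ignore covertness and just water-fill the budget.
    p = waterfill(g, P_total)
    if np.sum(h_w * p) <= delta:
        return p                                # easy guess already covert
    # Step 3 (fine-tune): the 'spring' — gradient ascent on the rate, with a
    # penalty that pushes back whenever the covertness cap is violated.
    for _ in range(steps):
        viol = max(0.0, float(np.sum(h_w * p)) - delta)
        grad = g / (1.0 + g * p) - mu * viol * h_w
        p = np.clip(p + lr * grad, 0.0, None)   # powers stay non-negative
        s = p.sum()
        if s > P_total:
            p *= P_total / s                    # project back onto the budget
    return p
```

The penalty term is the "spring" in the tightrope analogy: it is zero while the covertness cap holds, and grows with the size of the violation, so the iterate settles near the boundary where the friend's rate is maximized without alarming the guard.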

4. Scenario B: The "Live" Strategy (Causal CSI)

Now, imagine you don't have the script. You only know the acoustics right now. You have to decide how much to whisper in the current minute, then wait to see what happens next.

  • The Problem: This is like driving a car in fog. You can't see the whole road, only the patch right in front of you. You need to make decisions on the fly.
  • The Solution (Power Allocation): They turned this into a video game for a computer.
  • They used a technique called Deep Reinforcement Learning, specifically a Double Deep Q-Network (DDQN). Think of this as training a video game AI.
    • The AI plays the game thousands of times. Every time it whispers too loud and gets "caught" (simulated), it loses points. Every time it whispers successfully, it gains points.
    • Eventually, the AI learns the perfect strategy: "When the room is quiet, whisper a bit louder. When the room is noisy, whisper softer."
  • The Solution (Rate Allocation): This was tricky because the math didn't fit the video game format perfectly. So they used a workaround: they took the AI trained to maximize whispered words (Power Allocation) and repurposed it to deliver a fixed number of words with minimum energy (Rate Allocation). It's like using a weightlifter's training program to teach someone how to run a marathon. It's not a perfect fit, but it works surprisingly well.
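The "video game" training loop can be sketched with a tabular stand-in for the paper's deep network. This toy is an assumption-heavy illustration: two made-up channel states (quiet/noisy room), three power levels, and a reward that pays the achieved rate but levies a big penalty when full power in a quiet slot "triggers the warden." The Double-DQN idea itself is faithful, though: the online Q-table picks the next action, and a periodically synced target copy scores it, which tames the overestimation of plain Q-learning.

```python
import numpy as np

rng = np.random.default_rng(0)
QUIET, NOISY = 0, 1             # toy channel states (hypothetical discretization)

class TinyDDQN:
    """Tabular stand-in for a Double DQN: the online table selects the
    action, a periodically synced target table evaluates it."""
    def __init__(self, n_states, n_actions, lr=0.1, gamma=0.9):
        self.Q = np.zeros((n_states, n_actions))        # "online network"
        self.Q_target = self.Q.copy()                   # "target network"
        self.lr, self.gamma = lr, gamma

    def act(self, s, eps=0.1):
        if rng.random() < eps:                          # explore
            return int(rng.integers(self.Q.shape[1]))
        return int(np.argmax(self.Q[s]))                # exploit

    def update(self, s, a, r, s_next):
        a_star = int(np.argmax(self.Q[s_next]))         # online net selects...
        target = r + self.gamma * self.Q_target[s_next, a_star]  # ...target evaluates
        self.Q[s, a] += self.lr * (target - self.Q[s, a])

    def sync(self):
        self.Q_target = self.Q.copy()

def step(state, action):
    """Toy environment: more power means more rate, but full power in a
    quiet slot gets 'caught' by the simulated warden and loses points."""
    rate = (0.0, 0.5, 1.0)[action]
    caught = (state == QUIET and action == 2)
    reward = -5.0 if caught else rate
    return reward, int(rng.random() < 0.5)              # fading: state changes

agent = TinyDDQN(n_states=2, n_actions=3)
s = QUIET
for t in range(4000):                                   # play thousands of rounds
    a = agent.act(s)
    r, s_next = step(s, a)
    agent.update(s, a, r, s_next)
    if t % 100 == 0:
        agent.sync()                                    # refresh the target copy
    s = s_next
```

After training, the greedy policy mirrors the learned strategy described above: it avoids full power in the quiet state (where the warden would notice) and uses full power in the noisy state (where the noise provides cover).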

5. The Results: Who Won?

The researchers ran thousands of simulations to test their methods against older, simpler strategies.

  • The "Trivial" Strategy: Just whispering the same amount all the time, no matter how the room's acoustics change.
  • The "Convex" Strategy: A standard math approach that ignores the hardest rules.
  • The Authors' Strategy: The smart, adaptive AI and the three-step recipe.

The Verdict:

  • The authors' methods came out ahead in every tested scenario, sending more words with less energy.
  • The advantage was biggest when the Guard had better hearing than the friend (a harder scenario).
  • The "AI" (Deep Reinforcement Learning) was particularly good at handling the "live" scenario where you don't know the future.

Summary

This paper is about teaching a spy how to talk to their contact in a crowded, noisy room without the security guard noticing. They figured out two ways to do it:

  1. If you know the future: Use a smart, step-by-step math recipe to plan your whispers perfectly.
  2. If you only know the present: Train a digital "brain" (AI) to learn the best way to whisper on the fly.

The result is a way to communicate securely and efficiently, even when the odds are stacked against you.