Adaptive Personalized Federated Reinforcement Learning for RIS-Assisted Aerial Relays in SAGINs with Fluid Antennas

Imagine a future where your phone never loses signal, whether you are on a mountain peak, in a crowded city, or out at sea. This is the goal of SAGINs (Space-Air-Ground Integrated Networks). Think of it as a three-layered internet:

Space: Satellites zooming overhead like a swarm of bees.
Air: Drones (UAVs) acting as flying cell towers.
Ground: Your devices and the people using them.

The problem? The world is messy. Clouds block signals, buildings reflect them weirdly, and every group of people (a "hotspot") has different needs. Some people have super-advanced antennas that can "wiggle" to find the best signal (called Fluid Antennas), while others have standard ones.

This paper proposes a smart way to manage this chaos using AI, flying drones, and mirrors. Here is the breakdown in simple terms:

1. The Cast of Characters

The Satellites (The Bosses): They are far away and moving fast. They can't talk to everyone directly because clouds or buildings get in the way. They act as the "Global Brain."
The Drones (The Messengers): These are flying cell towers hovering over specific neighborhoods. They carry a special RIS (Reconfigurable Intelligent Surface).
- Analogy: Think of the RIS as a smart mirror on the drone. It can catch the satellite's signal and bounce it perfectly toward the ground, avoiding obstacles.
The Users (The Crowd): Some have "Fluid Antennas" (smart devices that can shift their internal parts to catch the best signal), while others have regular antennas. They are all in different neighborhoods with different crowd sizes and layouts.

2. The Big Problem: "One Size Does Not Fit All"

Earlier AI approaches trained every drone on the exact same rules. But that's like trying to teach a surfer in Hawaii and a skier in Alaska the exact same moves using one manual. It doesn't work because the environments are too different.

If the AI is too generic, it fails in specific neighborhoods.
If every drone learns alone, they are slow and inefficient because they don't share what they learn.

3. The Solution: "Personalized Federated Learning"

The authors created a new AI system called FedPG-AP. Let's break down the name with an analogy:

Federated Learning (The Group Study): Imagine a group of students (drones) taking a test. Instead of sending their answers to a teacher to grade (which takes too long and leaks privacy), they keep their answers private. They only send their study notes (the AI model) to a central server. The server mixes all the notes to create a "Master Study Guide" and sends it back.
Personalized (The Customization): This is the magic part. The "Master Study Guide" is too general. So, the system allows each student to keep the parts of the guide that work for their specific subject and swap out the parts that don't.
- Analogy: If Drone A is over a dense city with tall buildings, it keeps the "City Navigation" chapters from the Master Guide but replaces the "Open Field" chapters with its own local experience. If Drone B is over a beach, it does the opposite.
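Adaptive (The Chameleon): The system is smart enough to know when to customize. It compares what each drone is learning with what the typical drone is learning. A drone whose lessons look much like everyone else's keeps more of its model to itself, while a drone whose lessons have drifted far from the group takes more from the shared guide. The balance is readjusted throughout training.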

4. How It Works in Real Life

The Setup: A satellite beams a signal to a drone. The drone uses its "smart mirror" (RIS) to bounce the signal down to users.
The Decision: The drone has to decide: Where should I fly? How should I angle my mirror? Which port should the Fluid Antenna user pick?
The Learning:
- The drone tries a move. If the internet speed goes up, it gets a "reward."
- It updates its own "brain" (local model).
- It sends its brain updates to the satellite (the server).
- The satellite mixes everyone's brains to make a better "Global Brain."
- The satellite sends the Global Brain back, but the drone personalizes it immediately to fit its specific neighborhood before trying again.
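
The loop above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the paper's implementation: each "brain" is a list of layers (plain lists of numbers), `local_update` is a hypothetical stand-in for a round of trial-and-error learning, and the first `n_local` layers are assumed to be the ones each drone keeps for itself.

```python
import random

def local_update(layers, rng, step=0.1):
    """Hypothetical stand-in for local learning: nudge every weight a little."""
    return [[w + step * rng.gauss(0.0, 1.0) for w in layer] for layer in layers]

def average(layer_copies):
    """Element-wise mean of the same layer collected from every drone."""
    return [sum(ws) / len(ws) for ws in zip(*layer_copies)]

def federated_round(drone_models, n_local, rng):
    """One round: local learning, mixing of shared layers, personalization."""
    trained = [local_update(m, rng) for m in drone_models]
    n_layers = len(trained[0])
    # The server mixes only the shared layers (index >= n_local).
    shared = {k: average([m[k] for m in trained]) for k in range(n_local, n_layers)}
    # Each drone keeps its own first n_local layers and takes the rest from the server.
    return [
        [list(m[k]) if k < n_local else list(shared[k]) for k in range(n_layers)]
        for m in trained
    ]

rng = random.Random(0)
# 5 drones, each with a 3-layer model of 4 weights per layer (sizes are arbitrary).
models = [[[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(3)] for _ in range(5)]
models = federated_round(models, n_local=1, rng=rng)
```

After one round every drone holds identical shared layers, while the first layer still differs from drone to drone.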

5. The Results

The authors tested this in computer simulations (like a video game), including 100 randomly generated environments.

Without Personalization: The drones were confused and unstable. Sometimes they flew into bad spots.
With Fixed Personalization: They were stable but slow to learn new tricks.
With Their New "Adaptive Personalized" System: The drones learned the fastest, stayed the most stable, and gave the highest internet speeds to everyone, even when the environment was chaotic.

The Takeaway

This paper is about teaching a fleet of flying drones to be smart and cooperative, yet individually adaptable. By using a "Group Study" method where everyone shares what they know but keeps their own "specialty," the drones can provide faster and more stable internet coverage, even when the environment gets messy. It's the difference between a rigid robot army and a team of flexible, intelligent teammates.

Here is a detailed technical summary of the paper "Adaptive Personalized Federated Reinforcement Learning for RIS-Assisted Aerial Relays in SAGINs with Fluid Antennas."

1. Problem Statement

The paper addresses the challenge of optimizing communication in Space–Air–Ground Integrated Networks (SAGINs). Specifically, it focuses on a scenario where a Low Earth Orbit (LEO) satellite constellation communicates with multiple ground hotspots via Reconfigurable Intelligent Surface (RIS)-assisted Unmanned Aerial Vehicle (UAV) relays.

Key Challenges:

Environmental Heterogeneity: Ground users have diverse reception capabilities; some are equipped with Fluid Antenna Systems (FAS) (which can dynamically select optimal antenna ports), while others are conventional. Additionally, user distributions and activation patterns vary across different hotspots.
Dynamic Environment: The system involves high-mobility entities (LEO satellites and UAVs) and time-varying channel conditions (blockage, fading).
Optimization Complexity: The goal is to jointly optimize UAV trajectories and RIS phase shifts to maximize the long-term downlink sum-rate. This is a Mixed-Integer Nonlinear Programming (MINLP) problem due to discrete phase controls and binary FAS port selection, complicated by time-varying random channels.
Limitations of Existing AI: Traditional Deep Reinforcement Learning (DRL) often relies on centralized training, leading to high communication overhead and privacy risks. Standard Federated Learning (FL) assumes homogeneous environments, which fails when local agents (UAVs) face significantly different user distributions and FAS ratios.
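
To make the discrete phase control mentioned under Optimization Complexity more concrete, the sketch below aligns RIS phases for a single user and then rounds them to a small set of allowed values. It is an illustration only: the unit-modulus random channels, the 2-bit resolution, and the function names are assumptions, and the paper learns the phases with reinforcement learning rather than computing them in closed form.

```python
import cmath
import math
import random

def quantize_phase(theta, bits):
    """Round a phase to the nearest of 2**bits evenly spaced levels in [0, 2*pi)."""
    levels = 2 ** bits
    step = 2 * math.pi / levels
    return (round(theta / step) % levels) * step

def aligned_phases(sat_to_ris, ris_to_user, bits):
    """Per element, cancel the cascaded channel phase, then quantize."""
    return [quantize_phase(-cmath.phase(g * f), bits)
            for g, f in zip(sat_to_ris, ris_to_user)]

def effective_gain(sat_to_ris, ris_to_user, phases):
    """Power gain of the satellite -> RIS -> user cascade for given phases."""
    h = sum(g * cmath.exp(1j * th) * f
            for g, f, th in zip(sat_to_ris, ris_to_user, phases))
    return abs(h) ** 2

def unit_phasor(rng):
    """Assumed toy channel coefficient: unit magnitude, uniformly random phase."""
    return cmath.exp(1j * rng.uniform(0.0, 2 * math.pi))

rng = random.Random(0)
n_elements = 120  # matches the RIS size quoted in the Results section
g = [unit_phasor(rng) for _ in range(n_elements)]
f = [unit_phasor(rng) for _ in range(n_elements)]

gain_unconfigured = effective_gain(g, f, [0.0] * n_elements)
gain_aligned = effective_gain(g, f, aligned_phases(g, f, bits=2))
```

With 2-bit phases the residual error per element is at most π/4, so the aligned gain is at least (120 cos(π/4))², far above what unconfigured phases typically give. The combinatorial choice of one level per element is part of what makes the full problem mixed-integer.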

2. Methodology

A. System Modeling

Architecture: A hierarchical SAGIN where LEO satellites act as global servers, UAV-RIS relays act as local agents, and ground users (with or without FAS) are the receivers.
Channel Model:
- Satellite-to-UAV (LR): Modeled using Rician fading with a strong Line-of-Sight (LoS) component.
- UAV-to-User (RU): Modeled with Rician fading. For FAS users, the channel matrix accounts for spatial correlation between different antenna ports. The user dynamically selects the optimal port ( $h^*$ ) to maximize channel gain.
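
As a rough illustration of port selection over spatially correlated ports, the sketch below draws one Rician channel per port and picks the strongest. The first-order autoregressive correlation with coefficient `rho`, the Rician factor, and all function names are assumptions made for this example. The paper's own correlation model is not reproduced here.

```python
import math
import random

def complex_gauss(rng):
    """Circularly symmetric complex Gaussian sample with unit variance."""
    s = math.sqrt(0.5)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))

def fas_port_channels(n_ports, rho, k_factor, rng):
    """Rician channel per port; neighbouring ports share scattering (assumed AR(1))."""
    los = math.sqrt(k_factor / (k_factor + 1.0))   # line-of-sight amplitude
    nlos = math.sqrt(1.0 / (k_factor + 1.0))       # scattered amplitude
    scatter = complex_gauss(rng)
    channels = []
    for _ in range(n_ports):
        channels.append(los + nlos * scatter)
        # The next port sees a correlated copy of the scattered component.
        scatter = rho * scatter + math.sqrt(1.0 - rho ** 2) * complex_gauss(rng)
    return channels

def select_port(channels):
    """Return (index, gain) of the port with the largest channel power."""
    gains = [abs(h) ** 2 for h in channels]
    best = max(range(len(gains)), key=gains.__getitem__)
    return best, gains[best]

rng = random.Random(1)
channels = fas_port_channels(n_ports=25, rho=0.9, k_factor=3.0, rng=rng)
best_port, best_gain = select_port(channels)
fixed_gain = abs(channels[0]) ** 2  # a conventional user is stuck on one port
```

By construction `best_gain` is never below `fixed_gain`, which is the advantage a FAS user has over a conventional one in this toy setting.
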
Problem Formulation: A long-term sum-rate maximization problem is formulated under constraints for UAV mobility (speed, position) and RIS discrete phase control.
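
One way to write such a formulation is sketched below. The notation is assumed for illustration and is not taken from the paper: $\mathbf{q}_m[t]$ is the position of UAV $m$ in slot $t$, $\theta_{m,n}[t]$ the phase of its $n$-th RIS element with $b$-bit resolution, $x_{k,p}[t]$ a binary indicator that FAS user $k$ selects port $p$, $R_{m,k}[t]$ the downlink rate, $v_{\max}$ the speed limit, $\Delta t$ the slot length, and $\mathcal{Q}_m$ the allowed flight region.

```latex
\begin{aligned}
\max_{\{\mathbf{q}_m[t]\},\ \{\theta_{m,n}[t]\},\ \{x_{k,p}[t]\}} \quad
  & \frac{1}{T} \sum_{t=1}^{T} \sum_{m=1}^{M} \sum_{k \in \mathcal{K}_m} R_{m,k}[t] \\
\text{s.t.} \quad
  & \lVert \mathbf{q}_m[t+1] - \mathbf{q}_m[t] \rVert \le v_{\max}\,\Delta t, \\
  & \mathbf{q}_m[t] \in \mathcal{Q}_m, \\
  & \theta_{m,n}[t] \in \left\{ \tfrac{2\pi i}{2^{b}} : i = 0, \dots, 2^{b}-1 \right\}, \\
  & x_{k,p}[t] \in \{0,1\}, \qquad \sum_{p=1}^{P} x_{k,p}[t] = 1 \ \text{for each FAS user } k .
\end{aligned}
```

The continuous positions combined with the discrete phases and binary port indicators are what give the problem its mixed-integer nonlinear character.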

B. Game-Theoretic Analysis

To prove the solvability of the optimization problem, the authors model the interactions as a Hierarchical Stackelberg Game:

Lower Level (UAV-User): A game between the UAV (Leader) and FAS users (Followers). The UAV controls trajectory and phases; users react by selecting optimal ports. The existence of a Nash Equilibrium (NE) is proven.
Upper Level (Satellite-UAV): A game between the LEO satellite (Global Server/Leader) and UAVs (Local Agents). The satellite aggregates policies to guide the UAVs. The existence of an NE is also proven here.
Markov Game: The hierarchical game is reformulated as a Markov Game to facilitate Reinforcement Learning, where states include satellite/UAV/user positions and user activation/FAS status.

C. Proposed Algorithm: FedPG-AP

The core contribution is a Federated Policy Gradient with Adaptive Personalization (FedPG-AP) algorithm.

Federated Framework: UAVs train local policies and exchange model parameters with the satellite (global server) via Inter-Satellite Links (ISLs), preserving data privacy.
Adaptive Personalization (AP): Unlike standard FL, which shares a single global model, FedPG-AP dynamically adjusts the network partition between local and global layers based on gradient divergence:
- Mechanism: It calculates the gradient distance between each local agent and a "median" agent.
- Thresholds: Two thresholds ( $\sigma_{close}$ and $\sigma_{far}$ ) define a buffer zone.
  - If an agent is too similar to the median ( $d < \sigma_{close}$ ), it adds a local layer (enhancing local specialization).
  - If an agent is too different ( $d > \sigma_{far}$ ), it adds a global layer (enhancing knowledge sharing).
- Dynamic Adjustment: This partition is adjusted every training epoch to adapt to time-varying heterogeneity without requiring fixed network structures.
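
The threshold rule above can be sketched as follows. Several details are assumptions made for illustration: the "median" agent is taken to be the element-wise median of all local gradients, the distance is Euclidean, and the partition is represented by a single integer `n_local` counting how many layers stay local. The gradients are toy two-dimensional vectors.

```python
import math
from statistics import median

def median_gradient(grads):
    """Element-wise median across agents (assumed definition of the median agent)."""
    return [median(column) for column in zip(*grads)]

def distance(a, b):
    """Euclidean distance between two flattened gradients."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def adapt_partition(n_local, d, sigma_close, sigma_far, n_layers):
    """Move the local/global boundary by one layer according to the thresholds."""
    if d < sigma_close:
        n_local += 1      # very similar to the median: add a local layer
    elif d > sigma_far:
        n_local -= 1      # far from the median: add a global layer
    # Inside the buffer zone the partition is left unchanged.
    return max(0, min(n_layers, n_local))

grads = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.2], [2.0, 2.0], [4.0, 3.0]]
med = median_gradient(grads)
dists = [distance(gr, med) for gr in grads]
partitions = [adapt_partition(1, d, sigma_close=2.5, sigma_far=3.0, n_layers=4)
              for d in dists]
print(partitions)  # [2, 2, 2, 1, 0]
```

The three agents near the median gain a local layer, the fourth falls inside the buffer zone and is left alone, and the outlier hands one more layer to the global model.
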
Training Strategy:
- Local Nodes: Use Policy Gradient (GPOMDP) for local experience.
- Global Node: Uses Stochastic Variance-Reduced Policy Gradient (SVRPG) with a virtual environment constructed from aggregated statistical features of local hotspots to reduce bias and improve convergence.
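
For readers unfamiliar with GPOMDP, the sketch below shows the standard form of the estimator without a baseline: each discounted reward is weighted by the sum of policy score vectors accumulated up to that step. The input layout (a list of episodes, each a list of `(score, reward)` pairs) is an assumption of this example, and the SVRPG variance-reduction step used at the global node is not shown.

```python
def gpomdp_gradient(trajectories, gamma):
    """GPOMDP policy-gradient estimate (no baseline).

    trajectories: list of episodes; each episode is a list of (score, reward),
    where score is grad_theta log pi(a_t | s_t) given as a list of floats.
    """
    dim = len(trajectories[0][0][0])
    grad = [0.0] * dim
    for episode in trajectories:
        running = [0.0] * dim  # sum of score vectors for steps 0..t
        for t, (score, reward) in enumerate(episode):
            running = [c + s for c, s in zip(running, score)]
            weight = (gamma ** t) * reward
            # A reward at step t only credits actions taken at or before step t.
            grad = [g + weight * c for g, c in zip(grad, running)]
    return [g / len(trajectories) for g in grad]

# One two-step episode with hand-picked scores and rewards.
episode = [([1.0, 0.0], 1.0), ([0.0, 1.0], 2.0)]
estimate = gpomdp_gradient([episode], gamma=0.5)
print(estimate)  # [2.0, 1.0]
```

In the example, the step-0 reward contributes 1 × [1, 0] and the step-1 reward contributes 0.5 × 2 × [1, 1], giving [2.0, 1.0].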

3. Key Contributions

Unified SAGIN Model: Developed a comprehensive model integrating LEO satellites, RIS-assisted UAVs, and heterogeneous users (FAS and non-FAS), explicitly capturing channel heterogeneity and FAS spatial correlation.
Theoretical Solvability: Formulated the optimization as a hierarchical Stackelberg game, theoretically establishing the existence of a Nash Equilibrium for both the UAV-user and Satellite-UAV interactions.
Adaptive Personalized FRL: Proposed FedPG-AP, a novel algorithm that dynamically adjusts the balance between local and global policy layers based on real-time gradient divergence, solving the "one-size-fits-all" failure of standard FL in heterogeneous environments.
Performance Validation: Conducted extensive simulations demonstrating that adaptive personalization outperforms both non-personalized federated learning and fixed-personalization approaches.

4. Results

Simulations were conducted using a setup with 5 UAVs, 120 RIS elements, and FAS-equipped users (25 ports).

Training Stability: FedPG-AP achieved the highest total reward with the lowest variance across 5 independent runs. In contrast, non-personalized FedPG (FedPG-NP) showed high instability, and fixed-personalization (FedPG-FP) suffered from slower learning speeds.
Convergence: FedPG-AP demonstrated fast initial learning speed (comparable to FedPG-NP) while maintaining the stability of FedPG-FP, avoiding the "worst-case" performance drops seen in other methods.
Downlink Rate: In validation tests (100 random environments), FedPG-AP maintained a consistent average downlink rate of ~725 Kbps, significantly outperforming baselines.
Robustness: FedPG-AP showed the smallest Coefficient of Variation (CV) and Slope Deviation (SD), indicating superior transmission stability and minimal performance degradation over time compared to other methods.
Parameter Sensitivity: Analysis revealed that a balanced configuration of thresholds ( $\sigma_{close}=2.5, \sigma_{far}=3.0$ ) and initial partition ( $e_0=1$ ) yields optimal performance, confirming that a mix of local specialization and global sharing is critical.
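
The stability metric in the Robustness item is easy to compute from a rate trace. The sketch below shows the Coefficient of Variation (standard deviation divided by mean). Because this summary does not define Slope Deviation, the second function is only a hypothetical proxy: the least-squares slope of the rate over time. The sample values are made up for illustration and are not results from the paper.

```python
from statistics import mean, pstdev

def coefficient_of_variation(rates):
    """Population standard deviation divided by the mean (dimensionless)."""
    return pstdev(rates) / mean(rates)

def trend_slope(rates):
    """Least-squares slope of rate versus time index (assumed proxy for drift)."""
    n = len(rates)
    t_bar = (n - 1) / 2
    r_bar = mean(rates)
    num = sum((t - t_bar) * (r - r_bar) for t, r in enumerate(rates))
    den = sum((t - t_bar) ** 2 for t in range(n))
    return num / den

rates = [100.0, 102.0, 98.0, 101.0, 99.0]  # made-up rate samples
cv = coefficient_of_variation(rates)
slope = trend_slope(rates)
```

A small CV means the rate stays close to its average, and a slope near zero means it does not degrade over the observation window.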

5. Significance

6G Enabler: This work provides a critical framework for 6G SAGINs, demonstrating how to integrate emerging technologies (RIS, FAS) with advanced AI (Personalized FRL) to handle extreme environmental dynamics.
Solving Heterogeneity: It addresses a fundamental gap in Federated Learning: how to handle agents operating in vastly different environments (different user densities, FAS ratios) without sacrificing convergence speed or stability.
Practical Deployment: The proposed algorithm does not require additional network infrastructure for personalization; it relies on dynamic parameter inheritance, making it feasible for real-world deployment in satellite-UAV networks.
Energy and Privacy: By using FRL, the approach reduces the need for raw data transmission to the satellite, enhancing security and reducing satellite link load, while optimizing UAV trajectories for energy-efficient coverage.