Cooperative Deep Reinforcement Learning for Fair RIS Allocation

Imagine a bustling city where two major pizza delivery hubs (Base Stations) are trying to feed hungry customers (Users). One hub is in a wealthy, low-density neighborhood with few customers, while the other is in a crowded, high-density district with three times as many people. Naturally, the crowded hub is overwhelmed, and its customers are waiting a long time for their food.

To fix this, the city installs Reconfigurable Intelligent Surfaces (RIS). Think of these RISs as smart, magical mirrors placed along the street. These mirrors can catch a signal from a pizza hub and bounce it perfectly to a customer's window, even if the direct line of sight is blocked by buildings.

However, there's a problem: there are only 10 mirrors, and they all sit right on the border between the two neighborhoods. Both pizza hubs want them because they make deliveries faster. If the crowded hub doesn't get enough mirrors, its customers starve. If the quiet hub gets too many, they are wasted.

The Solution: A Smart Auction with a "Fairness Coach"

The authors of this paper propose a system to solve this using two main ingredients: Auctions and AI Learning.

1. The Auction (The Marketplace)

Instead of a central boss deciding who gets which mirror, the hubs participate in a simultaneous auction.

The price of a mirror goes up in small steps.
Both hubs bid on the mirrors they want.
If only one hub bids on a mirror, that hub gets it. If both bid, they keep fighting until the price gets too high for one of them to care.

2. The Problem with Standard Auctions

In a normal auction, the hub with more money or a better strategy might win everything. In our pizza analogy, the quiet hub might win all the mirrors just because it's "richer" in the moment, leaving the crowded hub with nothing. This is efficient (every mirror is used) but unfair (some customers go hungry).

3. The AI "Fairness Coach" (Cooperative Deep Reinforcement Learning)

This is where the paper's magic happens. The authors teach the two pizza hubs to act like smart, cooperative agents using Artificial Intelligence (Deep Reinforcement Learning).

Here is how the AI learns to be fair:

The "Fairness Weight": Before every round of bidding, a central computer looks at how well each hub is doing. If Hub A (the crowded one) is struggling, the computer gives it a "Fairness Boost" (a special weight).
The Strategy Change: The AI learns that if it is the struggling hub, it should bid more aggressively because the system wants it to win. If it is the already-successful hub, the AI learns to be more conservative, realizing that "winning" isn't as critical as helping the other guy.
No Talking Required: The hubs don't need to call each other on the phone to coordinate. They just look at the "Fairness Weight" provided by the auctioneer and adjust their bidding strategy automatically.

The Result: A Balanced City

The authors ran simulations to see what happens when the "Fairness Knob" (a parameter called $\gamma$) is turned up.

Without the Fairness Knob: The hubs fight for mirrors based purely on who can get the most total speed. The crowded hub might still be slow.
With the Fairness Knob: The AI learns to shift mirrors toward the struggling hub.
- The Good News: The customers in the crowded neighborhood get their pizza much faster (their "minimum rate" improves by 34%).
- The Trade-off: The total speed of the whole city drops very slightly (less than 7%).

The Big Picture Metaphor

Think of the network as a team of runners in a relay race.

Old Way: Everyone runs as fast as they can individually. The fast runners finish early and wait, while the slow runners struggle to keep up. The total time is good, but the slow runners are miserable.
New Way (This Paper): The team has a coach (the AI) who tells the fast runners, "Slow down a bit and pass the baton to the slow runner so they can catch up." The fast runner doesn't lose much time, but the slow runner finishes the race much faster. The team's overall time is almost the same, but no one is left behind.

Why This Matters

As we move toward 6G (the next generation of mobile networks), we will have many more devices and "smart mirrors" (RIS). This paper shows that we can use AI and auctions to automatically balance the network. We can ensure that people in bad signal areas get a fair share of the technology, without ruining the connection speed for everyone else. It's a way to make future networks both fast and fair.

1. Problem Statement

The paper addresses the challenge of resource allocation for Reconfigurable Intelligent Surfaces (RISs) in multi-cell wireless networks, specifically in scenarios with asymmetric user loads (uneven distribution of users across base stations).

Context: As networks evolve toward 6G, RISs are deployed to enhance signal propagation, particularly at cell edges where Line-of-Sight (LOS) is poor. However, RISs near cell boundaries can benefit multiple Base Stations (BSs), creating competition for this shared infrastructure.
The Conflict: In traditional optimization, solving for optimal RIS assignment among competing BSs is computationally complex (combinatorial). Furthermore, standard efficiency-maximizing approaches often neglect fairness, leading to severe performance degradation for "overloaded" cells or users at the cell edge.
Goal: To design a mechanism that dynamically allocates RISs among competing BSs to balance system efficiency (total throughput) and fairness (ensuring weaker-performing cells are not starved of resources).

2. Methodology

The authors propose a framework combining Simultaneous Ascending Auctions with Cooperative Multi-Agent Deep Reinforcement Learning (MARL).

A. System Model

Network: $N_{BS}$ base stations serving $N_{UE}$ users with the help of $N_{RIS}$ surfaces.
Channel Model:
- Direct Link: Modeled as Non-Line-of-Sight (NLOS) Rayleigh fading (strongly shadowed).
- BS-RIS Link: Modeled as Line-of-Sight (LOS) with a strong directional component.
- RIS-User Link: Modeled as Rician fading (LOS + NLOS components).
Signal Processing: BSs beamform toward RISs. If a BS has no RIS, it uses random Gaussian beamforming. The system assumes users are served on orthogonal resources (no intra-cell interference), but inter-cell interference exists.
SINR Estimation: Since instantaneous Channel State Information (CSI) is unavailable at the time of allocation, the system uses macroscopic channel parameters and asymptotic properties of large arrays to estimate Signal-to-Interference-plus-Noise Ratio (SINR) and achievable rates.
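
As a rough illustration of the link types and the SINR-to-rate step listed above, the following sketch draws unit-power Rayleigh and Rician coefficients and converts a SINR into a rate with the standard Shannon formula. It is not the paper's estimator, which relies on macroscopic parameters and large-array asymptotics that are not reproduced here. The function names, the Rician K-factor, and the uniformly random LOS phase are assumptions made for illustration.

```python
import cmath
import math
import random


def rayleigh_sample(rng: random.Random) -> complex:
    """Unit-power NLOS coefficient: real and imaginary parts ~ N(0, 1/2)."""
    sigma = math.sqrt(0.5)
    return complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))


def rician_sample(rng: random.Random, k_factor: float) -> complex:
    """Unit-power Rician coefficient: LOS part plus scattered (Rayleigh) part.

    k_factor is the ratio of LOS power to scattered power (assumed value).
    """
    los_phase = rng.uniform(0.0, 2.0 * math.pi)  # assumption: random LOS phase
    los = math.sqrt(k_factor / (k_factor + 1.0)) * cmath.exp(1j * los_phase)
    nlos = math.sqrt(1.0 / (k_factor + 1.0)) * rayleigh_sample(rng)
    return los + nlos


def achievable_rate(signal: float, interference: float, noise: float) -> float:
    """Shannon rate in bit/s/Hz for SINR = signal / (interference + noise)."""
    sinr = signal / (interference + noise)
    return math.log2(1.0 + sinr)


rng = random.Random(0)
h_direct = rayleigh_sample(rng)        # BS-user direct link (NLOS)
h_ris_user = rician_sample(rng, 5.0)   # RIS-user link, hypothetical K = 5
print(abs(h_direct) ** 2, abs(h_ris_user) ** 2)
print(achievable_rate(signal=3.0, interference=0.0, noise=1.0))
```

Inter-cell interference enters only through the `interference` argument here; how it is estimated from macroscopic quantities is left to the paper.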

B. Auction Mechanism

Format: A Simultaneous Ascending Auction is used.
- An auctioneer sets a uniform price $p_t$ that increases in discrete rounds.
- BSs submit binary bids for available RISs.
- RISs with single bids are allocated; those with multiple bids continue to the next round.
- An activity rule prevents strategic re-entry: a BS may bid on an RIS in round $t$ only if it also bid on that RIS in round $t-1$.
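
The auction rules above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the `wants` callback is a hypothetical stand-in for each BS's learned bidding policy, budgets are omitted, the activity rule is interpreted as "a BS that skips an RIS in one round cannot bid on it later", and an RIS that loses all bidders in the same round is left unallocated.

```python
from typing import Callable, Dict, List, Set


def ascending_auction(
    ris_ids: List[int],
    bs_ids: List[str],
    wants: Callable[[str, int, float], bool],
    start_price: float = 0.0,
    step: float = 1.0,
    max_rounds: int = 100,
) -> Dict[int, str]:
    """Simplified simultaneous ascending auction with a uniform price.

    wants(bs, ris, price) is a hypothetical policy: True if `bs` bids on
    `ris` at the current price. Returns {ris: winning_bs}; RISs nobody
    wins are omitted from the result.
    """
    allocation: Dict[int, str] = {}
    open_ris: Set[int] = set(ris_ids)
    # Activity rule: only BSs that bid last round remain eligible for an RIS.
    eligible: Dict[int, Set[str]] = {ris: set(bs_ids) for ris in ris_ids}
    price = start_price
    for _ in range(max_rounds):
        if not open_ris:
            break
        for ris in sorted(open_ris):  # sorted() copies, so discarding is safe
            bidders = {bs for bs in eligible[ris] if wants(bs, ris, price)}
            eligible[ris] = bidders
            if len(bidders) == 1:
                allocation[ris] = next(iter(bidders))  # single bid: allocate
                open_ris.discard(ris)
            elif not bidders:
                open_ris.discard(ris)  # no bids left: stays unallocated
        price += step  # contested RISs continue at a higher price
    return allocation


# Hypothetical private valuations: a BS bids while its value exceeds the price.
values = {"BS1": {0: 5.0, 1: 2.0}, "BS2": {0: 3.0, 1: 4.0}}
result = ascending_auction(
    [0, 1], ["BS1", "BS2"], lambda bs, ris, p: values[bs][ris] > p
)
print(sorted(result.items()))  # [(0, 'BS1'), (1, 'BS2')]
```

In this toy run each RIS goes to the BS that values it more, at the first price where the other BS drops out.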

C. Reinforcement Learning Framework

Each Base Station acts as an autonomous agent learning a bidding strategy via Proximal Policy Optimization (PPO).

State Space: Includes current price, remaining budget, normalized marginal utility of acquiring an RIS, and a fairness weight.
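Observation: Agents observe the global state but act locally. Crucially, they receive a fairness weight ($w_t^{(b)}$) computed centrally by the auctioneer based on the relative performance of all BSs.
- $w_t^{(b)} = N_{BS} \cdot \frac{\left(\mathrm{Util}^{(b)}\right)^{\gamma}}{\sum_{b'} \left(\mathrm{Util}^{(b')}\right)^{\gamma}}$
- The parameter $\gamma$ controls the fairness strength. At $\gamma = 0$ every BS receives weight 1; as $\gamma$ grows, the weights of stronger and weaker cells move further apart. As the formula is written, the weight increases with a BS's own utility, so for $\gamma > 0$ better-performing cells receive the larger weights and underperforming cells the smaller ones.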
Reward Function: Designed to balance utility gain, cost, and fairness.
- Positive Reward: Expected utility gain from winning an RIS.
- Negative Reward (Cost): Monetary cost of bids.
- Fairness Bias: The cost penalties are scaled by the fairness weight $w_t^{(b)}$. This means stronger-performing cells are penalized more heavily for aggressive bidding, while weaker cells are allowed to bid more aggressively to catch up.
Marginal Utility: Agents estimate the utility gain of acquiring a single additional RIS to avoid combinatorial explosion.
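
To make the fairness weight and its effect on the reward concrete, here is a small numerical sketch. `fairness_weights` follows the weight formula as written above. `shaped_reward` is a hypothetical reward form (utility gain minus weight-scaled cost), since the summary states only that cost penalties are scaled by the weight. The utilities and costs are made-up numbers, not values from the paper.

```python
from typing import List


def fairness_weights(utilities: List[float], gamma: float) -> List[float]:
    """w^(b) = N_BS * Util_b^gamma / sum_b' Util_b'^gamma.

    Utilities are assumed to be strictly positive.
    """
    powered = [u ** gamma for u in utilities]
    total = sum(powered)
    return [len(utilities) * p / total for p in powered]


def shaped_reward(utility_gain: float, cost: float, weight: float) -> float:
    """Hypothetical reward: utility gain minus fairness-scaled bid cost."""
    return utility_gain - weight * cost


# Made-up utilities: index 0 is the weaker (overloaded) cell.
utilities = [1.0, 3.0]
for gamma in (0.0, 1.0, 2.0):
    weights = fairness_weights(utilities, gamma)
    rewards = [shaped_reward(1.0, 0.5, w) for w in weights]
    print(gamma, weights, rewards)
```

With $\gamma = 0$ both cells get weight 1 and identical rewards. With $\gamma = 1$ the weights become 0.5 and 1.5, so the same bid costs the stronger cell three times as much as the weaker one, and the gap widens further at $\gamma = 2$. The weights always sum to $N_{BS}$.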

3. Key Contributions

Fairness-Aware MARL for Auctions: The paper introduces a novel method where a centrally computed, performance-dependent fairness indicator is embedded into the agents' observations. This enables implicit coordination without direct inter-BS communication.
Scalable Allocation Mechanism: By using an ascending auction combined with RL, the system avoids the high complexity of combinatorial optimization while handling dynamic, partially observable environments.
Tunable Trade-off: The framework introduces a tunable parameter ($\gamma$) that allows network operators to explicitly control the trade-off between total system throughput and equitable resource distribution.
Macroscopic Estimation: The derivation of SINR and utility estimators based on large-array asymptotics allows for effective decision-making without requiring instantaneous CSI, which is practical for real-world deployment.

4. Simulation Results

The authors evaluated the framework in a two-BS scenario in which one BS was overloaded, serving three times as many users as the other.

Efficiency-Fairness Trade-off:
- Increasing the fairness parameter $\gamma$ shifted RIS allocation from the lightly loaded BS to the overloaded BS.
- Result: The minimum user rate in the overloaded cell improved by ~34%, while the total system sum-rate decreased by less than 7%. The fairness gain is therefore large relative to the efficiency it costs.
Fairness Metrics:
- The Atkinson Inequality Index decreased monotonically as $\gamma$ increased, confirming that the distribution of rates became more equal across users.
Allocation Behavior:
- As fairness pressure increased, the number of unallocated RISs decreased, indicating that the "weaker" cell became more aggressive in bidding, effectively capturing resources it previously lost.
Convergence: The RL agents converged to stable policies, learning to balance budget constraints with the need to acquire high-value RISs.
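
For readers unfamiliar with the Atkinson Inequality Index used in the fairness metrics above, the sketch below computes it for a list of user rates. The inequality-aversion parameter `epsilon` used in the paper is not stated in this summary, so its default here is an assumption, and the example rates are invented for illustration.

```python
import math
from typing import Sequence


def atkinson_index(rates: Sequence[float], epsilon: float = 1.0) -> float:
    """Atkinson inequality index: 0 means perfectly equal rates.

    epsilon is the inequality-aversion parameter (assumed value; larger
    values put more emphasis on the lowest rates). Rates must be positive.
    """
    if len(rates) == 0 or any(r <= 0 for r in rates):
        raise ValueError("rates must be a non-empty sequence of positive values")
    n = len(rates)
    mean = sum(rates) / n
    if math.isclose(epsilon, 1.0):
        # Limit case epsilon -> 1: one minus geometric mean over arithmetic mean.
        geometric_mean = math.exp(sum(math.log(r) for r in rates) / n)
        return 1.0 - geometric_mean / mean
    power_mean = (sum(r ** (1.0 - epsilon) for r in rates) / n) ** (
        1.0 / (1.0 - epsilon)
    )
    return 1.0 - power_mean / mean


# Invented rates with the same total: the second split is more equal.
print(atkinson_index([1.0, 4.0]))
print(atkinson_index([2.0, 3.0]))
```

The second call returns a smaller value than the first, which is the direction of change the summary reports as $\gamma$ increases.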

5. Significance and Conclusion

This work provides a significant step toward fair and efficient 6G network management.

Practicality: It moves beyond theoretical centralized optimization to a distributed, market-based approach that is scalable and robust to user mobility and load variations.
Equity in 6G: It demonstrates that "fairness" in wireless networks does not necessarily require a massive sacrifice in total throughput. By using cooperative learning, the system can automatically redistribute resources to the "poorest" performers.
Future Directions: The authors note that while the current model handles moderate-sized networks, future work will explore larger topologies, different auction formats (e.g., sealed-bid), and non-stationary environments with time-varying users.

In summary, the paper shows in simulation that cooperative deep reinforcement learning can be integrated into auction mechanisms to address the complex problem of fair RIS allocation, offering a flexible tool for balancing efficiency and equity in future wireless networks.