Auction-Based RIS Allocation With DRL: Controlling the Cost-Performance Trade-Off

Imagine a bustling city where two major radio towers (Base Stations) are trying to talk to hundreds of people (Users) in their neighborhoods. The problem? The city is full of tall buildings, and the signals often get blocked, making the connection weak and slow.

To fix this, the city has installed a fleet of "Smart Mirrors" (Reconfigurable Intelligent Surfaces or RISs). These mirrors can catch a signal from a tower, bounce it off a building, and direct it perfectly to a person's phone, bypassing the obstacles.

However, there's a catch: The mirrors don't belong to the towers. They are owned by a neutral third party (like a city utility company) who rents them out. The towers have to compete to borrow these mirrors to help their customers.

Here is how the paper solves the problem of who gets which mirror, using a mix of an auction and a smart AI.

1. The Auction: A "Live Bidding War"

Instead of handing out mirrors at random or giving them to whoever asks first, the mirror owner holds a live auction.

The Setup: The auctioneer starts with a low price for a mirror.
The Bidding: The two towers look at the situation and shout, "I want that mirror!" or "No, that one is too expensive for me."
The Rule: If only one tower wants a mirror at the current price, that tower gets it. If both want it, the price goes up a little and each tower decides again. This continues until the price is high enough that only one tower is willing to pay, or neither wants it anymore.

2. The Problem: How to Bid Smartly?

In the past, towers used simple rules to decide what to bid on.

The "Greedy" Rule: "I will buy every mirror that looks like it might help, as long as I have money left." This often leads to overspending on mirrors that don't actually help much.
The "Distance" Rule: "I will only buy mirrors that are physically close to me." This is simple but ignores whether the mirror actually improves the signal quality.

These rules are like, respectively, a shopper who buys everything on sale without checking whether they need it, and one who only shops at the store across the street even when the better deal is down the block.

3. The Solution: The "AI Coach" (Deep Reinforcement Learning)

The authors introduced a Deep Reinforcement Learning (DRL) agent. Think of this as a super-smart AI Coach for each radio tower.

Learning by Doing: The AI Coach doesn't just follow a rulebook. It plays the auction thousands of times in a simulation.
Trial and Error:
- If the tower bids too much and runs out of money, the AI gets a "frown" (a penalty).
- If the tower wins a mirror that makes the signal amazing, the AI gets a "thumbs up" (a reward).
- If the tower wins a mirror that barely helps, the AI learns that was a waste of money.
The Result: Over time, the AI learns to balance gain against cost. It learns to say, "I'll skip that expensive mirror over there because the gain isn't worth the cost," or "I'll fight hard for that specific mirror because it will double our speed."

4. The "Aggressiveness" Dial

One of the coolest features the authors added is a tunable dial (called $\beta$ ) that controls how aggressive the AI is. The dial sets how heavily the AI weighs cost, so a higher $\beta$ means more cautious bidding.

Turn $\beta$ up (Low Aggressiveness): The AI becomes a "thrifty shopper." It bids only on the mirrors that help the most and refuses to pay extra. The cost is very low, but the network performance is just "okay."
Turn $\beta$ down (High Aggressiveness): The AI becomes a "luxury shopper." It is willing to spend more money and bids on more mirrors, including ones that help less, resulting in faster speeds, but it costs the tower more.

This allows network operators to choose exactly how much they want to spend based on their budget and how fast they need the internet to be.

The Big Picture

The paper shows that by combining a fair auction system with an AI that learns from experience, we can manage these "Smart Mirrors" much better than old-school methods.

Without the AI: Towers waste money or miss out on good connections.
With the AI: Towers get faster connections at lower cost than the simple rules achieve.

It's like upgrading from a shopper guessing prices at a flea market to one advised by an appraiser who has seen thousands of similar sales and knows roughly what each item is worth. The approach is a step toward making future 6G networks faster, cheaper, and more reliable.

Here is a detailed technical summary of the paper "Auction-Based RIS Allocation With DRL: Controlling the Cost-Performance Trade-Off."

1. Problem Statement

The paper addresses the resource allocation challenge in next-generation (6G) wireless networks involving Reconfigurable Intelligent Surfaces (RISs). Specifically, it focuses on scenarios where multiple Base Stations (BSs) compete for control of shared RIS units deployed at cell edges to enhance signal coverage.

The Conflict: RISs are treated as a shared resource managed by an independent operator. When multiple BSs require the same RIS to improve their links, a mechanism is needed to allocate them fairly and efficiently without permanent assignment.
The Challenge: Traditional combinatorial allocation methods (like VCG auctions) are computationally complex and scale poorly. Furthermore, BSs must make bidding decisions based on limited information (macroscopic channel parameters) rather than perfect Channel State Information (CSI), as RIS configurations are not yet finalized.
The Goal: Develop a scalable allocation mechanism that allows BSs to dynamically lease RISs via an auction, optimizing the trade-off between network performance (spectral efficiency) and economic cost (budget expenditure).

2. Methodology

The proposed framework integrates a market-based auction mechanism with Deep Reinforcement Learning (DRL).

A. System Model

Network Topology: A multi-cell scenario with $N_{BS}$ base stations, $N_{UE}$ users, and $N_{RIS}$ RIS units.
Channel Modeling:
- Direct Links: Modeled as Non-Line-of-Sight (NLOS) with strong shadowing.
- BS-to-RIS Links: Modeled as strong Line-of-Sight (LOS).
- RIS-to-User Links: Modeled using a Rician channel (mix of LOS and NLOS).
Signal Processing: The total channel is a superposition of the direct link and the RIS-assisted link. The RIS phase shifts are optimized to align the coherent components of the signal.
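
The phase-alignment idea can be illustrated with a minimal sketch. It assumes a single-antenna, single-user link with perfectly known channels, which is simpler than the multi-antenna setting in the paper; the function names and the random channel values are illustrative only.

```python
import numpy as np


def align_ris_phases(h_direct, cascaded):
    """Phase shifts that rotate every cascaded BS-RIS-user term onto the direct link."""
    return np.angle(h_direct) - np.angle(cascaded)


def effective_channel(h_direct, cascaded, theta):
    """Superposition of the direct link and the phase-shifted RIS-assisted terms."""
    return h_direct + np.sum(cascaded * np.exp(1j * theta))


rng = np.random.default_rng(0)
n_elements = 64  # hypothetical number of RIS elements

# Illustrative channel coefficients (not taken from the paper).
h_direct = 0.1 * (rng.standard_normal() + 1j * rng.standard_normal())
cascaded = 0.05 * (rng.standard_normal(n_elements)
                   + 1j * rng.standard_normal(n_elements))

theta_aligned = align_ris_phases(h_direct, cascaded)
theta_random = rng.uniform(0.0, 2.0 * np.pi, n_elements)

gain_aligned = abs(effective_channel(h_direct, cascaded, theta_aligned))
gain_random = abs(effective_channel(h_direct, cascaded, theta_random))

# With perfect alignment all terms add in phase, so the magnitude equals
# the sum of the individual magnitudes.
upper_bound = abs(h_direct) + np.sum(np.abs(cascaded))
```

With aligned phases the effective channel magnitude reaches the sum of the individual magnitudes, whereas random phases generally fall well short of it.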

B. Utility Estimation

Since perfect CSI is unavailable during the bidding phase, BSs estimate the potential performance gain using macroscopic channel parameters (path loss, angles, K-factors).

SINR Estimation: The Signal-to-Interference-plus-Noise Ratio (SINR) is approximated using expected values of channel powers, decomposing the signal into direct, coherent RIS-assisted, and incoherent RIS-assisted components.
Utility Function: Defined as the percentage improvement in the sum-rate compared to a scenario with no RISs.
Marginal Value: The value of acquiring a specific RIS is calculated as the marginal increase in utility it provides to the current allocation.
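
The utility and marginal-value definitions above can be sketched as follows. The SINR table is a made-up toy stand-in for the macroscopic estimator, and all function names are hypothetical rather than taken from the paper.

```python
import math
from typing import Callable, FrozenSet, Sequence


def sum_rate(sinrs: Sequence[float]) -> float:
    """Sum of per-user rates in bit/s/Hz for linear-scale SINRs."""
    return sum(math.log2(1.0 + s) for s in sinrs)


def utility_percent(rate_with_ris: float, rate_no_ris: float) -> float:
    """Percentage sum-rate improvement over the no-RIS scenario."""
    return 100.0 * (rate_with_ris - rate_no_ris) / rate_no_ris


def marginal_value(
    utility: Callable[[FrozenSet[int]], float],
    held: FrozenSet[int],
    ris_id: int,
) -> float:
    """Utility gained by adding one RIS to the set already held."""
    return utility(held | {ris_id}) - utility(held)


# Toy estimated SINRs for two users, indexed by the set of RISs held.
# These numbers are illustrative only.
TOY_SINRS = {
    frozenset(): [1.0, 3.0],
    frozenset({0}): [3.0, 3.0],
    frozenset({1}): [1.0, 7.0],
    frozenset({0, 1}): [3.0, 7.0],
}


def toy_utility(held: FrozenSet[int]) -> float:
    baseline = sum_rate(TOY_SINRS[frozenset()])
    return utility_percent(sum_rate(TOY_SINRS[frozenset(held)]), baseline)


value_first = marginal_value(toy_utility, frozenset(), 0)
value_second = marginal_value(toy_utility, frozenset({0}), 1)
```

Because the marginal value depends on the RISs already held, a BS would recompute it as its allocation changes during the auction.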

C. Auction Mechanism

The allocation uses a simultaneous ascending auction (similar to a "Japanese" forward auction):

Process: The auctioneer increases the price of all RIS units by a fixed increment ( $\Delta p$ ) in each round.
Bidding: BSs submit binary bids (0 or 1) for each RIS.
Allocation: If a RIS receives a single bid, it is allocated. If multiple BSs bid, the price rises, and bidding continues. If no bids are received, the RIS remains unassigned.
Constraints: An activity rule prevents BSs from dropping out and re-entering the bidding for a specific RIS.
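
The rules above can be sketched as a short simulation. This is a sketch under stated assumptions, not the paper's implementation: prices are tracked per RIS and rise only while a RIS is contested, the sole remaining bidder pays the current price, budgets are ignored, and the `threshold_bidder` policy is a hypothetical placeholder.

```python
from typing import Callable, Dict, Iterable, Set, Tuple

# A bidder maps (current prices, RISs it may still bid on) to the set it bids on.
Bidder = Callable[[Dict[int, float], Set[int]], Set[int]]


def run_ascending_auction(
    bidders: Dict[str, Bidder],
    ris_ids: Iterable[int],
    start_price: float,
    delta_p: float,
    max_rounds: int = 10_000,
) -> Dict[int, Tuple[str, float]]:
    """Return {ris_id: (winning_bs, price_paid)}; unassigned RISs are omitted."""
    ris_list = list(ris_ids)
    prices = {r: start_price for r in ris_list}
    open_ris = set(ris_list)
    eligible = {b: set(ris_list) for b in bidders}
    winners: Dict[int, Tuple[str, float]] = {}

    for _ in range(max_rounds):
        if not open_ris:
            break
        # Collect binary bids, restricted to RISs each BS is still eligible for.
        bids = {}
        for b, policy in bidders.items():
            allowed = eligible[b] & open_ris
            bids[b] = policy(dict(prices), set(allowed)) & allowed
        for r in sorted(open_ris):
            interested = [b for b in bidders if r in bids[b]]
            # Activity rule: a BS that skips a RIS cannot re-enter for it.
            for b in bidders:
                if r not in bids[b]:
                    eligible[b].discard(r)
            if len(interested) == 1:
                winners[r] = (interested[0], prices[r])
                open_ris.discard(r)
            elif not interested:
                open_ris.discard(r)   # no demand: the RIS stays unassigned
            else:
                prices[r] += delta_p  # contested: the price goes up
    return winners


def threshold_bidder(values: Dict[int, float]) -> Bidder:
    """Hypothetical policy: bid while the private value covers the price."""
    def bid(prices: Dict[int, float], allowed: Set[int]) -> Set[int]:
        return {r for r in allowed if values.get(r, 0.0) >= prices[r]}
    return bid


result = run_ascending_auction(
    {"A": threshold_bidder({0: 5.0, 1: 1.0}),
     "B": threshold_bidder({0: 3.0, 1: 0.0})},
    ris_ids=[0, 1], start_price=1.0, delta_p=1.0,
)
```

In this toy run, BS A wins RIS 1 uncontested at the starting price and wins RIS 0 only after the price climbs past BS B's value.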

D. Bidding Strategies

The paper compares three strategies:

Greedy Heuristic: Bids on the top $k$ RISs where $k$ is determined by the remaining budget divided by the current price.
Distance-Based Heuristic: Bids based solely on the Euclidean distance between the BS and the RIS (proximity implies utility).
DRL-Based Strategy (Proposed):
- Agent: Each BS operates an independent DRL agent (using Proximal Policy Optimization - PPO).
- Observation: Current price, remaining budget, and estimated marginal values of available RISs.
- Action: A binary vector indicating which RISs to bid on.
- Reward Function: Designed to maximize the value of won RISs ( $R_1$ ) while penalizing the cost of bids ( $R_2$ ) and strictly penalizing budget overruns ( $R_3$ ).
- Tunable Parameter ( $\beta$ ): A "bid intensity" parameter scales the cost penalty, allowing control over the agent's aggressiveness.
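
A minimal sketch of the greedy rule and of a $\beta$-weighted reward follows. The summary does not give the exact reward expression, so the linear form $R_1 - \beta R_2 - R_3$, the `overrun_penalty` weight, and all function names are assumptions made for illustration.

```python
from typing import Dict, Set


def greedy_bids(
    marginal_values: Dict[int, float],
    price: float,
    remaining_budget: float,
) -> Set[int]:
    """Bid on the top-k RISs by estimated value, with k = floor(budget / price)."""
    if price <= 0:
        raise ValueError("price must be positive")
    k = max(int(remaining_budget // price), 0)
    ranked = sorted(marginal_values, key=marginal_values.get, reverse=True)
    return set(ranked[:k])


def shaped_reward(
    won_value: float,
    bid_cost: float,
    overspend: float,
    beta: float,
    overrun_penalty: float = 10.0,  # assumed weight, not from the paper
) -> float:
    """Assumed linear reward: R1 - beta * R2 - R3."""
    r1 = won_value                               # value of RISs won
    r2 = bid_cost                                # cost of the bids placed
    r3 = overrun_penalty * max(overspend, 0.0)   # budget overrun penalty
    return r1 - beta * r2 - r3


chosen = greedy_bids({0: 5.0, 1: 2.0, 2: 9.0}, price=2.0, remaining_budget=5.0)
reward_low_beta = shaped_reward(10.0, 4.0, 0.0, beta=0.5)
reward_high_beta = shaped_reward(10.0, 4.0, 0.0, beta=2.0)
```

Under this assumed form, the same purchase earns less reward as $\beta$ grows, which is why a higher $\beta$ pushes the agent toward more selective bidding.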

3. Key Contributions

Novel Allocation Framework: Proposes a low-complexity, auction-based mechanism for dynamic RIS leasing, avoiding the computational intractability of combinatorial optimization.
Macroscopic Estimation: Demonstrates that reliable utility estimation is possible using only large-scale channel statistics (path loss, angles) rather than instantaneous CSI, making the system scalable.
DRL Integration: Introduces a multi-agent DRL approach where agents learn to bid strategically. The agents learn to identify high-value RISs and avoid wasteful spending, outperforming static heuristics.
Cost-Performance Control: Introduces the bid intensity parameter ( $\beta$ ), which acts as a "knob" for network operators to tune the trade-off between spectral efficiency and operational expenditure.
Empirical Validation: Provides extensive simulations showing that RL-based bidding achieves superior Pareto fronts (higher rates for lower costs) compared to greedy and distance-based baselines.

4. Results

Simulations were conducted in a two-cell scenario with clustered cell-edge users and RISs ( $N_{BS}=2, N_{RIS}=10, N_{UE}=20$ ).

Estimation Accuracy: The macroscopic SINR estimation error decreases as the number of BS antennas increases, validating the approximation method for large arrays.
RL Convergence: The PPO agents converged to stable policies, learning to balance bid value against cost.
Performance Comparison:
- RIS Benefit: Systems with RIS allocation significantly outperformed the "No RIS" baseline.
- RL vs. Heuristics: RL agents achieved higher sum-rates at lower costs compared to greedy and distance-based strategies. Heuristics tended to bid too aggressively, inflating costs without proportional performance gains.
- Impact of $\beta$ :
  - Low $\beta$ : Agents bid aggressively on many RISs (including lower-value ones), leading to higher costs and more allocated RISs.
  - High $\beta$ : Agents become highly selective, bidding only on high-value RISs. This results in lower costs and fewer allocated RISs but maintains high efficiency per unit cost.
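
To compare strategies across $\beta$ values, one can keep only the non-dominated (cost, sum-rate) operating points. The sketch below shows such a filter; the sample points are arbitrary placeholders, not results from the paper.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (cost, sum_rate)


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep points for which no other point is both cheaper-or-equal and
    at-least-as-fast, with a strict improvement in one of the two."""
    front = []
    for cost, rate in points:
        dominated = any(
            (c2 <= cost and r2 >= rate) and (c2 < cost or r2 > rate)
            for c2, r2 in points
        )
        if not dominated:
            front.append((cost, rate))
    return sorted(front)


# Placeholder operating points, e.g. one per beta setting (made-up numbers).
samples = [(1.0, 2.0), (2.0, 3.0), (2.0, 2.5), (3.0, 3.0), (4.0, 5.0)]
front = pareto_front(samples)
```

A strategy whose front lies above and to the left of another's achieves higher rates at lower cost, which is the sense in which the RL-based bidding is reported to outperform the heuristics.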

5. Significance

This work bridges the gap between theoretical resource allocation and practical deployment in 6G networks.

Scalability: By using macroscopic estimates and a simple ascending auction, the solution avoids the "curse of dimensionality" associated with global optimization.
Economic Viability: The auction model provides a clear economic framework for third-party RIS operators to monetize their infrastructure while ensuring BSs only pay for necessary performance gains.
Adaptability: The DRL approach allows the system to adapt to dynamic environments (changing user locations, interference patterns) without re-engineering the allocation algorithm, simply by re-executing the trained policy.
Future Direction: It highlights the potential of combining game theory (auctions) with AI (DRL) to manage shared physical layer resources in future wireless networks.