Enhancing User Throughput in Multi-panel mmWave Radio Access Networks for Beam-based MU-MIMO Using a DRL Method

This paper proposes a deep reinforcement learning (DRL)-based adaptive beam management framework for multi-panel mmWave networks. By leveraging spatial-domain characteristics and real-time observations, it improves user throughput by up to 16% and reduces latency by a factor of 3–7 compared to legacy methods.

Ramin Hashemi, Vismika Ranasinghe, Teemu Veijalainen, Petteri Kela, Risto Wichman

Published 2026-03-04

Imagine you are the conductor of a massive, high-speed orchestra, but instead of violins and drums, your instruments are radio waves traveling at the speed of light. This is the world of mmWave (millimeter-wave) communication, the technology behind the super-fast 5G networks of the future.

However, there's a problem: these radio waves are like fragile whispers. They can't travel far or go through walls easily. To make them loud enough, we use antennas that act like giant flashlights, focusing the signal into tight beams to hit specific users.

The Problem: The "Flashlight" Dilemma

In a busy city, you have many users (let's call them "Mobile Terminals" or MTs) trying to talk to the network at the same time. Your base station (the "gNB") has multiple panels of antennas, each with its own set of "flashlights" (beams).

The Old Way (The Legacy Approach):
Imagine a traffic cop who only looks at who is standing closest to the intersection. The old system simply picks the beam with the strongest signal (the loudest whisper) for every user.

  • The Flaw: Just because a signal is loud right now doesn't mean it's the best choice for the whole group. If you pick the loudest beam for every user independently, you might point two beams at nearly the same spot, causing them to crash into each other (interference). Or you might pick a beam that is loud but rarely used, leaving other users waiting in line. It's like a chef who only ever cooks the single most popular dish, while most of the other customers go unserved.
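The legacy rule described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation; the function and variable names are hypothetical, and each user is assumed to report one RSRP value per candidate beam.

```python
def legacy_beam_selection(rsrp_reports):
    """Legacy rule: each user independently gets the beam with the
    highest reported RSRP, ignoring interference between users.

    rsrp_reports: dict mapping user id -> list of RSRP values (dBm),
    one per candidate beam. Returns dict: user id -> chosen beam index.
    """
    assignment = {}
    for user, rsrp_per_beam in rsrp_reports.items():
        # Pick the loudest beam for this user in isolation.
        assignment[user] = max(range(len(rsrp_per_beam)),
                               key=lambda b: rsrp_per_beam[b])
    return assignment

reports = {
    "MT1": [-95.0, -80.0, -102.0],
    "MT2": [-88.0, -79.5, -110.0],  # strongest beam is the same as MT1's
}
print(legacy_beam_selection(reports))  # both users land on beam 1 -> interference
```

Note how both users end up on the same beam: the rule has no notion of the group, which is exactly the flaw the paper targets.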

The Solution: The "Smart Conductor" (DRL)

This paper introduces a Deep Reinforcement Learning (DRL) system. Think of this as a Smart Conductor who doesn't just look at who is closest, but learns from experience to manage the whole orchestra perfectly.

The Smart Conductor looks at three things before deciding which "flashlight" to turn on:

  1. Signal Strength (RSRP): Is the signal loud? (The obvious choice).
  2. Popularity (Beam Usage): Has this beam been used a lot lately? If a beam is "popular," it means the network is already good at handling traffic on that path. Switching to a new, unused beam might cause a delay while the system figures it out.
  3. Compatibility (Cross-Correlation): This is the magic part. Imagine you have two people trying to talk on walkie-talkies. If they stand too close together, their voices overlap and become noise. The Smart Conductor checks the "spatial relationship" between beams. It asks: "If I turn on Beam A for User 1, will it interfere with Beam B for User 2?" If they are compatible, it schedules them together. If not, it picks a different pair.
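The three observations above can be pictured as features the agent reads before choosing beams. The sketch below is illustrative only (the paper's exact feature definitions and thresholds are not given in this summary): RSRP and beam-usage history are stacked into a state vector, and a normalized cross-correlation between beam steering vectors serves as the "walkie-talkie" compatibility check.

```python
import numpy as np

def beam_compatibility(steering_a, steering_b, threshold=0.5):
    """Normalized cross-correlation of two beam steering vectors.
    Near 1 means the beams point almost the same way (risky to schedule
    together); near 0 means they are spatially well separated.
    The 0.5 threshold is an arbitrary illustrative choice."""
    corr = np.abs(np.vdot(steering_a, steering_b)) / (
        np.linalg.norm(steering_a) * np.linalg.norm(steering_b))
    return corr < threshold  # True -> safe to pair these beams

def build_state(rsrp, usage_counts):
    """Stack per-beam signal strength and popularity into one state vector."""
    popularity = usage_counts / max(usage_counts.sum(), 1)  # usage as a fraction
    return np.concatenate([rsrp, popularity])

# Orthogonal beams are compatible; identical beams are not.
print(beam_compatibility(np.array([1, 0], dtype=complex),
                         np.array([0, 1], dtype=complex)))  # True
```

In practice the compatibility check would run over every candidate pair the scheduler is considering, pruning combinations that would interfere before the agent ever scores them.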

How It Learns (The Video Game Analogy)

How does the computer learn to be this smart? It plays a video game.

  • The Goal: Maximize the total data (throughput) sent to everyone and minimize the time they wait (latency).
  • The Trial and Error: At first, the AI makes random choices. Sometimes it picks the wrong beam, and the data rate drops (it loses points). Sometimes it picks a great combination, and the data flies (it gains points).
  • The Reward: Every time the network runs smoothly, the AI gets a "reward." Over thousands of tries, it learns a policy: "When I see this pattern of signals and this history of usage, I should always pick this specific combination of beams."
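The trial-and-error loop above is the core of reinforcement learning. The toy sketch below uses tabular Q-learning to make the reward-driven update concrete; the paper uses a deep neural network instead of a table, and the environment, states, and reward here are made up for illustration.

```python
import random
from collections import defaultdict

def train(env_step, states, actions, episodes=2000,
          alpha=0.1, gamma=0.9, eps=0.2):
    """Tabular Q-learning: learn which action (beam choice) earns the
    most long-run reward (throughput) in each observed state."""
    Q = defaultdict(float)
    for _ in range(episodes):
        s = random.choice(states)
        # Explore sometimes; otherwise exploit the best-known action.
        if random.random() < eps:
            a = random.choice(actions)
        else:
            a = max(actions, key=lambda act: Q[(s, act)])
        r, s_next = env_step(s, a)  # reward, e.g. data delivered this slot
        best_next = max(Q[(s_next, a2)] for a2 in actions)
        # Nudge the estimate toward (reward now + discounted future value).
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q
```

Over thousands of episodes, the table (or, in the paper, the network) converges on a policy: for each observed pattern of signals and usage, pick the beam combination that history says pays off best.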

The Results: Why It Matters

The paper tested this "Smart Conductor" against the old "Traffic Cop" method in a simulated city with 210 users. The results were impressive:

  • Faster Speeds: The network delivered up to 16% more data to users. That's like getting a faster download speed on your phone without changing your plan.
  • Less Waiting: The time it takes for a message to go from your phone to the tower and back (latency) was reduced by 3 to 7 times.
    • Analogy: If the old system made you wait 7 seconds for a webpage to load, the new system makes it load in 1 second.
  • Smarter Grouping: The AI learned to group users together more efficiently. Instead of serving them one by one, it found ways to serve multiple users simultaneously without them interfering with each other, much like a bus driver who realizes they can pick up three people on the same side of the street without making a detour.

The Bottom Line

This paper shows that by teaching a computer to learn from the environment rather than just following rigid rules, we can make our 5G networks significantly faster and more efficient. It's the difference between a robot that blindly follows a map and a human driver who knows the shortcuts, the traffic patterns, and how to navigate the city smoothly.

In short: The old way was "Pick the loudest signal." The new way is "Pick the smartest combination of signals to keep everyone happy and moving fast."
