Indirect and Direct Multiuser Hybrid Beamforming for Far-Field and Near-Field Communications: A Deep Learning Approach

This paper proposes a complex-valued end-to-end deep learning framework that eliminates the digital precoder via KKT conditions to enable stable, efficient hybrid beamforming for both far-field and near-field XL-MIMO systems, offering robust performance in both indirect (CSI-based) and direct (pilot-based) modes while significantly reducing complexity and improving spectral efficiency over existing methods.

Xinyang Li, Songjie Yang, Boyu Ning, Zongmiao He, Xiang Ling, Chau Yuen

Published Fri, 13 Ma

Imagine you are trying to host a massive, high-stakes dinner party in a giant, circular room (the Base Station) with hundreds of waiters (the Antennas). Your goal is to serve delicious food (data) to many guests (users) sitting at different tables, some very close to the kitchen and some far away in the back.

The challenge? You only have a few head waiters (the RF Chains) to direct the hundreds of waiters. You can't tell every single waiter exactly what to do individually; instead, the head waiters must give general instructions to groups of waiters to point their trays in the right direction.

This is the problem of hybrid beamforming in next-generation 6G networks: hundreds of antennas, but only a handful of RF chains to steer them. This paper proposes a new way to solve it using deep learning.
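For readers who want to see the structure behind the analogy, here is a minimal NumPy sketch of a hybrid beamformer. The sizes (`N`, `N_RF`, `K`), the random phases, and the power budget `P` are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N, N_RF, K = 64, 4, 4   # antennas (waiters), RF chains (head waiters), users (guests)

# Analog beamformer: one phase shifter per antenna/RF-chain pair,
# so every entry has the same magnitude and only the phase is free.
phases = rng.uniform(0.0, 2.0 * np.pi, size=(N, N_RF))
F_RF = np.exp(1j * phases) / np.sqrt(N)

# Digital precoder: a small, unconstrained complex matrix.
F_BB = rng.standard_normal((N_RF, K)) + 1j * rng.standard_normal((N_RF, K))

# Rescale so the total transmit power ||F_RF F_BB||_F^2 equals P.
P = 1.0
F_BB *= np.sqrt(P) / np.linalg.norm(F_RF @ F_BB)

# One data symbol per user, mapped onto all N antennas.
s = (rng.standard_normal(K) + 1j * rng.standard_normal(K)) / np.sqrt(2)
x = F_RF @ F_BB @ s
```

The point of the split is that `F_RF` is large but restricted to phase-only entries, while `F_BB` is unrestricted but tiny.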

Here is the breakdown of their solution using simple analogies:

1. The Problem: The "Near-Field" Confusion

In older networks, everyone was far away, so the food (signals) arrived as flat wavefronts, like long, straight swells reaching a beach. You just had to point your tray at the right angle.

But in these new massive systems, some guests are sitting right next to the kitchen. Here, the wavefront is no longer flat; it is curved, like ripples spreading from a stone dropped in a pond.

  • The Issue: If you just point your tray at the right angle, you might miss the guest because they are too close (or too far). You need to aim at the right angle AND the right distance simultaneously.
  • The Interference: With so many guests close together, their voices mix up. It's hard to hear one person when everyone is shouting at once.
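The angle-versus-distance point can be made concrete with the standard array response of a uniform linear array. The sketch below is textbook geometry rather than code from the paper; the array size, wavelength, angle, and distances are assumed values chosen for illustration.

```python
import numpy as np

def far_field_response(n_antennas, spacing, wavelength, theta):
    """Planar-wave model: the phase grows linearly across the array (angle only)."""
    pos = (np.arange(n_antennas) - (n_antennas - 1) / 2) * spacing
    phase = 2 * np.pi / wavelength * pos * np.sin(theta)
    return np.exp(1j * phase) / np.sqrt(n_antennas)

def near_field_response(n_antennas, spacing, wavelength, theta, r):
    """Spherical-wave model: the phase follows the exact antenna-to-user distance."""
    pos = (np.arange(n_antennas) - (n_antennas - 1) / 2) * spacing
    dist = np.sqrt(r**2 + pos**2 - 2 * r * pos * np.sin(theta))
    return np.exp(-1j * 2 * np.pi / wavelength * (dist - r)) / np.sqrt(n_antennas)

# Assumed setup: 256 antennas at half-wavelength spacing, 1 cm wavelength.
n_antennas, wavelength, theta = 256, 0.01, 0.3
spacing = wavelength / 2

a_far = far_field_response(n_antennas, spacing, wavelength, theta)
a_close_user = near_field_response(n_antennas, spacing, wavelength, theta, r=5.0)
a_distant_user = near_field_response(n_antennas, spacing, wavelength, theta, r=5000.0)

# How well does an angle-only beam match each user's true channel direction?
corr_near = abs(np.vdot(a_far, a_close_user))
corr_far = abs(np.vdot(a_far, a_distant_user))
print(f"match for user at 5 m:    {corr_near:.3f}")
print(f"match for user at 5000 m: {corr_far:.3f}")
```

For the distant user the two models agree almost perfectly, while for the close user the angle-only beam loses most of its gain. That loss is why near-field beams have to be focused on a distance as well as an angle.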

2. The Old Way vs. The New Way

  • The Old Way (Traditional Math): Imagine a head waiter trying to calculate the perfect angle and distance for every single guest using a giant calculator. It's accurate, but it takes forever. By the time they finish the math, the guests have moved, and the food is cold.
  • The Existing AI Way (Current AI): Some earlier deep learning methods try to guess the answer by looking at the guests' faces (pilots) without knowing the full room layout. But these models often get confused by the noise or the complex math, leading to dropped trays.

3. The Paper's Solution: The "Smart Brain" System

The authors built a Deep Learning Brain that acts like a super-intelligent head waiter. It has three special tricks:

A. The "Magic Glasses" (Grouped Convolution Sensing)

Instead of trying to see the whole room perfectly first, the AI puts on a pair of special glasses that scan the room in specific patterns.

  • How it works: It doesn't just look at the guests; it learns how to "listen" to the room's echoes. It figures out where the guests are by how the sound bounces off the walls, even if it can't see them clearly yet. This is the Sensing Front-End.

B. The "Group Chat" (Shared MLP)

Once the AI has a rough idea of where everyone is, it uses a "Group Chat" to organize the waiters.

  • How it works: It realizes that guests sitting in similar spots have similar needs. It groups them together and figures out the best way to serve them all at once without their voices overlapping. This is the Feature Extraction part.
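One common reading of a "shared MLP" is that the same small network, with the same weights, is applied to every user's features. The sketch below illustrates only that weight-sharing idea with untrained random weights. The layer sizes and the complex activation (ReLU applied separately to real and imaginary parts) are assumptions, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def crelu(z):
    """ReLU applied separately to the real and imaginary parts (assumed activation)."""
    return np.maximum(z.real, 0.0) + 1j * np.maximum(z.imag, 0.0)

def init_layer(d_in, d_out):
    """Random complex weights and zero bias (untrained, for illustration only)."""
    scale = 1.0 / np.sqrt(2 * d_in)
    W = scale * (rng.standard_normal((d_in, d_out)) + 1j * rng.standard_normal((d_in, d_out)))
    b = np.zeros(d_out, dtype=complex)
    return W, b

def shared_mlp(features, layers):
    """features has shape (K, d_in), one row per user; every row sees the same weights."""
    z = features
    for W, b in layers[:-1]:
        z = crelu(z @ W + b)
    W, b = layers[-1]
    return z @ W + b

K, d_in, d_hidden, d_out = 4, 16, 32, 8   # hypothetical sizes
layers = [init_layer(d_in, d_hidden), init_layer(d_hidden, d_out)]
user_features = rng.standard_normal((K, d_in)) + 1j * rng.standard_normal((K, d_in))

out = shared_mlp(user_features, layers)

# Reordering the users simply reorders the outputs.
perm = np.array([2, 0, 3, 1])
out_perm = shared_mlp(user_features[perm], layers)
```

Because the weights are shared, the network does not care which seat number a guest was given, and the parameter count does not grow with the number of users.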

C. The "Instant Recipe" (The KKT Shortcut)

This is the paper's biggest innovation. Usually, AI tries to guess both the angle (analog) and the volume (digital) at the same time. This often leads to the AI getting stuck or confused (unstable gradients).

  • The Trick: The authors realized that once you know the angle (the analog beamformer), the best volume (the digital precoder) follows from a closed-form formula derived from the KKT optimality conditions, like a shortcut recipe.
  • The Result: The AI only needs to learn the "pointing" part. The "volume" part is removed from what the network has to learn, because the formula supplies it whenever it is needed. This makes the AI learn much faster and more stably.
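To show what "calculated instantly" can look like, here is a sketch that fixes the analog beamformer and computes the digital precoder in one shot. The paper's own KKT-derived expression is not reproduced here. As a stand-in, the sketch uses regularized zero-forcing on the effective channel, a well-known closed form. The random channel, the phase-matched `F_RF`, and all parameter values are assumptions for illustration.

```python
import numpy as np

def closed_form_digital_precoder(H, F_RF, power, noise_var):
    """Regularized zero-forcing on the effective channel H @ F_RF (stand-in formula)."""
    K = H.shape[0]
    H_eff = H @ F_RF                                   # K x N_RF effective channel
    reg = K * noise_var / power                        # common choice of regularizer
    gram = H_eff @ H_eff.conj().T + reg * np.eye(K)
    F_BB = H_eff.conj().T @ np.linalg.inv(gram)        # N_RF x K
    # Rescale so that ||F_RF F_BB||_F^2 equals the power budget.
    F_BB *= np.sqrt(power) / np.linalg.norm(F_RF @ F_BB)
    return F_BB

def sum_rate(H, F_RF, F_BB, noise_var):
    """Sum spectral efficiency in bit/s/Hz for single-antenna users."""
    G = np.abs(H @ F_RF @ F_BB) ** 2                   # G[k, j]: power of stream j at user k
    signal = np.diag(G)
    interference = G.sum(axis=1) - signal
    return float(np.sum(np.log2(1.0 + signal / (interference + noise_var))))

rng = np.random.default_rng(0)
K, N = 4, 64                                           # users, antennas (N_RF = K here)
H = (rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))) / np.sqrt(2)

# Hypothetical analog stage: phases matched to each user's channel.
# In the paper this matrix would come from the neural network instead.
F_RF = np.exp(1j * np.angle(H.conj().T)) / np.sqrt(N)

power, noise_var = 1.0, 0.01
F_BB = closed_form_digital_precoder(H, F_RF, power, noise_var)
rate = sum_rate(H, F_RF, F_BB, noise_var)
print(f"sum rate: {rate:.2f} bit/s/Hz")
```

The design point carries over regardless of the exact formula: `F_BB` is a deterministic function of `F_RF` and the channel, so a network only has to output the analog phases.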

4. Two Modes of Operation

The system works in two ways, depending on how much information it has:

  • Mode 1: The "Map Reader" (Indirect)

    • Scenario: You have a perfect map of the room (Perfect Channel State Information).
    • Action: The AI looks at the map, instantly decides where to point the trays, and uses the "Instant Recipe" to set the volume.
    • Benefit: It comes close to the perfect mathematical solution while taking only a fraction of the time.
  • Mode 2: The "Eagle Eye" (Direct)

    • Scenario: You don't have a map. You only have a few seconds to shout "Hello!" to the guests to see who answers (Short Pilots).
    • Action: The AI uses its "Magic Glasses" to listen to those short shouts. It learns to point the trays directly based on the sound, without needing to build a full map first. Then, it does a quick check to fine-tune the volume.
    • Benefit: This saves a massive amount of time and energy. In a crowded room, shouting less means less noise for everyone else.
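A quick count shows why skipping the "full map" is attractive when pilots are short. The sketch below uses a generic uplink pilot model with random analog combiners. The model, the sizes, and the DFT pilot sequences are all assumptions for illustration and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N, N_RF, K, T = 128, 4, 4, 8     # antennas, RF chains, users, pilot slots (assumed)
noise_var = 0.01

# Unknown user channels: K x N complex numbers in total.
H = (rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))) / np.sqrt(2)

# Orthogonal pilot sequences: the first K rows of a T-point DFT matrix (needs K <= T).
X = np.exp(-2j * np.pi * np.outer(np.arange(K), np.arange(T)) / T)   # K x T

# In each slot the base station only sees N_RF combined outputs, not all N antennas.
Y = np.zeros((T, N_RF), dtype=complex)
for slot in range(T):
    W = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, size=(N, N_RF))) / np.sqrt(N)
    noise = np.sqrt(noise_var / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
    received = H.T @ X[:, slot] + noise          # signal arriving at the N antennas
    Y[slot] = W.conj().T @ received              # what the RF chains actually deliver

print("channel coefficients to estimate:", H.size)
print("pilot observations available:   ", Y.size)
```

With these sizes there are far fewer observations than channel coefficients, so rebuilding the whole channel first is an underdetermined problem. A direct approach feeds `Y` to the network and asks only for the beamformer.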

5. Why This Matters

  • Speed: It solves the math problem in a blink of an eye, whereas traditional iterative methods take far longer, which is a problem when the guests keep moving.
  • Efficiency: It works even when guests are very close to the kitchen (Near-Field), a scenario where older methods that aim by angle alone miss their target.
  • Robustness: It holds up when its measurements are noisy. It learns the "shape" of the room and adapts.

In a nutshell:
This paper teaches a computer to be a master conductor for a massive orchestra. Instead of trying to write sheet music for every single instrument (which is too slow), the conductor learns to wave the baton in the perfect pattern so that the instruments naturally play the right notes together, even if the musicians are standing right next to the conductor. It's faster, smarter, and handles the chaos of a crowded room better than anything we've had before.