Hard/Soft NLoS Detection via Combinatorial Data Augmentation for 6G Positioning

Imagine you are trying to find a friend's location in a massive, cluttered warehouse using only their voice. You ask several people (let's call them "Listeners") to tell you how far away your friend is based on how loud their voice sounds.

In a perfect world, sound travels in a straight line. If a Listener says, "They are 10 meters away," you draw a circle with a 10-meter radius around them. Where all the circles overlap is your friend.

The Problem: The "Echo" Trap
But this warehouse is full of obstacles—shelves, machines, and walls. Sometimes, the sound bounces off a wall before reaching a Listener. This is called NLoS (Non-Line-of-Sight).

The Lie: The bouncing sound takes a longer path. The Listener thinks, "Wow, they must be 20 meters away!" (when they are actually only 10).
The Result: If you use this "20-meter" circle, your map gets distorted. You might think your friend is in the wrong aisle entirely.

The Solution: The "Combinatorial Detective"
The paper proposes a clever new way to solve this without needing expensive new hardware or a pre-drawn map of the warehouse. They call it CDA-ND (Combinatorial Data Augmentation-guided NLoS Detection).

Here is how it works, broken down into simple concepts:

1. The "Group Chat" Strategy (Combinatorial Data Augmentation)

Instead of trusting just one group of Listeners, the system plays a game of "What If?"

It takes a snapshot of all the distance reports.
It creates many small "mini-maps" by mixing and matching different groups of Listeners, one map for every possible group of three.
- Map A: Uses Listeners 1, 2, and 3.
- Map B: Uses Listeners 1, 2, and 4.
- Map C: Uses Listeners 2, 3, and 4.
The Magic: If Listener 4 is lying (because of an echo), every map that includes Listener 4 will be slightly "off" in the same wrong direction. The maps that don't include Listener 4 will cluster tightly around the true location.
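The "What If?" game is plain combinatorics. Here is a minimal sketch with eight hypothetical Listeners. It counts how many three-Listener mini-maps exist and how many of them involve Listener 4:

```python
from itertools import combinations

listeners = range(1, 9)  # eight hypothetical Listeners, numbered 1..8
mini_maps = list(combinations(listeners, 3))  # every group of three

with_4 = [m for m in mini_maps if 4 in m]         # maps that trust Listener 4
without_4 = [m for m in mini_maps if 4 not in m]  # maps that leave it out

print(len(mini_maps))   # 56
print(len(with_4))      # 21
print(len(without_4))   # 35
```

In general, a given Listener appears in M/N of all mini-maps (here 3/8). That means most maps leave any single suspect out, and those maps stay clustered around the truth.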

2. The "Displacement Vector" (The NLoS Evidence Vector)

The system looks at these two clusters of maps:

Cluster A (The Liars): Maps that included the suspect Listener.
Cluster B (The Truthers): Maps that left the suspect out.

If the suspect is lying, Cluster A will be spatially shifted away from Cluster B. The system draws an arrow (a vector) between the centers of these two groups.

Big Arrow + Pointing Away from the Listener? = Likely Lie! (The Listener is in NLoS).
Tiny Arrow or Wrong Direction? = Probably Truth. (The Listener is in LoS).

This arrow is called the NLoS Evidence Vector (NEV). It's like a lie detector test that doesn't need a polygraph machine; it just looks at the geometry of the data.

3. Two Ways to Decide: "Hard" vs. "Soft"

The paper offers two ways to use this information:

Hard Decision (The Bouncer):
- The Logic: "Is this Listener a liar? Yes or No?"
- The Action: If the arrow is big enough, the system kicks that Listener out of the group entirely. It ignores their data completely and recalculates the location using only the "honest" Listeners.
- Analogy: Like a bouncer at a club who checks IDs and immediately turns away anyone who looks suspicious.
Soft Decision (The Judge):
- The Logic: "How likely is this Listener a liar?" (e.g., 90% chance of lying, 10% chance of telling the truth).
- The Action: Instead of kicking them out, the system gives them a "trust score." If a Listener is 90% likely to be lying, their data is still used, but it's given very little weight (like a whisper in a crowded room). If they are 90% honest, their data is given a lot of weight (like a shout).
- Analogy: Like a judge who doesn't throw a witness out of court but decides how much of their testimony to believe based on their credibility.

4. Why This Matters for 6G

Current 5G positioning is okay, but 6G needs to be incredibly precise (think centimeter-level accuracy for self-driving cars or robots in factories).

Old Way: You need a massive database of the building's layout or expensive new antennas to figure out where the echoes are.
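This Paper's Way: You mainly need the raw distance numbers from the current moment. The system figures out the "lies" on the fly by comparing the geometry of different Listener combinations. The Hard Decision mode needs nothing else. The Soft Decision mode also uses a little prior statistical knowledge about the site, but it still needs no map.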

The Results

The authors tested this in simulated factories (some with few obstacles, some packed with them).

In easy environments (mostly line-of-sight): It correctly identified NLoS links about 95 to 97% of the time.
In hard environments (dense factories): The "Soft Decision" method was a game-changer. It reduced positioning error by nearly 66% compared to the benchmark methods the authors tested against.

In Summary:
This paper teaches a computer to play "Spot the Liar" by comparing many different map combinations. It doesn't need a pre-made map of the world. It just needs to notice that when a specific sensor is included, the whole picture shifts in a telltale direction. By identifying and ignoring (or down-weighting) those "liars," the system can pinpoint a location far more accurately, even in dense, echo-filled environments.

Here is a detailed technical summary of the paper "Hard/Soft NLoS Detection via Combinatorial Data Augmentation for 6G Positioning."

1. Problem Statement

The paper addresses the critical challenge of Non-Line-of-Sight (NLoS) propagation in 6G positioning systems.

Context: 6G aims for centimeter-level accuracy in complex environments (e.g., smart factories). However, physical obstructions cause NLoS signals to travel longer paths, introducing positive biases into range measurements derived from Round-Trip Time (RTT).
Limitations of Existing Methods:
- Geometry-based methods: Fail when NLoS biases distort the geometric intersection of range circles.
- AI/ML-based methods: Often require extensive labeled datasets, heavy training, and specific hardware (e.g., XL-MIMO, RIS), making them impractical for cost-effective, real-time deployment.
- Site-survey dependency: Many solutions rely on pre-existing environmental maps, which are difficult to maintain in dynamic settings.
Goal: Develop a technique to detect NLoS links and improve positioning accuracy using only real-time range measurements from a single snapshot, without requiring additional hardware or extensive prior knowledge.
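A short example makes the positive bias concrete. RTT ranging converts the measured round-trip time into a one-way distance $d = c \cdot \mathrm{RTT}/2$, so any extra path length from a reflection shows up directly as extra range. The sketch below assumes an ideal RTT with gNB and UE processing delays already removed. The 13 m reflected path is a hypothetical value chosen for illustration.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s


def rtt_to_range(rtt_s: float) -> float:
    """One-way range from an ideal round-trip time: d = c * RTT / 2."""
    return SPEED_OF_LIGHT * rtt_s / 2.0


true_distance = 10.0   # direct gNB-UE distance in metres
reflected_path = 13.0  # hypothetical length of the only received (NLoS) path

rtt_los = 2.0 * true_distance / SPEED_OF_LIGHT    # about 66.7 ns
rtt_nlos = 2.0 * reflected_path / SPEED_OF_LIGHT  # about 86.7 ns

bias = rtt_to_range(rtt_nlos) - true_distance  # always >= 0 for NLoS
print(f"LoS range:  {rtt_to_range(rtt_los):.2f} m")   # LoS range:  10.00 m
print(f"NLoS range: {rtt_to_range(rtt_nlos):.2f} m")  # NLoS range: 13.00 m
print(f"NLoS bias:  {bias:+.2f} m")                   # NLoS bias:  +3.00 m
```

A reflected path can never be shorter than the straight line, so the bias is one-sided. Because it always pushes the range upward, the geometric shift it causes has a predictable direction.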

2. Methodology: CDA-ND

The authors propose Combinatorial Data Augmentation-guided NLoS Detection (CDA-ND). The core idea is to exploit the spatial statistics of "Preliminary Estimated Locations" (PELs) generated by combining different subsets of base stations (gNBs).

A. Core Mechanism: Combinatorial Data Augmentation (CDA)

PEL Generation: Instead of using all $N$ gNBs at once, the algorithm forms all $\binom{N}{M}$ subsets of $M = 3$ gNBs.
Multilateration: For each subset $\ell$, a preliminary location $x^{(\ell)}$ is calculated using standard multilateration.
Spatial Clustering:
- LoS Subsets: PELs form a tight, isotropic cluster near the true user equipment (UE) position.
- NLoS Subsets: If a specific gNB is in NLoS, its positively biased range pushes the PELs constructed with that gNB away from the true position, in the direction pointing away from that gNB. PELs constructed without it remain clustered.
NLoS Evidence Vector (NEV): The algorithm computes the displacement vector $r_n$ between the median of PELs using gNB $n$ and the median of PELs excluding gNB $n$. This vector serves as the primary feature for detection.
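CDA and the NEV can be sketched in a few lines, under these assumptions:

- Positions are 2D.
- Each three-gNB subset is solved exactly by linearizing the circle equations. This is one common form of multilateration and not necessarily the paper's solver.
- "Median" means the coordinate-wise median.

In symbols, with $S_\ell$ the gNB subset behind PEL $x^{(\ell)}$, the NEV computed here is $r_n = \operatorname{med}\{x^{(\ell)} : n \in S_\ell\} - \operatorname{med}\{x^{(\ell)} : n \notin S_\ell\}$. The helper names, gNB layout, and 4 m bias are hypothetical.

```python
import math
import statistics
from itertools import combinations
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def trilaterate(anchors: List[Point], ranges: List[float]) -> Optional[Point]:
    """Exact 2D fix from three anchors (hypothetical helper).

    Subtracting the first circle equation from the other two leaves a 2x2
    linear system, solved with Cramer's rule. Returns None if collinear.
    """
    (x0, y0), (x1, y1), (x2, y2) = anchors
    d0, d1, d2 = ranges
    a11, a12 = 2 * (x1 - x0), 2 * (y1 - y0)
    a21, a22 = 2 * (x2 - x0), 2 * (y2 - y0)
    b1 = d0**2 - d1**2 + x1**2 - x0**2 + y1**2 - y0**2
    b2 = d0**2 - d2**2 + x2**2 - x0**2 + y2**2 - y0**2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-9:
        return None
    return ((b1 * a22 - a12 * b2) / det, (a11 * b2 - b1 * a21) / det)


def generate_pels(gnbs, ranges, m=3):
    """CDA: one PEL per m-subset of gNBs, stored together with its subset."""
    pels = []
    for subset in combinations(range(len(gnbs)), m):
        est = trilaterate([gnbs[i] for i in subset], [ranges[i] for i in subset])
        if est is not None:
            pels.append((subset, est))
    return pels


def coord_median(points):
    """Coordinate-wise median: an assumed choice of 2D 'median'."""
    return (statistics.median(p[0] for p in points),
            statistics.median(p[1] for p in points))


def nlos_evidence_vectors(pels, n_gnbs):
    """r_n = median(PELs using gNB n) - median(PELs not using gNB n)."""
    nevs = []
    for n in range(n_gnbs):
        with_n = coord_median([p for s, p in pels if n in s])
        without_n = coord_median([p for s, p in pels if n not in s])
        nevs.append((with_n[0] - without_n[0], with_n[1] - without_n[1]))
    return nevs


# Hypothetical scene: 8 gNBs on a circle (so no three are collinear), UE inside.
gnbs = [(10 + 15 * math.cos(math.radians(45 * k)),
         10 + 15 * math.sin(math.radians(45 * k))) for k in range(8)]
ue = (13.0, 7.0)
ranges = [math.dist(g, ue) for g in gnbs]  # noise-free LoS ranges
ranges[2] += 4.0                           # gNB 2 is NLoS: +4 m positive bias

pels = generate_pels(gnbs, ranges)
nevs = nlos_evidence_vectors(pels, len(gnbs))
for n, (dx, dy) in enumerate(nevs):
    print(f"gNB {n}: |r_n| = {math.hypot(dx, dy):.3f} m")
```

In this noise-free setup, every LoS gNB gets a numerically zero NEV. On both sides of its split, most PELs come from all-LoS subsets and sit exactly at the UE, so both coordinate-wise medians land on the true position. Only gNB 2, whose PELs are all biased, gets a nonzero vector. With real, noisy ranges, LoS NEVs would be small rather than zero.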

B. Hard Decision (HD) Mode

Objective: Binary classification (LoS vs. NLoS) for each gNB.
Scoring: A score $\rho_n$ is calculated based on:
1. Directional Alignment: The projection of the NEV onto the vector pointing from the gNB to the pseudo-UE location.
2. Magnitude: The length of the NEV, scaled by the square root of the range distance (as NLoS probability increases with distance).
Thresholding: An adaptive threshold is applied to $\rho_n$ to classify gNBs. The threshold balances Recall (detecting all NLoS) and Precision (avoiding false alarms) based on the distribution of scores.
Positioning: gNBs classified as NLoS are excluded. The remaining gNBs undergo Residual-Error (RE) and RTT-Sum (RS) filtering to remove outliers before the final position is calculated as the median of the remaining PELs.
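The Hard Decision score can be sketched as follows. The summary names the ingredients, namely the NEV's projection onto the gNB-to-pseudo-UE direction and a square-root-of-range scaling, but not the exact formula. The combination below is therefore an assumption: $\rho_n = \sqrt{d_n}\,(r_n \cdot u_n)$, where $u_n$ is the unit vector from gNB $n$ to the pseudo-UE. The fixed threshold `tau` is also an assumption, standing in for the paper's adaptive one. The NEVs, ranges, and layout are hypothetical inputs, and RE/RS filtering is not shown.

```python
import math


def hd_scores(nevs, gnbs, pseudo_ue, ranges):
    """Assumed HD score: sqrt(d_n) times the NEV's projection onto u_n.

    u_n is the unit vector from gNB n to the pseudo-UE, so a positive score
    means the NEV points away from the gNB, as an NLoS overestimate would.
    """
    scores = []
    for (rx, ry), (gx, gy), d in zip(nevs, gnbs, ranges):
        ux, uy = pseudo_ue[0] - gx, pseudo_ue[1] - gy
        length = math.hypot(ux, uy)
        projection = (rx * ux + ry * uy) / length  # signed length along u_n
        scores.append(math.sqrt(d) * projection)
    return scores


def hard_decision(scores, tau):
    """Binary HD: gNB n is labelled NLoS when its score exceeds tau."""
    return [n for n, s in enumerate(scores) if s > tau]


gnbs = [(0.0, 0.0), (20.0, 0.0), (20.0, 20.0), (0.0, 20.0)]
pseudo_ue = (10.0, 10.0)
ranges = [14.2, 14.1, 17.5, 14.3]
nevs = [(0.05, -0.02),  # small: typical LoS
        (-0.03, 0.04),  # small: typical LoS
        (-1.2, -1.5),   # large and pointing away from gNB 2
        (-0.8, 0.8)]    # large but pointing back toward gNB 3

scores = hd_scores(nevs, gnbs, pseudo_ue, ranges)
nlos = hard_decision(scores, tau=1.0)
print([round(s, 2) for s in scores])
print("NLoS gNBs:", nlos)  # NLoS gNBs: [2]
```

gNB 3 has the second-largest NEV, but it points back toward the gNB, which an overestimated range should not produce. The directional term therefore makes its score negative. This is the "wrong direction" case from the Listener analogy.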

C. Soft Decision (SD) Mode

Objective: Probabilistic quantification of NLoS confidence (posterior probability).
Weak Priors: Utilizes minimal site-survey data (empirical score distribution and average NLoS probability) to train a mapping function.
Gaussian Mixture Model (GMM): Fits the score distribution to a mixture of LoS and NLoS components to derive a sigmoid-like mapping function $h(\rho)$ that converts scores to probabilities.
Iterative Refinement:
1. Initial HD scores are used to estimate PEL reliability.
2. PELs are re-weighted based on the reliability of their constituent gNBs.
3. Representative points (medians) are recalculated using weighted medians.
4. Scores are refined recursively until convergence.
Positioning: The final position is a weighted median of the filtered PELs, where weights are determined by the product of the LoS probabilities of the gNBs used to generate each PEL.
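The Soft Decision positioning step can be sketched in the same spirit, under these assumptions:

- The GMM has already been fitted from the weak priors, with hypothetical parameters.
- Its two components share one variance. In that case the posterior $h(\rho)$ is exactly a logistic (sigmoid) function of $\rho$.
- The weighted median is taken coordinate-wise.
- The iterative refinement loop is omitted.

As the summary describes, each PEL's weight is the product of the LoS probabilities $1 - h(\rho_n)$ of its gNBs. The scores and PELs are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def nlos_posterior(rho, prior_nlos, mu_los, mu_nlos, sigma):
    """h(rho) = P(NLoS | rho) for an equal-variance two-component 1D GMM."""
    log_odds = (math.log(prior_nlos / (1.0 - prior_nlos))
                + ((rho - mu_los) ** 2 - (rho - mu_nlos) ** 2) / (2.0 * sigma ** 2))
    return sigmoid(log_odds)


def weighted_median(values, weights):
    """Smallest value at which the cumulative weight reaches half the total."""
    half = sum(weights) / 2.0
    acc = 0.0
    for v, w in sorted(zip(values, weights)):
        acc += w
        if acc >= half:
            return v
    raise ValueError("weights must be non-negative with a positive sum")


def sd_position(pels, p_nlos):
    """Weighted coordinate-wise median; PEL weight = product of its gNBs' P(LoS)."""
    weights = [math.prod(1.0 - p_nlos[n] for n in subset) for subset, _ in pels]
    xs = [x for _, (x, _y) in pels]
    ys = [y for _, (_x, y) in pels]
    return weighted_median(xs, weights), weighted_median(ys, weights)


scores = [0.08, 0.19, 7.99, -4.28]  # hypothetical HD scores rho_n
p_nlos = [nlos_posterior(r, prior_nlos=0.3, mu_los=0.0, mu_nlos=6.0, sigma=2.0)
          for r in scores]

pels = [((0, 1, 2), (8.0, 8.5)),   # PELs that use gNB 2 are biased
        ((0, 1, 3), (10.1, 9.9)),  # the only all-LoS subset
        ((0, 2, 3), (7.5, 8.0)),
        ((1, 2, 3), (8.5, 7.8))]

print([round(p, 3) for p in p_nlos])
print("SD estimate:", sd_position(pels, p_nlos))  # SD estimate: (10.1, 9.9)
```

For comparison, an unweighted coordinate-wise median of the same four PELs gives (8.25, 8.25), which is dominated by the three biased PELs. The soft weights instead let the single all-LoS PEL decide.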

3. Key Contributions

CDA-Induced Discriminative Statistics: Demonstrated that the spatial distribution of PELs inherently encodes site-dependent propagation characteristics, allowing NLoS detection without channel state information (CSI) or hardware modifications.
NLoS Evidence Vector (NEV): Introduced a novel geometric feature that captures both the magnitude and direction of NLoS bias, enabling a tractable Hard Decision detector.
Soft Decision Framework: Developed a probabilistic approach that leverages weak site-survey priors to refine scores and re-weight PELs, significantly improving robustness in dense NLoS environments.
Integrated Positioning Algorithms: Designed specific positioning pipelines (CDA-ND-RERS) that combine NLoS detection with residual-error and RTT-sum filtering, tailored for both HD and SD modes.

4. Experimental Results

The method was validated using the 3GPP-compliant 3D indoor factory dataset (InF-SH and InF-DH scenarios) across Frequency Range 1 (FR1) and FR2.

NLoS Detection Accuracy:
- InF-SH (LoS-dominant, 18% NLoS): Achieved 96.6% accuracy (SD) and 94.7% (HD) in FR1.
- InF-DH (NLoS-dominant, 56% NLoS): Achieved 91.1% accuracy (SD) and 78.0% (HD) in FR1.
- Key Insight: The SD mode significantly outperforms HD in dense NLoS environments by reducing missed NLoS detections from ~26% (HD) to ~2% (SD).
Positioning Performance (Mean Absolute Error - MAE):
- InF-SH (FR1): Reduced MAE from 8.07 m (Standard LS) to 0.48 m (CDA-ND-RERS SD).
- InF-DH (FR1): Reduced MAE from 23.06 m (Standard LS) to 1.35 m (CDA-ND-RERS SD).
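- Improvement: The proposed method achieved 20.04% and 65.99% reductions in MAE for LoS- and NLoS-dominant environments, respectively, compared to the benchmark approaches. The reference here differs from the standard LS values above, against which the reductions are much larger.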
Robustness: The SD approach showed superior performance in FR2 (28 GHz) and dense obstacle scenarios where noise and bias are more severe.
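The gain over standard LS follows directly from the FR1 MAE values reported above. Here is a plain arithmetic check that uses only those numbers:

```python
def reduction_pct(baseline: float, proposed: float) -> float:
    """Relative reduction of an error metric, in percent."""
    return 100.0 * (baseline - proposed) / baseline


print(f"InF-SH FR1 vs LS: {reduction_pct(8.07, 0.48):.1f}%")   # InF-SH FR1 vs LS: 94.1%
print(f"InF-DH FR1 vs LS: {reduction_pct(23.06, 1.35):.1f}%")  # InF-DH FR1 vs LS: 94.1%
```

Both reductions are about 94%, far larger than the 20.04% and 65.99% headline figures. Those percentages must therefore be measured against a reference other than standard LS.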

5. Significance

Practicality: The solution operates on standard range measurements (RTT) that are already available in 5G networks and expected in 6G, requiring no new hardware (like RIS or movable antennas) or massive labeled datasets.
Scalability: It is suitable for dynamic environments (e.g., smart factories) where site surveys are difficult to maintain, as it relies on "weak priors" or can function in Hard Decision mode without any prior data.
Performance: It bridges the gap between geometry-based and AI-based methods, offering high accuracy with low computational overhead and interpretability.
6G Enabler: By effectively mitigating NLoS errors, this technique is a key enabler for meeting the stringent positioning accuracy requirements of 6G applications, such as autonomous robotics and industrial automation.