Multi-Domain Supervised Contrastive Learning for UAV Radio-Frequency Open-Set Recognition

Imagine a busy airport, but instead of planes, the sky is filled with hundreds of tiny, buzzing drones. Some are delivering pizza, some are filming movies, and some are just flying around for fun. But then, there are the "bad actors"—drones that are spying on people, stealing data, or flying where they shouldn't.

The problem? The airport security (the government) has a list of "good" drones it knows about. But it doesn't know what the "bad" ones look like, because bad guys often modify their drones to hide their identity. Traditional security cameras (vision) fail in the dark or fog, and microphones (sound) get drowned out by traffic noise.

This paper proposes a new kind of security guard that listens to the radio whispers of the drones. Here is how it works, explained simply:

1. The "Radio Fingerprint" (The Core Idea)

Every drone model has a characteristic radio signal, like a fingerprint. Even when two drone models look alike, their radio signals differ in small, invisible ways in how they transmit data.

The Challenge: These signals are messy. They jump around (frequency hopping) and change constantly (non-stationary), making them hard to read.
The Solution: The authors built a system called Open-RFNet. Think of it as a super-smart detective that doesn't just look at the signal; it looks at the signal's texture (the pattern of the waves) and its position (where the waves sit in time and frequency).

2. The Two-Brain System (Multi-Domain Learning)

To understand these messy signals, the system uses two different "brains" working together:

Brain A (The Texture Expert): Uses a ResNet (a type of AI good at spotting patterns in images) to look at the "shape" of the signal. It's like looking at the grain of a piece of wood to tell if it's oak or pine. Because it focuses on shape rather than exact position, it copes well with signals that hop around and change over time.
Brain B (The Position Expert): Uses a Transformer (the same technology behind modern chatbots) to understand the timing and location of the signal patterns. It's like noticing that a specific drumbeat always happens exactly 3 seconds after a cymbal crash.

The Magic: The system fuses these two brains together. It's like having a detective who can both read the handwriting and analyze the ink's chemical composition. This makes it much harder to fool than either brain working alone.

3. The "Training Camp" (Supervised Contrastive Learning)

Usually, AI learns by trying to guess the right answer and getting a "thumbs up" or "thumbs down."

The Old Way (Cross-Entropy): The AI is graded only on whether its final answer is right.
The New Way (Supervised Contrastive Learning): The AI is taught to group similar things together and push different things apart.
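- Analogy: Imagine a dance floor. The AI is the DJ. It makes all the "DJI Phantom" drones dance in one tight circle and all the "DJI Mavic" drones dance in a different circle, and it keeps the circles far apart. The "bad guy" drones never come to rehearsal, but because every known circle is tight and well separated, a stranger who shows up later stands out in the empty space between them. This creates a very clear map of who belongs where.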

4. The "Imposter Detector" (Open-Set Recognition)

This is the most important part. Most AI systems are "closed-minded." If you show them a new type of drone they've never seen, they will guess it's one of the known types and get it wrong.

The Goal: The system needs to say, "I don't know what this is, but it's definitely not on my list of good drones."
The Trick (IG-OpenMax):
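1. The system first learns to recognize all the known ("good") drones.
2. Then it uses a "fake generator" (a GAN) to create fake versions of the known drones, and keeps only the fakes that the system itself misidentifies as stand-ins for unknown drones. It's like a forger creating fake IDs to test the security guard: the IDs that slip past the guard show where the guard's blind spots are.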
3. The Secret Sauce: Instead of retraining the whole system (which would mess up the memory of the good drones), the authors froze the "brain" (the feature extractor) and only retrained the "decision maker" (the classification layer).
4. Analogy: Imagine a librarian who knows every book in the library. Instead of rebuilding the whole library to learn about new books, they just put a new sign on the door that says, "If you don't fit on these shelves, you go in the 'Unknown' bin." This keeps the library organized while allowing new, unknown items to be caught.
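Steps 2 and 3 can be made concrete with a toy NumPy sketch. It is not the authors' implementation, and every component is a hypothetical stand-in: a fixed random projection plays the frozen feature extractor, noisy samples drawn around each known class play the GAN output, the classification head is a plain softmax layer, and misclassified synthetic samples are assumed to be relabeled as an extra class `K` ("unknown"). What it shows is the division of labor: stage 2 updates only the head weights, while `W_frozen` is never touched.

```python
import numpy as np

rng = np.random.default_rng(1)
K, D_IN, D_FEAT = 3, 20, 8  # known classes, raw input size, feature size (toy values)

# Hypothetical stand-in for the trained, frozen feature extractor.
W_frozen = rng.standard_normal((D_IN, D_FEAT)) / np.sqrt(D_IN)


def extract(x):
    """Frozen feature extractor: fixed projection + ReLU, never updated below."""
    return np.maximum(x @ W_frozen, 0.0)


def train_head(feats, labels, n_classes, epochs=500, lr=0.1):
    """Fit only a softmax classification head (W, b) by full-batch gradient descent."""
    W = np.zeros((feats.shape[1], n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[labels]
    for _ in range(epochs):
        logits = feats @ W + b
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        grad = (p - onehot) / len(labels)  # d(mean cross-entropy)/d(logits)
        W -= lr * feats.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b


def predict(feats, W, b):
    return np.argmax(feats @ W + b, axis=1)


# Known-class training data: one Gaussian cluster per class.
centers = 2.0 * rng.standard_normal((K, D_IN))
y = np.repeat(np.arange(K), 50)
x = centers[y] + rng.standard_normal((len(y), D_IN))
f = extract(x)

# Stage 1: closed-set head on the frozen features.
W1, b1 = train_head(f, y, K)

# Stage 2a: stand-in for cDCGAN output, noisy samples "conditioned" on known labels.
y_syn = rng.integers(0, K, 200)
x_syn = centers[y_syn] + 3.0 * rng.standard_normal((200, D_IN))
f_syn = extract(x_syn)
wrong = predict(f_syn, W1, b1) != y_syn  # misclassified -> proxy unknowns

# Stage 2b: retrain ONLY the head with K + 1 classes; extract() is unchanged.
f2 = np.vstack([f, f_syn[wrong]])
y2 = np.concatenate([y, np.full(wrong.sum(), K)])
W2, b2 = train_head(f2, y2, K + 1)
print(f"{wrong.sum()} proxy unknowns, retrained head shape {W2.shape}")
```

Because `extract()` is unchanged, the synthetic samples keep the feature positions they had when they were judged "misclassified," which is the property the freeze is meant to protect.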

5. The Results

The team tested this on a large, real-world dataset (DroneRFa) covering 25 different drone types.

Closed-Set (Known Drones): It got 95.12% right.
Open-Set (Unknown/Intruder Drones): It got 96.08% right.
The Balance: Many earlier methods get better at catching unknown drones only by getting worse at recognizing known ones. This system does well at both, and the two scores differ by less than one percentage point.

Why This Matters

In the future, our skies will be crowded. This technology acts as an invisible shield. It can flag a drone whose radio signature matches none of the known types, even if the drone has been modified or is flying in the dark. It doesn't need to see the drone; it just needs to hear its radio "voice" to tell whether that voice belongs to a known drone or to a stranger.

In short: They built a radio detective that learns to group friends together, push strangers apart, and has a special trick to instantly spot anyone who doesn't belong, all without needing to see a single pixel of the drone.

Here is a detailed technical summary of the paper "Multi-Domain Supervised Contrastive Learning for UAV Radio-Frequency Open-Set Recognition."

1. Problem Statement

The rapid proliferation of Unmanned Aerial Vehicles (UAVs) in Low-Altitude Integrated Sensing and Communication (LA-ISAC) networks has created significant security challenges, particularly regarding unauthorized flights and non-cooperative UAVs.

Limitations of Existing Methods: Traditional RF-based recognition often relies on closed-set assumptions, where the model is trained only on known UAV types. In real-world scenarios, malicious or modified UAVs often appear that were not present in the training dataset. Closed-set models tend to misclassify these "unknown" UAVs as known classes.
Technical Challenges:
- Signal Characteristics: UAV RF signals are non-stationary and often employ frequency hopping, making feature extraction difficult for standard Convolutional Neural Networks (CNNs).
- Performance Trade-off: Existing open-set recognition methods often sacrifice the accuracy of known classes (closed-set performance) to improve the detection of unknown classes, leading to an unbalanced system.
- Feature Imbalance: Multi-domain features (e.g., texture vs. time-frequency position) often suffer from unbalanced optimization when using standard loss functions like Cross-Entropy.

2. Methodology

The authors propose a framework called Open-RFNet, built upon a Multi-Domain Supervised Contrastive Learning (MD-SupContrast) architecture and an Improved Generative OpenMax (IG-OpenMax) algorithm.

A. Data Preprocessing

Signal Modeling: The UAV-to-base station link is modeled as an Air-to-Ground Line-of-Sight (LoS) channel with path loss and wind-induced wobbling (modeled as Gaussian noise).
Denoising & Slicing: Raw I/Q signals are sliced, and sub-slices with low signal strength (pure noise) are filtered out.
Time-Frequency Transformation: The Short-Time Fourier Transform (STFT) converts the denoised I/Q signals into 2D time-frequency spectrograms. The spectrograms are then normalized so that differences in received signal power do not dominate what the model learns.
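The slicing, noise filtering, and STFT steps can be sketched in NumPy as follows. This is an illustrative sketch rather than the authors' pipeline: the slice length, the noise rule (keep a slice only if its mean power exceeds twice the median slice power), the Hann window, FFT size, hop length, and min-max normalization are all assumptions, and the function names are hypothetical.

```python
import numpy as np


def slice_and_filter(iq, slice_len=1024, energy_factor=2.0):
    """Cut a complex I/Q stream into fixed-length slices and drop noise-only ones.

    Hypothetical rule: keep a slice only if its mean power exceeds
    energy_factor times the median slice power (used as a noise-floor estimate).
    """
    n_slices = len(iq) // slice_len
    slices = iq[: n_slices * slice_len].reshape(n_slices, slice_len)
    power = np.mean(np.abs(slices) ** 2, axis=1)
    return slices[power > energy_factor * np.median(power)]


def stft_magnitude(x, n_fft=128, hop=64):
    """Magnitude STFT with a Hann window; returns shape (n_fft, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    # fftshift orders frequencies from negative to positive (complex baseband)
    spec = np.fft.fftshift(np.fft.fft(frames, axis=1), axes=1)
    return np.abs(spec).T


def normalize(spec):
    """Min-max scale to [0, 1] so absolute received power does not dominate."""
    return (spec - spec.min()) / (spec.max() - spec.min() + 1e-12)


# Usage: 8 slices of complex noise, two of which also carry a tone burst.
rng = np.random.default_rng(0)
iq = 0.1 * (rng.standard_normal(8192) + 1j * rng.standard_normal(8192))
iq[2048:4096] += np.exp(2j * np.pi * 0.2 * np.arange(2048))
kept = slice_and_filter(iq)
spectrograms = [normalize(stft_magnitude(s)) for s in kept]
print(len(kept), spectrograms[0].shape)  # 2 (128, 15)
```

A median-based threshold is used here only because it adapts to the noise floor without a calibration step; any energy detector would illustrate the same idea.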

B. Feature Extraction (MD-SupContrast)

The model fuses two distinct types of features to handle signal complexity:

Texture Features (Local): Extracted using ResNet-18. These capture geometric features of the spectrogram (e.g., the aspect ratios of the rectangular signal bursts, edge sharpness, and the amplitude distribution inside each burst). Because these cues describe the shape of each burst rather than its exact location, they help mitigate the impact of non-stationary signals and frequency hopping.
Time-Frequency Position Features (Global): Extracted using a Transformer Encoder (TE).
- The model uses Multi-Nonlinear Layers (MNLs) and Position Encoding to distill time-domain and frequency-domain positional information.
- The self-attention mechanism in the TE captures long-range dependencies and global profile features of the UAV types.
Feature Fusion: The texture features ($\tilde{z}_a$), time-domain position features ($\tilde{z}_b$), and frequency-domain position features ($\tilde{z}_c$) are concatenated and processed through MNLs to form a fused representation ($\tilde{z}$).
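The multi-domain path can be sketched in NumPy to show how the three feature vectors meet. This is a heavily simplified illustration under stated assumptions, not the Open-RFNet architecture. The time-domain branch treats each spectrogram column (time frame) as a token, and the frequency-domain branch treats each row (frequency bin) as a token. Position encoding uses the standard sinusoidal form from the original Transformer paper, and the Transformer Encoder is reduced to a single untrained self-attention head with mean pooling. A random 512-dimensional vector stands in for the ResNet-18 embedding, and each MNL is modeled as Linear, ReLU, Linear. All dimensions and helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def sinusoidal_pe(n_pos, d):
    """Standard sinusoidal position encoding, shape (n_pos, d); d must be even."""
    pos = np.arange(n_pos)[:, None]
    i = np.arange(0, d, 2)[None, :]
    angle = pos / (10000.0 ** (i / d))
    pe = np.zeros((n_pos, d))
    pe[:, 0::2] = np.sin(angle)
    pe[:, 1::2] = np.cos(angle)
    return pe


def dense(d_in, d_out):
    """Random, untrained weights standing in for a learned linear layer."""
    return rng.standard_normal((d_in, d_out)) / np.sqrt(d_in)


def mnl(x, w1, w2):
    """Hypothetical multi-nonlinear layer: Linear -> ReLU -> Linear."""
    return np.maximum(x @ w1, 0.0) @ w2


def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product self-attention over a token sequence."""
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.T / np.sqrt(q.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v


def position_branch(tokens, d=32):
    """Embed tokens, add position encoding, self-attend, mean-pool to one vector."""
    x = mnl(tokens, dense(tokens.shape[1], d), dense(d, d)) + sinusoidal_pe(len(tokens), d)
    return self_attention(x, dense(d, d), dense(d, d), dense(d, d)).mean(axis=0)


spec = rng.random((128, 15))             # normalized spectrogram: (freq bins, time frames)
z_a = rng.standard_normal(512)           # stand-in for the 512-d ResNet-18 embedding
z_b = position_branch(spec.T)            # time-domain tokens: one per time frame
z_c = position_branch(spec)              # frequency-domain tokens: one per frequency bin
fused = np.concatenate([z_a, z_b, z_c])  # 512 + 32 + 32 = 576 values
z = mnl(fused, dense(576, 256), dense(256, 128))
print(z_b.shape, z_c.shape, z.shape)     # (32,) (32,) (128,)
```

Splitting the spectrogram into time tokens and frequency tokens is one plausible reading of "time-domain and frequency-domain position features"; the paper may tokenize differently.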

C. Supervised Contrastive Learning

Instead of standard Cross-Entropy loss, the authors employ Supervised Contrastive Learning (SupCon).

Mechanism: It pulls samples of the same class closer together and pushes different classes apart in the feature space.
Benefit: This optimizes the feature representation at the feature level rather than just the output probability, preventing the model from over-optimizing one dominant feature (e.g., texture) while ignoring others (e.g., position). It enhances robustness against noisy labels and similar signal distributions.
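The loss itself is compact. Below is a NumPy sketch of the supervised contrastive loss as defined by Khosla et al. (2020, the L_out variant), averaged over anchors. The temperature of 0.1 is an assumption, since the value used for Open-RFNet is not given here. In real training this would be computed on the model's embeddings inside an autodiff framework rather than in NumPy.

```python
import numpy as np


def supcon_loss(features, labels, temperature=0.1):
    """Supervised contrastive loss (L_out form of Khosla et al., 2020), mean over anchors.

    features: (N, D) embeddings; labels: (N,) integer class ids.
    Anchors with no same-class partner in the batch are skipped.
    """
    z = features / np.linalg.norm(features, axis=1, keepdims=True)  # unit sphere
    logits = z @ z.T / temperature
    logits -= logits.max(axis=1, keepdims=True)  # stability; cancels in log_prob
    not_self = ~np.eye(len(labels), dtype=bool)
    # Log-softmax over every other sample in the batch (the anchor is excluded).
    denom = (np.exp(logits) * not_self).sum(axis=1, keepdims=True)
    log_prob = logits - np.log(denom)
    positives = (labels[:, None] == labels[None, :]) & not_self
    n_pos = positives.sum(axis=1)
    has_pos = n_pos > 0
    per_anchor = (log_prob * positives).sum(axis=1)[has_pos] / n_pos[has_pos]
    return -per_anchor.mean()


labels = np.array([0, 0, 1, 1])
clustered = np.array([[1.0, 0.05], [1.0, -0.05], [-1.0, 0.05], [-1.0, -0.05]])
mixed = np.array([[1.0, 0.05], [-1.0, 0.05], [1.0, -0.05], [-1.0, -0.05]])
print(supcon_loss(clustered, labels) < supcon_loss(mixed, labels))  # True
```

Tight, well-separated class clusters give a lower loss than the same points with labels scattered across clusters. Because the loss acts on the embedding geometry rather than on output probabilities, no single input branch can lower it without the fused representation also becoming well clustered.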

D. Open-Set Recognition: IG-OpenMax

To detect unknown UAVs without retraining the entire network (which would shift the feature space), the authors propose a two-stage IG-OpenMax algorithm:

Generative Simulation: A conditional Deep Convolutional GAN (cDCGAN), trained with the WGAN-GP objective (Wasserstein loss with gradient penalty), generates synthetic samples for known classes.
Misclassification as Unknown: These synthetic samples are fed into the trained closed-set model (Open-RFNet-C). Samples that are misclassified by the model are treated as proxies for "unknown" real-world data.
Freeze-and-Retrain Strategy:
- The feature extraction layers are frozen.
- Only the classification head is retrained using the original training data plus the "misclassified" synthetic samples.
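- Rationale: The synthetic samples were selected because they are misclassified under the current features. Freezing the extractor keeps that feature space, and therefore the selection, valid: the retained samples still lie in the boundary regions where real unknowns are expected. Retraining the whole network would move both the class clusters and the synthetic samples, so the selected samples might no longer approximate those regions.
Weibull Calibration: An OpenMax layer fits, for each known class, a Weibull distribution to the largest distances between that class's training activation vectors and its mean activation vector. Using this extreme-value model, it revises a test sample's prediction scores and assigns part of the score mass to an explicit "unknown" class, estimating the probability that the sample belongs to none of the known classes.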
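The OpenMax recalibration step can be sketched in NumPy. This follows the standard OpenMax recipe (Bendale and Boult, 2016) rather than anything specific to IG-OpenMax, with simplifying assumptions: Euclidean distance to each class's mean activation vector (MAV), a plain two-parameter Weibull maximum-likelihood fit in place of the libMR tail fit used by the original OpenMax code, a tail size of 20, and a revision rank `alpha` of 2. Function names and parameter values are hypothetical.

```python
import numpy as np


def fit_weibull(d):
    """Two-parameter Weibull MLE (shape k, scale lam) by bisection on the shape equation."""
    x = d / d.max()                  # the shape estimate is scale-invariant
    logx = np.log(x)

    def g(k):                        # strictly increasing in k; its root is the MLE
        w = x ** k
        return (w * logx).sum() / w.sum() - 1.0 / k - logx.mean()

    lo, hi = 1e-3, 1e3
    for _ in range(100):
        mid = np.sqrt(lo * hi)       # bisect in log-space
        lo, hi = (mid, hi) if g(mid) < 0 else (lo, mid)
    k = np.sqrt(lo * hi)
    lam = d.max() * np.mean(x ** k) ** (1.0 / k)
    return k, lam


def weibull_cdf(t, k, lam):
    return 1.0 - np.exp(-((t / lam) ** k))


def fit_openmax(av, labels, tail_size=20):
    """Per class: mean activation vector (MAV) plus a Weibull fit to the largest distances.

    av, labels should come from correctly classified training samples only.
    """
    models = {}
    for c in np.unique(labels):
        v = av[labels == c]
        mav = v.mean(axis=0)
        tail = np.sort(np.linalg.norm(v - mav, axis=1))[-tail_size:]
        models[int(c)] = (mav, *fit_weibull(tail))
    return models


def openmax_probs(v, models, alpha=2):
    """Recalibrate one activation vector; output index 0 is 'unknown', index c + 1 is class c."""
    scores = v.astype(float)
    unknown = 0.0
    for rank, c in enumerate(np.argsort(v)[::-1][:alpha]):
        mav, k, lam = models[int(c)]
        w = (alpha - rank) / alpha * weibull_cdf(np.linalg.norm(v - mav), k, lam)
        unknown += v[c] * w          # score mass moved to the unknown class
        scores[c] = v[c] * (1.0 - w)
    logits = np.concatenate([[unknown], scores])
    e = np.exp(logits - logits.max())
    return e / e.sum()


rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 100)
av = 10.0 * np.eye(3)[labels] + rng.standard_normal((300, 3))  # toy activation vectors
models = fit_openmax(av, labels)
near = openmax_probs(np.array([10.0, 0.0, 0.0]), models)  # close to class 0's MAV
far = openmax_probs(np.array([4.0, 4.0, 4.0]), models)    # far from every MAV
print(near.argmax(), far.argmax())  # 1 0  (index 1 = class 0, index 0 = unknown)
```

A sample close to a known class's MAV keeps its score, while one that sits far from every MAV has most of its top scores shifted into the unknown slot.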

3. Key Contributions

MD-SupContrast Framework: A novel architecture that fuses ResNet-based texture features and Transformer-based time-frequency position features, optimized via supervised contrastive learning to handle non-stationary RF signals and balance multi-domain feature learning.
IG-OpenMax Algorithm: An improved generative open-set recognition method that freezes the feature extractor during the second training stage. This prevents feature space drift and allows the model to learn decision boundaries for unknown classes without degrading closed-set accuracy.
Comprehensive Evaluation: The use of a large-scale, real-world UAV RF dataset (DroneRFa) covering 25 UAV types across multiple frequency bands (915 MHz, 2.4 GHz, 5.8 GHz).

4. Experimental Results

The proposed Open-RFNet was evaluated against state-of-the-art benchmarks (OpenMax, G-OpenMax, S3R, UIOS) on the DroneRFa dataset.

Performance Metrics:
- Closed-Set Accuracy (KAR): 95.12% (25 UAV types).
- Open-Set Accuracy (UAR): 96.08% (for unknown classes).
- Performance Gap (GAP): Only 0.96 percentage points between closed-set and open-set performance (95.12% vs. 96.08%), demonstrating excellent balance.
Comparisons:
- Outperformed S3R (which had high UAR but low KAR of 90.86%) and UIOS.
- Outperformed G-OpenMax by improving UAR by about 2.96 percentage points while maintaining KAR.
Ablation Studies: Confirmed that using both the Transformer module and Supervised Contrastive Learning is necessary; using either alone resulted in performance degradation compared to the full model.
Denoising: The proposed denoising preprocessing improved UAR from ~66% (noisy) to ~96% (denoised).
Efficiency: The system operates with an end-to-end latency of approximately 54.21 ms (including inference), meeting real-time requirements for UAV surveillance.
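The three headline metrics reduce to simple counting. The definitions below are assumptions inferred from the descriptions in this summary: KAR is taken as accuracy on known-class test samples, UAR as the share of unknown-class samples rejected as unknown, and GAP as their absolute difference. The `UNKNOWN = -1` label convention is hypothetical.

```python
import numpy as np

UNKNOWN = -1  # hypothetical label for "not one of the known UAV types"


def open_set_metrics(y_true, y_pred):
    """Return (KAR, UAR, GAP): two percentages and their gap in percentage points."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    known = y_true != UNKNOWN
    kar = 100.0 * np.mean(y_pred[known] == y_true[known])  # known classes named correctly
    uar = 100.0 * np.mean(y_pred[~known] == UNKNOWN)       # unknowns rejected as unknown
    return float(kar), float(uar), float(abs(kar - uar))


y_true = [0, 0, 1, 2, UNKNOWN, UNKNOWN, UNKNOWN, UNKNOWN]
y_pred = [0, 0, 1, 2, UNKNOWN, UNKNOWN, UNKNOWN, 0]
print(open_set_metrics(y_true, y_pred))  # (100.0, 75.0, 25.0)

# The reported figures: |95.12 - 96.08| = 0.96 percentage points.
print(round(abs(95.12 - 96.08), 2))  # 0.96
```

Under these definitions, a known UAV wrongly rejected as unknown lowers KAR, and an unknown UAV accepted as a known type lowers UAR, which is why the two numbers can pull in opposite directions.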

5. Significance

This work addresses a critical gap in UAV security by enabling reliable recognition of non-cooperative and unknown UAVs without sacrificing the accuracy of known threats.

Practical Impact: The method is suitable for 5G-Advanced (5G-A) LA-ISAC networks, providing a passive, robust, and real-time solution for anti-UAV defense.
Theoretical Advancement: It demonstrates that freezing feature extractors during open-set adaptation preserves the discriminative power of the learned features, a finding that could apply to other open-set recognition tasks in signal processing.
Robustness: By combining local texture analysis with global positional attention and contrastive learning, the system effectively handles the inherent variability and noise of real-world RF signals.