ACCOR: Attention-Enhanced Complex-Valued Contrastive Learning for Occluded Object Classification Using mmWave Radar IQ Signals

Imagine you are in a warehouse, and a robot needs to sort packages on a conveyor belt. The problem? The packages are sealed in cardboard boxes. A regular camera is useless here because it can't see through the cardboard. A human inspector would have to stop the line and open every box, which is slow and expensive.

This is where mmWave radar comes in. Think of it as a flashlight that shines invisible radio waves instead of light. Unlike light, these waves can pass right through cardboard, plastic, and fabric, bouncing off the object inside to create a "ghost image" of what's hidden.

However, there's a catch: The raw data these radars produce isn't a nice, clear picture like a photo. It's a chaotic, mathematical soup of numbers (called IQ signals) that represent both the strength and the timing of the waves. It's like trying to identify a person just by listening to the echo of their voice in a cave, without seeing them.

The paper introduces a new AI system called ACCOR that acts as a "super-listener" to solve this problem. Here is how it works, broken down into simple concepts:

1. The "Complex" Ear (Complex-Valued CNN)

Many radar AI models are like people who only listen to how loud a sound is (amplitude) but ignore its timing (phase). If you try to teach a model to recognize a hidden object from loudness alone, you lose half the story.

The authors realized that radar signals are naturally "complex" (they have two parts: Real and Imaginary, like coordinates on a map).

The Analogy: Imagine trying to identify a song by only listening to the loudness of the drums, ignoring the melody. You'd never know if it's a rock song or a jazz song.
The Solution: ACCOR uses a special type of AI brain (a Complex-Valued CNN) that listens to both the volume and the timing simultaneously. It doesn't chop the signal in half; it keeps the full, rich "song" intact, allowing it to hear the subtle differences between a hammer and a water bottle inside a box.
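
To make the "volume versus timing" point concrete, here is a minimal Python sketch. The two echo values are made up for illustration and do not come from the paper. It shows that two echoes can have exactly the same strength and still be different signals once the phase is taken into account.

```python
import numpy as np

# Two hypothetical radar echoes: same strength (2.0), different timing (phase).
echo_a = 2.0 * np.exp(1j * np.deg2rad(30))
echo_b = 2.0 * np.exp(1j * np.deg2rad(120))

# Amplitude-only view: the two echoes look identical.
same_amplitude = bool(np.isclose(abs(echo_a), abs(echo_b)))

# Complex view: the real (I) and imaginary (Q) parts differ.
same_complex = bool(np.isclose(echo_a, echo_b))

print(same_amplitude, same_complex)  # True False
```

A model that only sees `abs(echo)` cannot tell these two apart, which is the information a complex-valued network is meant to keep.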

2. The "Focus" Mechanism (Attention)

Even with a good ear, the radar signal is noisy. It's like trying to hear a specific conversation in a crowded, noisy party. The AI might get distracted by the echo of the box itself or the background noise.

The Analogy: Imagine wearing noise-canceling headphones that can magically isolate just the voice of the person you are talking to, ignoring everyone else.
The Solution: ACCOR uses an Attention Layer. This is like a spotlight that tells the AI, "Ignore the background noise; focus only on the specific part of the signal that tells us what the object is." It helps the model zero in on the most important clues.

3. The "Strict Coach" (Hybrid Loss Function)

Training an AI is like teaching a student. Usually, you just tell them, "Right or Wrong?" (Cross-Entropy). But with radar, different objects (like a plastic cup and a metal cup) might look very similar to the AI, making it easy to get confused.

The Analogy: Imagine a teacher who not only grades the student's test but also forces them to group similar items together and push different items apart in their mind.
The Solution: The authors created a Hybrid Loss. It's a two-part grading system:
1. The Test: Did you get the label right?
2. The Grouping: Did you learn to keep "hammers" far away from "screwdrivers" in your mental map?
  This "Strict Coach" forces the AI to create very distinct mental categories, so it never mixes up a ball with a tape roll.

4. The "Double-Check" (Two Frequencies)

The researchers didn't just test their system once. They tested it with two slightly different radio frequencies (64 GHz and 67 GHz).

The Analogy: It's like checking a suspect's ID with two different flashlights. If the ID looks clear under both lights, you can be sure it's real.
The Result: They found that while the two frequencies are very close (like two shades of blue), the system works incredibly well on both. It proved that their method is robust and doesn't rely on a lucky guess with just one specific setting.

The Bottom Line

The result? ACCOR is a master detective.

It correctly identified hidden objects 96.6% of the time at 64 GHz and 93.6% of the time at 67 GHz.
It beat the earlier radar models it was compared against, and even models originally designed for regular photos (which lose accuracy when they are fed radar data).

Why does this matter?
This technology could change how warehouses, factories, and even security checkpoints work. Imagine robots that can sort packages without opening them, or scanners that can find hidden tools or weapons inside sealed packaging, all without needing expensive, bulky equipment. It turns a "blurry echo" into a clear, confident answer.

Here is a detailed technical summary of the paper "ACCOR: Attention-Enhanced Complex-Valued Contrastive Learning for Occluded Object Classification Using mmWave Radar IQ Signals."

1. Problem Statement

The paper addresses the challenge of occluded object classification in industrial and robotic settings. Specifically, it focuses on identifying objects inside sealed packaging (e.g., cardboard boxes) without visual access.

Limitations of Optical Sensors: Cameras and LiDAR fail in adverse conditions (fog, smoke, darkness) and cannot penetrate non-metallic materials like cardboard.
Limitations of Existing Radar Approaches:
- Many rely on large-scale antenna arrays or synthetic aperture radar (SAR) scanning, which are bulky and not scalable for compact industrial automation.
- Current deep learning models often process pre-processed data (e.g., Range-Doppler maps or point clouds), which discards valuable phase and amplitude information inherent in raw Complex-Valued IQ (In-phase/Quadrature) signals.
- Existing models often treat radar data as real-valued images, losing the rotational invariance and magnitude-phase coupling crucial for radar signals.
- There is a lack of systematic evaluation across different frequency bands for IQ signal-based classification.

2. Methodology: The ACCOR Framework

The authors propose ACCOR, a deep learning framework designed to process raw complex-valued radar IQ signals directly. The architecture consists of three main components:

A. Data Preprocessing

Sensor: A 62–69 GHz FMCW MIMO mmWave radar with 20 Tx and 20 Rx antennas (400 virtual channels).
Input: Raw IQ signals are transformed via Fast Fourier Transform (FFT) to generate complex range profiles.
Input Dimension: A single sample consists of 400 complex-valued signals, each with 100 range bins ($s \in \mathbb{C}^{400 \times 100}$).
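
The preprocessing step can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' pipeline: the number of ADC samples per chirp (`NUM_SAMPLES`), the Hann window, and the choice to keep the first 100 FFT bins are all assumed, since the summary only gives the final shape of 400 channels by 100 range bins. The helper name `range_profiles` is hypothetical.

```python
import numpy as np

NUM_CHANNELS = 400    # 20 Tx x 20 Rx virtual channels (from the summary)
NUM_RANGE_BINS = 100  # range bins per channel (from the summary)
NUM_SAMPLES = 256     # ASSUMPTION: ADC samples per chirp, not given in the summary

def range_profiles(iq: np.ndarray, num_bins: int = NUM_RANGE_BINS) -> np.ndarray:
    """FFT along the fast-time axis and keep the first `num_bins` range bins."""
    window = np.hanning(iq.shape[-1])            # ASSUMPTION: Hann window
    spectrum = np.fft.fft(iq * window, axis=-1)  # complex range spectrum
    return spectrum[..., :num_bins]

# Synthetic input: every channel carries one beat tone that falls in bin 20,
# standing in for a single reflector at a fixed range.
n = np.arange(NUM_SAMPLES)
tone = np.exp(2j * np.pi * 20 * n / NUM_SAMPLES)
raw_iq = np.tile(tone, (NUM_CHANNELS, 1))

s = range_profiles(raw_iq)
peak_bins = np.abs(s).argmax(axis=1)

print(s.shape, s.dtype)   # (400, 100) complex128
print(set(peak_bins))     # the tone shows up in range bin 20 on every channel
```

The output stays complex, so both the magnitude and the phase of each range bin remain available to the network.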

B. Model Architecture

Complex-Valued CNN Backbone:
- Unlike standard CNNs that split I/Q into real channels, ACCOR operates entirely in the complex domain to preserve phase relationships.
- It utilizes complex convolutions, complex batch normalization, and a complex ReLU activation function.
- Structure: Three complex convolutional layers (kernel size 5) followed by complex average pooling.
- Feature Projection: The complex features are projected into a real domain vector (concatenating real and imaginary parts) to facilitate the attention mechanism.
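
As a rough illustration of the backbone's building blocks, the sketch below implements one complex 1-D convolution from four real convolutions, a complex ReLU, and the real-valued projection. Several details are assumptions: the summary does not say which complex ReLU variant is used (the version here applies ReLU to the real and imaginary parts separately), the input is a single channel rather than the full 400-channel sample, and batch normalization and pooling are left out. All function names are hypothetical.

```python
import numpy as np

def complex_conv1d(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Complex convolution: (x_r + i x_i) * (w_r + i w_i), built from real convolutions."""
    real = np.convolve(x.real, w.real, mode="valid") - np.convolve(x.imag, w.imag, mode="valid")
    imag = np.convolve(x.real, w.imag, mode="valid") + np.convolve(x.imag, w.real, mode="valid")
    return real + 1j * imag

def complex_relu(z: np.ndarray) -> np.ndarray:
    """ASSUMPTION: ReLU applied separately to the real and imaginary parts."""
    return np.maximum(z.real, 0.0) + 1j * np.maximum(z.imag, 0.0)

def to_real_features(z: np.ndarray) -> np.ndarray:
    """Project to the real domain by concatenating real and imaginary parts."""
    return np.concatenate([z.real, z.imag], axis=-1)

rng = np.random.default_rng(0)
x = rng.standard_normal(100) + 1j * rng.standard_normal(100)  # one channel, 100 range bins
w = rng.standard_normal(5) + 1j * rng.standard_normal(5)      # kernel size 5

y = complex_conv1d(x, w)
features = to_real_features(complex_relu(y))
print(y.shape, features.shape)  # (96,) (192,)
```

The cross terms in `complex_conv1d` are what a real-valued network with split I/Q channels does not enforce: there, the real and imaginary parts are mixed by unconstrained weights rather than by complex multiplication.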
Multi-Head Self-Attention:
- A 16-head self-attention layer processes the feature tokens ($D=256$).
- Purpose: To refine features by capturing dependencies across both the range and angle domains, allowing the model to focus on the most discriminative parts of the radar signature.
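
A minimal NumPy sketch of multi-head self-attention with the stated sizes (16 heads, $D=256$) is shown below. It uses the standard scaled dot-product formulation, which may differ in detail from the authors' layer. The number of tokens and the random projection matrices are placeholders, since the summary does not specify how tokens are formed.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(tokens, w_q, w_k, w_v, w_o, num_heads):
    """Scaled dot-product self-attention over `tokens` of shape (n, d)."""
    n, d = tokens.shape
    d_head = d // num_heads

    def split(x):  # (n, d) -> (heads, n, d_head)
        return x.reshape(n, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(tokens @ w_q), split(tokens @ w_k), split(tokens @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, n, n)
    weights = softmax(scores, axis=-1)                   # each row sums to 1
    heads = weights @ v                                  # (heads, n, d_head)
    merged = heads.transpose(1, 0, 2).reshape(n, d)      # concatenate the heads
    return merged @ w_o, weights

D, NUM_HEADS = 256, 16  # from the summary
NUM_TOKENS = 25         # ASSUMPTION: token count is not given in the summary

rng = np.random.default_rng(0)
tokens = rng.standard_normal((NUM_TOKENS, D))
w_q, w_k, w_v, w_o = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(4))

out, attn = multi_head_self_attention(tokens, w_q, w_k, w_v, w_o, NUM_HEADS)
print(out.shape, attn.shape)  # (25, 256) (16, 25, 25)
```

Each of the 16 heads works on a 16-dimensional slice of the features, and the `attn` weights show how strongly every token attends to every other token.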
Hybrid Loss Function:
- The model is trained using a weighted combination of Cross-Entropy (CE) and Supervised Contrastive Loss.
- Formula: $\ell_{total} = (1 - \alpha)\ell_{CE} + \alpha\ell_{contrastive}$
- Rationale: Radar signals for different objects can be highly similar. The contrastive term forces the model to maximize the distance between different classes and minimize the distance between samples of the same class in the feature space, enhancing separability.
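
The weighted loss can be sketched as below. The contrastive term follows the commonly used supervised contrastive formulation (L2-normalized features, positives are other samples with the same label); the temperature of 0.1 and the toy features are assumptions, and the paper's exact variant may differ.

```python
import numpy as np

def cross_entropy(logits: np.ndarray, labels: np.ndarray) -> float:
    """Mean softmax cross-entropy over a batch."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def supervised_contrastive(features, labels, temperature=0.1):
    """Supervised contrastive loss; `temperature` is an ASSUMED hyperparameter."""
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    sim = sim - sim.max(axis=1, keepdims=True)   # stability shift, cancels in log_prob
    not_self = ~np.eye(len(labels), dtype=bool)  # exclude each anchor from its own sums
    log_prob = sim - np.log((np.exp(sim) * not_self).sum(axis=1, keepdims=True))
    positives = (labels[:, None] == labels[None, :]) & not_self
    pos_count = positives.sum(axis=1)
    has_pos = pos_count > 0                      # skip anchors with no same-class partner
    per_anchor = -(log_prob * positives).sum(axis=1)[has_pos] / pos_count[has_pos]
    return per_anchor.mean()

def hybrid_loss(logits, features, labels, alpha):
    """l_total = (1 - alpha) * l_CE + alpha * l_contrastive."""
    return (1 - alpha) * cross_entropy(logits, labels) + alpha * supervised_contrastive(features, labels)

labels = np.array([0, 0, 1, 1])
logits = np.array([[2.0, 0.5], [1.5, 0.2], [0.3, 1.8], [0.1, 2.2]])
separated = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])  # classes well apart
mixed = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])      # classes overlapping

loss_separated = hybrid_loss(logits, separated, labels, alpha=0.4)
loss_mixed = hybrid_loss(logits, mixed, labels, alpha=0.4)
print(loss_separated < loss_mixed)  # True
```

With identical logits, the only difference between the two losses is the contrastive term, which penalizes the feature layout in which same-class samples sit far apart. Setting `alpha=0` recovers plain cross-entropy.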

3. Key Contributions

Complex-Valued Architecture: Design of a compact CNN backbone that natively processes complex IQ signals, preserving amplitude and phase information that real-valued models discard.
Hybrid Loss Strategy: Introduction of a supervised contrastive loss combined with cross-entropy to improve class separability in the feature space, addressing the high similarity of radar signatures.
Frequency Band Expansion: Extension of an existing 64 GHz dataset with a new 67 GHz subset. This allows for the first comparative analysis of occluded object classification across these specific frequency bands using the same hardware and setup.
State-of-the-Art Performance: Demonstration that ACCOR outperforms both specialized radar models and adapted image classification models (like ResNet and EfficientNet) on this task.

4. Experimental Results

The model was evaluated on a dataset of 10 everyday objects (e.g., hammer, screwdriver, water bottle) inside sealed cardboard boxes.

Accuracy:
- 64 GHz: 96.60% accuracy.
- 67 GHz: 93.59% accuracy.
- Comparison: ACCOR outperformed the best baseline radar model (Dual-stream CNN: 95.15% at 64 GHz) by 1.45 percentage points, and also outperformed all adapted image models (ResNet-18: 93.36% at 64 GHz).
Ablation Studies:
- Loss Weighting ($\alpha$): The best performance was achieved with $\alpha = 0.4$ (64 GHz) and $\alpha = 0.5$ (67 GHz). Pure cross-entropy ($\alpha=0$) resulted in lower accuracy (~94.5%), confirming the benefit of contrastive learning.
- Complex vs. Real: Replacing the complex backbone with a real-valued counterpart (splitting I/Q channels) caused a substantial drop in accuracy (e.g., from 96.60% to 90.70% at 64 GHz, a 5.90-point decrease), indicating that complex-domain processing is important for this task.
- Feature Space Visualization: t-SNE plots showed that the hybrid loss created much tighter clusters for the same class and better separation between different classes compared to cross-entropy alone.
Frequency Analysis: While 64 GHz generally yielded slightly higher accuracy, both bands provided sufficient information for >90% accuracy. The small wavelength difference (0.21 mm) meant penetration capabilities were similar, though feature extraction varied slightly between bands.
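
The 0.21 mm figure can be checked with the free-space relation $\lambda = c / f$. The short script below is only a sanity check of that arithmetic; it uses the vacuum speed of light and the two nominal center frequencies.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s

def wavelength_mm(freq_ghz: float) -> float:
    """Free-space wavelength in millimetres for a frequency given in GHz."""
    return C / (freq_ghz * 1e9) * 1e3

lam_64 = wavelength_mm(64.0)
lam_67 = wavelength_mm(67.0)
diff = lam_64 - lam_67

print(round(lam_64, 2), round(lam_67, 2), round(diff, 2))  # 4.68 4.47 0.21
```

The two wavelengths differ by under 5%, which is consistent with the similar penetration behavior reported for the two bands.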

5. Significance and Conclusion

Industrial Impact: ACCOR demonstrates that compact, low-cost MIMO mmWave radars can effectively replace or augment optical sensors for non-visual inspection in logistics, inventory management, and automated sorting.
Methodological Advance: The paper provides evidence that complex-valued deep learning combined with attention mechanisms and contrastive learning outperforms traditional real-valued or image-based approaches for raw radar signal processing on this dataset.
Future Work: The authors note that while the current dataset is a proof-of-concept, future research should focus on larger datasets, more diverse object types, and wider frequency gaps to further refine penetration analysis and robustness.

In summary, ACCOR establishes a new benchmark for occluded object classification by leveraging the full information content of mmWave radar IQ signals through a specialized complex-valued neural network architecture.
