Occlusion-Aware Multimodal Beam Prediction and Pose Estimation for mmWave V2I

This paper proposes an occlusion-aware, Transformer-based multimodal learning framework that fuses visual, LiDAR, radar, GNSS, and mmWave data to jointly predict beam indices, blockage probabilities, and vehicle poses for robust 6G V2I communication under dynamic blockage. The approach achieves high accuracy on the DeepSense 6G dataset.

Abidemi Orimogunje, Hyunwoo Park, Kyeong-Ju Cha, Igbafe Orikumhi, Sunwoo Kim, Dejan Vukobratovic

Published 2026-03-30

Imagine you are driving a self-driving car in a busy city. You need to do two things at the exact same time:

  1. Find your way (know exactly where you are on the map).
  2. Keep a super-fast internet connection (so you can talk to traffic lights, other cars, and the cloud).

The problem? Millimeter-wave (mmWave) internet is like a super-bright flashlight. It's incredibly fast, but if a truck, a pedestrian, or even a tree blocks the light, the connection dies instantly. Traditional systems try to "guess" where to point the flashlight by constantly scanning the air, which is slow and wastes energy.

This paper proposes a smarter solution: A "Super-Sense" Brain for the Car.

Here is the breakdown of how it works, using simple analogies:

1. The Problem: The "Blind" Radio

Imagine trying to find a friend in a crowded, foggy stadium by only shouting their name and listening for a reply. If the crowd is too loud or someone blocks your view, you might shout in the wrong direction.

  • The Old Way: The car's radio system blindly scans 64 different directions (beams) to find the best signal. This is slow and wastes battery.
  • The Risk: If a bus suddenly pulls in front of the car, the radio doesn't know until the connection has already broken.
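The overhead gap is easy to see in a toy sketch (the signal curve and the "best" beam here are made up purely for illustration): the old way measures all 64 beams, while a predictor names one beam up front and measures only that one.

```python
def rss(beam):
    """Toy received-signal-strength curve that peaks at beam 42 (made up)."""
    return -abs(beam - 42)

# Old way: exhaustive sweep, i.e. 64 separate over-the-air measurements.
sweep = [rss(b) for b in range(64)]
best_beam = max(range(64), key=sweep.__getitem__)   # lands on beam 42

# New way: a predictor suggests one beam, so only that beam is measured.
predicted_beam = 42                                 # stand-in for a model output
confirmation = rss(predicted_beam)                  # a single measurement
```

Even in this cartoon, the predictive approach spends 1 measurement where the sweep spends 64, which is exactly the latency and energy saving the paper targets.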

2. The Solution: The "Multimodal" Detective

The authors built an AI system that acts like a detective with five different senses working together, inspired by how humans navigate (SLAM - Simultaneous Localization and Mapping).

Instead of just listening to the radio, the system looks at:

  • 👁️ Eyes (RGB Camera): It sees the street, cars, and buildings.
  • 📏 3D Ruler (LiDAR): It creates a precise 3D map of the surroundings, like a laser scanner.
  • 📡 Radar: It sees through fog and rain to detect moving objects.
  • 🌍 GPS (GNSS): It knows the car's rough location in the world.
  • 📶 Radio Memory: It remembers what the signal strength was just a second ago.

The Magic Ingredient: The system uses a Transformer (the same AI tech behind chatbots) to mix all these senses together. It's like a conductor in an orchestra, making sure the eyes, ears, and memory are playing the same song.
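As a rough sketch of that "conductor" idea (not the paper's actual architecture, and with toy feature vectors invented for illustration): each sensor's features become one token, and scaled dot-product self-attention lets every token weigh every other before they are mixed. A minimal pure-Python version:

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention_fuse(tokens):
    """Single-head self-attention over modality tokens (Q = K = V = tokens)."""
    d = len(tokens[0])
    fused = []
    for q in tokens:
        # How much should this modality listen to each of the others?
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        # Blend all modalities according to those attention weights.
        fused.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return fused

# One toy "token" (feature vector in a shared space) per sensor.
camera = [0.9, 0.1, 0.0, 0.2]
lidar  = [0.8, 0.2, 0.1, 0.1]
radar  = [0.1, 0.9, 0.3, 0.0]
gnss   = [0.0, 0.1, 0.9, 0.4]
beams  = [0.2, 0.0, 0.1, 0.9]

fused = attention_fuse([camera, lidar, radar, gnss, beams])
```

In a real Transformer the queries, keys, and values come from learned projections and there are multiple heads and layers, but the core mixing step is this weighted blend.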

3. What Does It Actually Do?

The AI predicts three things simultaneously:

  1. Where to point the flashlight: Instead of scanning 64 directions, it instantly guesses the one best direction to point the antenna to get the fastest internet.
  2. Is the path blocked? It predicts if a truck is about to block the signal before the signal actually drops.
  3. Where am I? It calculates the car's exact position on the street with high precision.
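A minimal sketch of how those three answers might fall out of the network's final layer (the function name and the toy numbers are hypothetical; the real heads are learned from data):

```python
import math

def predict_outputs(beam_logits, blockage_logit, pose_raw):
    """Turn raw head outputs into the three predictions."""
    # 1) Beam head: pick the single highest-scoring beam (no exhaustive sweep).
    best_beam = max(range(len(beam_logits)), key=beam_logits.__getitem__)
    # 2) Blockage head: squash one logit into a probability with a sigmoid.
    blockage_prob = 1 / (1 + math.exp(-blockage_logit))
    # 3) Pose head: the regressed position is used directly.
    x, y = pose_raw
    return best_beam, blockage_prob, (x, y)

# Toy head outputs for one frame: 64 beam scores, one blockage logit, one 2-D pose.
beam_logits = [0.0] * 64
beam_logits[37] = 3.2          # the network is most confident in beam 37
out = predict_outputs(beam_logits, blockage_logit=1.5, pose_raw=(12.4, -3.1))
# out -> beam 37, blockage probability ≈ 0.82, pose (12.4, -3.1)
```

Because all three heads sit on the same fused features, one forward pass yields the beam, the blockage warning, and the position together.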

4. The Results: Winning the Race

The team tested this on a real-world dataset (DeepSense 6G) that simulates a busy city street. Here is how their "Super-Sense" brain compared to using just one sense:

  • The "Camera-Only" Driver: Good at seeing, but sometimes gets confused by shadows or bad lighting. It got about 50% of the beam directions right.
  • The "Radio-Only" Driver: Terrible at guessing without looking. It only got 6% right.
  • The "Super-Sense" Driver (This Paper): By combining all senses, it got 51% of the beam directions right (beating the camera alone) and was much better at spotting blockages.

Why does this matter?

  • Speed: It keeps the internet connection stable, giving up almost no capacity (only 0.018 bits/s/Hz of spectral efficiency, which is practically nothing).
  • Safety: It knows where the car is within 1.33 meters (about 4 feet), which is much better than using just a camera (2.10 meters).
  • Efficiency: It doesn't need to waste time scanning 64 directions; it just points the flashlight where it needs to go.

The Bottom Line

Think of this technology as giving the car's internet connection eyes and a memory. Instead of blindly shouting into the void, the car looks at the road, remembers what happened a second ago, and instantly points its antenna in the perfect direction to keep the connection alive, even when the city gets chaotic.

This is a big step toward 6G, where your car won't just drive itself; it will stay perfectly connected to the world around it, no matter how many obstacles are in the way.