Imagine you are teaching a robot to drive a car. To do this safely, you give the robot a bunch of "eyes" (cameras) and "feelers" (LiDAR sensors) all around the vehicle. The robot needs to see everything: cars, pedestrians, stop signs, and potholes.
This paper is about a problem that happens when you have too many eyes looking at the same thing.
The Problem: The "Echo Chamber" of Data
Think of a self-driving car like a person standing in the middle of a room with four friends, all shouting descriptions of a cat sitting on a chair.
- Friend A says, "It's a cat!"
- Friend B says, "It's a cat!"
- Friend C says, "It's a cat!"
- Friend D says, "It's a cat!"
If all four friends are looking at the cat from slightly different angles, they are all giving you essentially the same information. In the world of self-driving cars, this is called redundancy.
The researchers found that while having multiple sensors is great for safety (if one fails, another works), having too much duplicate data actually makes the robot's brain (the AI model) slower and sometimes even confused. It's like trying to study for a test by reading the same chapter six times instead of reading six different chapters. You aren't learning anything new; you're just wasting time.
The Solution: The "Smart Editor"
The authors of this paper asked: "What if we could act like a smart editor? What if we could look at all these duplicate descriptions and say, 'Okay, we only need the clearest, most complete description of this cat. Let's throw away the blurry or repetitive ones.'"
They developed a method to measure this redundancy and then "prune" (cut out) the unnecessary data before training the AI.
Here is how they did it, using two simple analogies:
1. The "Best Photo" Rule (Multisource Data)
Imagine you take a photo of a dog with three different cameras.
- Camera 1 gets a great shot, but the dog's tail is cut off by the edge of the picture.
- Camera 2 gets a shot where the dog's head is cut off.
- Camera 3 gets a perfect, full-body shot.
In the past, the AI would try to learn from all three photos, getting confused by the missing parts. The researchers created a "Completeness Score." They told the computer: "Look at all the photos of this dog. Keep the one where the dog is most fully visible. Throw away the ones where parts are missing."
The Result: When they trained the AI using only the "best" photos and threw away the partial/redundant ones, the AI actually got better at spotting objects. It learned faster and made fewer mistakes because it wasn't distracted by bad or duplicate examples.
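The "Best Photo" rule above can be sketched in a few lines of code. This is a hypothetical illustration, not the paper's actual implementation: it assumes each camera view comes with a visibility score (how much of the object fits in the frame), and simply keeps the most complete view per object.

```python
# Hypothetical sketch of the "Completeness Score" idea: when several cameras
# see the same object, keep only the view where it is most fully visible.
# The "visible_fraction" field is an assumed input; the paper's actual
# scoring may be computed differently.

def prune_redundant_views(views):
    """views: list of dicts like
    {"object_id": ..., "camera": ..., "visible_fraction": float in [0, 1]}.
    Returns one view per object: the one with the highest visible fraction."""
    best = {}
    for v in views:
        obj = v["object_id"]
        if obj not in best or v["visible_fraction"] > best[obj]["visible_fraction"]:
            best[obj] = v
    return list(best.values())

views = [
    {"object_id": "dog", "camera": 1, "visible_fraction": 0.80},  # tail cut off
    {"object_id": "dog", "camera": 2, "visible_fraction": 0.70},  # head cut off
    {"object_id": "dog", "camera": 3, "visible_fraction": 1.00},  # full body
]
kept = prune_redundant_views(views)
print(kept)  # only camera 3's full-body shot survives
```

The AI then trains on `kept` instead of all three photos, so it never sees the partial duplicates at all.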
2. The "Close vs. Far" Rule (Multimodal Data)
Now, imagine the car has both eyes (cameras) and LiDAR, which works a bit like echolocation, except it bounces laser light instead of sound.
- Close up: When a car is right in front of you, your eyes see it clearly, and your LiDAR bounces back a strong, detailed signal. You have two perfect descriptions of the same car. This is redundant.
- Far away: When a car is far down the road, your eyes might struggle to see the details, but your LiDAR can still detect the shape. Here, the two sensors are complementing each other, not repeating each other.
The researchers found that for objects very close to the car, the LiDAR data was often just repeating what the camera already saw perfectly. They decided to turn off the LiDAR for close-up objects and rely on the camera, saving the LiDAR for the distant objects where it's truly needed.
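The "close vs. far" rule can also be sketched as a simple distance filter. This is an illustrative assumption, not the paper's actual pipeline: the 20-meter cutoff is made up for the example, and real systems would tune (or learn) where the camera alone is trustworthy.

```python
# Hypothetical sketch of the "close vs. far" rule: drop LiDAR points for
# nearby objects the camera already sees well, and keep LiDAR only for
# distant objects. The 20 m threshold is an assumed value for illustration.
import math

NEAR_THRESHOLD_M = 20.0  # assumed cutoff for "close enough for camera alone"

def filter_lidar_points(points):
    """points: list of (x, y, z) positions in meters relative to the car.
    Keeps only points whose ground distance exceeds the threshold."""
    kept = []
    for x, y, z in points:
        if math.hypot(x, y) > NEAR_THRESHOLD_M:  # ground-plane distance
            kept.append((x, y, z))
    return kept

points = [(5.0, 2.0, 0.5), (60.0, -3.0, 0.8), (15.0, 0.0, 0.4)]
print(filter_lidar_points(points))  # only the distant (~60 m) point remains
```

Everything the filter throws away is handled by the camera alone, which is exactly the kind of redundancy savings the paper is after.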
Why Does This Matter?
You might think, "But isn't more data always better?"
Not in this case. The paper shows that quality is better than quantity.
- Speed: By removing the "echoes" (redundant data), the computer has less work to do. It can make decisions faster, which is critical for avoiding accidents.
- Smarts: The AI stops getting confused by conflicting or repetitive signals. It learns the "truth" about the road more efficiently.
- Cost: Less data means you need less storage and less computing power, making self-driving cars cheaper to build and run.
The Bottom Line
This research is like telling a self-driving car: "Stop trying to memorize every single angle of every single car. Just pick the best view, ignore the duplicates, and focus on the things you can't see clearly."
By being smarter about which data to use, rather than just using all the data, we can build safer, faster, and more efficient autonomous vehicles. The researchers proved that sometimes, less is actually more.